docs: comprehensive project documentation update

architecture.md — full rewrite to reflect current v2 state:
  - Accurate component map and pipeline stages
  - Session lifecycle (server-side files, cookie signing, rotation)
  - Multi-account DocuSign support flow
  - Audit log record schema
  - Batch job in-memory state caveat documented
  - Security design table (log sanitizer, session signing, PDF checksums)
  - Known limitations table (retry gaps, shard config, CI fixtures)

PRODUCT-SPEC.md — remove phantom migration_service.py and pdf_coords.py
  that were in the original spec but never implemented; document where
  pipeline orchestration actually lives

README.md — add Production deployment section covering:
  - Reverse proxy / HTTPS requirement for OAuth callbacks
  - Required env vars table
  - SESSION_SECRET_KEY rotation procedure
  - Adobe shard configuration (EU2 / NA1 / others via ADOBE_SIGN_BASE_URL)
  - DocuSign sandbox-to-production switch
  - Session store maintenance (stale file cleanup)

field-mapping.md — add Multi-Document Templates section explaining
  documentId assignment, page number behaviour, and the known limitation
  for multi-doc templates where page numbers are not rebased per document

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Paul Huliganga 2026-04-23 09:51:38 -04:00
parent 2b3413670f
commit 447a89923a
4 changed files with 366 additions and 79 deletions


@ -20,13 +20,13 @@ Develop an agent/toolkit that can programmatically extract template data and fie
#### Components #### Components
- **Adobe Sign Client** (`src/adobe_api.py`) — authenticated API calls, template listing/download - **Adobe Sign Client** (`src/adobe_api.py`) — authenticated API calls, template listing/download
- **DocuSign Client** (`src/upload_docusign_template.py`, `src/docusign_auth.py`) — JWT auth, template upsert - **DocuSign Client** (`src/upload_docusign_template.py`, `src/docusign_auth.py`) — OAuth auth, template upsert
- **Normalized Schema Model** (`src/models/normalized_template.py`) — platform-agnostic intermediate representation - **Normalized Schema Model** (`src/models/normalized_template.py`) — platform-agnostic intermediate representation
- **Mapping Service** (`src/services/mapping_service.py`) — field type, recipient role, coordinate translation - **Mapping Service** (`src/services/mapping_service.py`) — field type, recipient role, coordinate translation; produces `NormalizedTemplate`
- **Validation Service** (`src/services/validation_service.py`) — field count comparison, recipient checks, missing role detection - **Validation Service** (`src/services/validation_service.py`) — blocker and warning checks on the normalized schema
- **Migration Service** (`src/services/migration_service.py`) — orchestrates download → normalize → validate → compose → upload - **Compose** (`src/compose_docusign_template.py`) — converts `NormalizedTemplate` → DocuSign `envelopeTemplate` JSON; emits `FieldIssue` objects for partial/dropped features
- **Report Builder** (`src/reports/report_builder.py`) — structured success/warning/error output - **Report Builder** (`src/reports/report_builder.py`) — structured success/warning/error output per template
- **Web API** (`web/`) — FastAPI endpoints for browser-based orchestration - **Web API** (`web/`) — FastAPI endpoints for browser-based orchestration; full pipeline orchestration lives in `web/routers/migrate.py`
- **Frontend** (`web/static/`) — side-by-side template browser, migration UI - **Frontend** (`web/static/`) — side-by-side template browser, migration UI
#### Service Separation #### Service Separation
@ -34,16 +34,22 @@ Develop an agent/toolkit that can programmatically extract template data and fie
src/ src/
models/ models/
normalized_template.py # intermediate schema normalized_template.py # intermediate schema
field_issue.py # structured field-issue model + issue codes
services/ services/
migration_service.py # pipeline orchestration
mapping_service.py # field/role/coord transformations mapping_service.py # field/role/coord transformations
validation_service.py # pre/post migration checks validation_service.py # pre/post migration checks
reports/ reports/
report_builder.py # structured report output report_builder.py # structured report output
utils/ utils/
pdf_coords.py # coordinate normalization helpers retry.py # exponential backoff retry helpers
log_sanitizer.py # secret redaction from logs
``` ```
> Note: pipeline orchestration (download → normalize → validate → compose → upload → report) is
> implemented inline in `web/routers/migrate.py` (`_migrate_one()`) for the web layer and in
> `src/migrate_template.py` for the CLI. There is no shared `migration_service.py` orchestration
> layer — this is a known divergence from the original spec that is acceptable for the current scope.
--- ---
### High-Level Migration Flow ### High-Level Migration Flow


@ -177,6 +177,79 @@ Create one project per customer to keep history and settings separate.
--- ---
## Production deployment
The web UI is designed for local or private-network use during a migration engagement. If you do expose it more broadly, follow these steps:
### Run behind a reverse proxy (HTTPS required for OAuth)
OAuth callbacks from both Adobe Sign and DocuSign require HTTPS. Use nginx, Caddy, or a cloud load balancer to terminate TLS and proxy to uvicorn:
```
# nginx example
location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto https;
}
```
Start uvicorn without `--reload` in production:
```bash
uvicorn web.app:app --host 127.0.0.1 --port 8000 --workers 1
```
> Use `--workers 1` — batch job state is in-memory and not safe to share across workers.
### Required environment variables
| Variable | Description |
|----------|-------------|
| `SESSION_SECRET_KEY` | Random secret for signing session cookies. Generate one with `python3 -c "import secrets; print(secrets.token_hex(32))"` |
| `SESSION_STORE_DIR` | Absolute path for server-side session files (default: `.session-store/` in project root) |
| `AUDIT_LOG_FILE` | Absolute path for the JSONL audit log (default: `.audit-log.jsonl` in project root) |
| `ADOBE_REDIRECT_URI` | Must match the callback URL registered in your Adobe Sign app (e.g. `https://migrator.example.com/api/auth/adobe/callback`) |
| `DOCUSIGN_REDIRECT_URI` | Must match the callback URL registered in your DocuSign app (e.g. `https://migrator.example.com/api/auth/docusign/callback`) |
### Rotating SESSION_SECRET_KEY
Changing `SESSION_SECRET_KEY` invalidates all existing browser sessions — every user will be logged out and must reconnect their Adobe Sign and DocuSign accounts. There is no migration path for existing session files. To rotate:
1. Update `SESSION_SECRET_KEY` in `.env`
2. Delete all files in `SESSION_STORE_DIR`
3. Restart the server
### Shard configuration
By default the app targets the Adobe Sign **EU2** shard. To target a different shard, set `ADOBE_SIGN_BASE_URL` in `.env`:
```
# NA1 shard
ADOBE_SIGN_BASE_URL=https://api.na1.adobesign.com/api/rest/v6
# EU2 shard (default)
ADOBE_SIGN_BASE_URL=https://api.eu2.adobesign.com/api/rest/v6
```
Also update `ADOBE_REDIRECT_URI` and the OAuth app registration to match your shard's auth server if it differs.
For DocuSign, switch from sandbox to production by updating:
```
DOCUSIGN_AUTH_SERVER=account.docusign.com
DOCUSIGN_BASE_URL=https://na3.docusign.net/restapi # your account's base URL
```
### Session store maintenance
Session files accumulate in `SESSION_STORE_DIR` — one file per browser session. Delete stale files periodically:
```bash
# Delete session files older than 7 days
find .session-store/ -name "*.json" -mtime +7 -delete
```
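For deployments where a cron-driven `find` is inconvenient, the same cleanup can be sketched in Python. This is a minimal illustration, not part of the project: `delete_stale_sessions` is a hypothetical helper, and it assumes session files end in `.json` and that file modification time is a reasonable proxy for session staleness.

```python
import time
from pathlib import Path


def delete_stale_sessions(store_dir: str, max_age_days: float = 7.0) -> list[str]:
    """Delete *.json files in store_dir whose mtime is older than max_age_days.

    Returns the sorted names of the files that were removed.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(store_dir).glob("*.json"):
        # Only regular files are considered; anything else is left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Calling `delete_stale_sessions(".session-store")` from a scheduled task would behave roughly like the `find` command above, although `find -mtime` rounds ages to whole days and this sketch does not.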
---
## Running tests ## Running tests
```bash ```bash


@ -1,81 +1,263 @@
# Architecture & Design Overview # Architecture & Design — Adobe Sign → DocuSign Migrator
## System Components *Last updated: 2026-04-23*
- **Extraction Layer**: Handles authentication, API calls, and raw data retrieval from Adobe Sign. Input: .env credentials. Output: JSON metadata + field data.
- **Mapping/Transform Layer**: Pure logic between raw Adobe template objects and canonical DocuSign template model. Handles all 1:1, many:1, and lossy mappings. Logging of ambiguities.
- **DocuSign Ingest Layer**: Authenticates, creates/updates templates in DocuSign using mapped objects. Handles feedback, errors, and reporting.
- **Validation/QA Layer**: Compares final artifacts, runs coverage and correctness checks, supports dry-run/test modes.
- **Testing/Scenario Folder**: Sample templates and responses (see `/sample-templates/`) and mapping/transform test cases.
## Data Flow
```mermaid
graph TD
A[Adobe Sign API] -->|Extract| B[Raw JSON]
B -->|Transform/Map| C[Canonical Model]
C -->|Ingest| D[DocuSign API]
D -->|Validate| E[QA/Reporting]
E -->|Feedback| B
```
1. Extract Adobe template (metadata, fields, roles, workflows)
2. Pass to transform/mapping functions (per field/role/conditional)
3. Generate canonical model; attempt creation in DocuSign
4. Log result; pull DocuSign result and validate against input
5. Drop all validated or problematic test scenarios in `/sample-templates/` or a new `tests/` folder for regression & future QA
## Key Design Decisions & Logger
- Focus on batch/parallelization via pipelined scripts/modules
- Use local cache of all raw API payloads for traceability
- Mapping module must be testable with static samples (no account needed at first)
- Agent harness structure for project traceability, autonomous improvement
- **Decision Log** (expand as project runs):
- [2026-04-14] Start with static JSON tests and pure transforms before integrating live API. Document all lossy mappings inline in mapping functions & doc.
- [2026-04-14] Capture all feature-mapping challenges (fields, roles) as they appear in real-world test cases and update this doc.
## Extensibility
- Designed for: new field types, more templates, transform plugins
- Support “mapping hints” or forced overrides for ambiguous/complex field cases
--- ---
## v2 Architecture — Web UI (2026-04-17) ## System Overview
The pipeline is extended with a FastAPI web layer that wraps all existing src/ modules. The migrator is a Python toolkit with two interfaces that share the same core pipeline:
```mermaid - **CLI** (`src/`) — shell scripts for one-off or scripted migrations
graph TD - **Web UI** (`web/`) — FastAPI + vanilla JS SPA for browser-based, multi-user migrations
Browser -->|HTTP| FastAPI
FastAPI -->|OAuth| AdobeSign[Adobe Sign API]
FastAPI -->|OAuth| DocuSign[DocuSign API]
FastAPI -->|calls| Compose[compose_docusign_template.py]
FastAPI -->|calls| Upload[upload_docusign_template.py]
Upload -->|upsert| DocuSign
FastAPI -->|reads/writes| History[migration-output/.history.json]
```
**New layers:** Both interfaces execute the same sequence: authenticate → download → normalize → validate → compose → upload → report.
- `web/routers/auth.py` — browser-initiated OAuth for Adobe Sign and DocuSign
- `web/routers/templates.py` — template listing + migration status computation
- `web/routers/migrate.py` — triggers pipeline; records history
- `web/static/` — vanilla HTML/JS SPA (no build step)
**Template issue status:**
`GET /api/templates/status` drives the Templates and Issues & Warnings pages.
Its summary status combines pre-migration validation and DocuSign composition
analysis:
- `blockers`: validation failures that stop migration.
- `warnings`: validation warnings that allow migration but need review.
- `field_issues`: field mapping caveats emitted by composition, such as skipped
field types or unsupported conditional logic.
The list-level "Clean" label should only appear when all three collections are
empty, so summary rows match the template detail and migration result views.
**Idempotent Upload (v2):**
`upload_docusign_template.py` now searches for an existing DocuSign template by exact name match and updates the most recently modified one (PUT). Falls back to create (POST) if no match. `--force-create` flag bypasses upsert.
--- ---
*Update as architecture/requirements change. Generated by Cleo (2026-04-14). Updated 2026-04-17.* ## Component Map
```
Browser / CLI
┌─────────────────────────────────────────────────┐
│  web/app.py (FastAPI)   OR   src/migrate_*.py   │
│    session management (web only)                │
│    OAuth orchestration (web only)               │
│    batch job queue (in-memory dict, web only)   │
└──────────────┬──────────────────────────────────┘
│ calls
┌──────────┴──────────┐
▼ ▼
src/adobe_api.py src/upload_docusign_template.py
(Adobe Sign REST) (DocuSign REST — upsert)
│ ▲
│ raw JSON │ DocuSign JSON
▼ │
src/services/mapping_service.py
└─► src/models/normalized_template.py
│ NormalizedTemplate
src/services/validation_service.py
│ blockers / warnings
src/compose_docusign_template.py
└─► src/models/field_issue.py
│ (template_dict, warnings, field_issues)
src/reports/report_builder.py
└─► MigrationReport written to migration-output/.history.json
```
---
## Pipeline Stages
### 1. Authentication
| Surface | Adobe Sign | DocuSign |
|---------|-----------|---------|
| CLI | OAuth Auth Code via `adobe_auth.py`; tokens stored in `.env` | OAuth Auth Code via `docusign_auth.py`; tokens stored in `.env` |
| Web | OAuth Auth Code via `/api/auth/adobe/callback`; tokens in server-side session file | OAuth Auth Code via `/api/auth/docusign/callback`; tokens in server-side session file |
The web UI never stores OAuth tokens in `.env` — each browser session carries its own tokens in a signed server-side session file under `.session-store/`. Sessions are identified by a cookie (`session_id`) signed with `SESSION_SECRET_KEY`.
### 2. Download (Adobe Sign)
`src/adobe_api.py` fetches from the Adobe Sign REST v6 API. Shard is configured via `ADOBE_SIGN_BASE_URL` (default: `https://api.eu2.adobesign.com/api/rest/v6`).
For each template, four artifacts (three JSON files and the PDF) are written to `downloads/<template-name>__<id>/`:
| File | Content |
|------|---------|
| `metadata.json` | Template metadata (name, status, creator, dates) |
| `form_fields.json` | Full form field list with locations, conditions, validations |
| `documents.json` | Document list metadata |
| `<name>.pdf` | Binary PDF (base64 decoded) |
### 3. Normalize (`mapping_service.py`)
`MappingService.from_folder(path)` reads the three JSON files and produces a `NormalizedTemplate` (Pydantic model). This platform-agnostic intermediate schema decouples Adobe-specific field names from the DocuSign composition step.
Key transformations at this stage:
- Participant sets → typed role list (`SIGN`, `APPROVE`, `CC`)
- Field locations expanded into flat list (multi-location fields produce N entries)
- Conditional action references converted to normalized `ConditionalRule` objects
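The multi-location expansion listed above can be illustrated with a small sketch. It is not the `MappingService` code: `FlatField` and `flatten_locations` are hypothetical names, and the input keys (`name`, `locations`, `pageNumber`, `left`, `top`) are assumptions about the shape of `form_fields.json`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatField:
    """One field placement; a field with N locations yields N of these."""
    name: str
    page: int
    left: float
    top: float


def flatten_locations(raw_fields: list[dict]) -> list[FlatField]:
    """Expand each field's location list into one flat entry per location."""
    flat = []
    for field in raw_fields:
        # A field without locations contributes nothing to the flat list.
        for loc in field.get("locations", []):
            flat.append(
                FlatField(field["name"], loc["pageNumber"], loc["left"], loc["top"])
            )
    return flat
```

Keeping one entry per placement means later stages can treat every entry as a single tab position without re-checking for nested locations.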
### 4. Validate (`validation_service.py`)
Runs pre-migration checks and returns `(blockers: list[str], warnings: list[str])`.
| Check | Result on failure |
|-------|-----------------|
| No recipients | Blocker |
| No documents | Blocker |
| No signature fields | Warning |
| Unassigned fields | Warning |
| Unsupported feature detected | Warning |
Blockers halt migration. Warnings are stored in the history and surfaced in the UI but do not stop the pipeline.
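A minimal sketch of how the checks in the table could map onto the `(blockers, warnings)` return value follows. It assumes a plain dict with hypothetical keys (`recipients`, `documents`, `fields`, `type`, `recipient`) rather than the real `NormalizedTemplate` model, and it omits the unsupported-feature check because the source does not say how that is detected.

```python
def validate(template: dict) -> tuple[list[str], list[str]]:
    """Return (blockers, warnings) for a simplified template dict."""
    blockers: list[str] = []
    warnings: list[str] = []

    # Blockers: migration cannot proceed without these.
    if not template.get("recipients"):
        blockers.append("No recipients")
    if not template.get("documents"):
        blockers.append("No documents")

    # Warnings: migration proceeds, but the result needs review.
    fields = template.get("fields", [])
    if not any(f.get("type") == "SIGNATURE" for f in fields):
        warnings.append("No signature fields")
    if any(not f.get("recipient") for f in fields):
        warnings.append("Unassigned fields")

    return blockers, warnings
```

A caller would stop when `blockers` is non-empty and otherwise carry `warnings` forward into the report.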
### 5. Compose (`compose_docusign_template.py`)
Converts `NormalizedTemplate` → DocuSign `envelopeTemplate` JSON. Returns a 3-tuple:
```python
(template_dict: dict, warnings: list[str], field_issues: list[dict])
```
`field_issues` are structured `FieldIssue` objects (see `src/models/field_issue.py`) emitted when a field migrates successfully but something was silently dropped or approximated. Each issue has a machine-readable `code` (e.g. `CROSS_RECIPIENT_CONDITIONAL`, `HIDE_ACTION`, `FIELD_TYPE_SKIPPED`). See [field-mapping.md](../field-mapping.md) for the full list.
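To make the shape of a structured issue concrete, here is an illustrative sketch. Only the `code` attribute and the three example codes come from the text above; the other attributes (`field_name`, `detail`) and the `count_by_code` helper are assumptions, and the real model in `src/models/field_issue.py` may differ.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FieldIssue:
    code: str        # machine-readable code, e.g. "HIDE_ACTION"
    field_name: str  # hypothetical: which field the issue applies to
    detail: str      # hypothetical: human-readable explanation


def count_by_code(issues: list[FieldIssue]) -> dict[str, int]:
    """Summarize issues per code, e.g. for a results table."""
    return dict(Counter(issue.code for issue in issues))


issues = [
    FieldIssue("FIELD_TYPE_SKIPPED", "stamp_1", "field type has no DocuSign tab"),
    FieldIssue("HIDE_ACTION", "notes", "hide action was dropped"),
    FieldIssue("HIDE_ACTION", "extra", "hide action was dropped"),
]
summary = count_by_code(issues)
as_dicts = [asdict(issue) for issue in issues]  # dict form for JSON output
```

A machine-readable code lets the UI group and filter issues without parsing free-text messages.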
### 6. Upload (`upload_docusign_template.py`)
Upsert pattern:
1. Search DocuSign for an existing template with the same name
2. If found: `PUT /templates/{id}` (update the most recently modified match)
3. If not found: `POST /templates` (create new)
4. `--force-create` flag bypasses the search and always creates
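The decision logic in the steps above can be sketched without any network calls. This is a hedged illustration: `choose_upsert_target` is a hypothetical helper, the dict keys (`templateId`, `name`, `lastModified`) are assumed, and timestamps are assumed to be ISO 8601 strings in a single format so that string comparison orders them correctly.

```python
from typing import Optional


def choose_upsert_target(
    existing: list[dict], name: str, force_create: bool = False
) -> tuple[str, Optional[str]]:
    """Return ("PUT", template_id) to update or ("POST", None) to create."""
    if force_create:
        return ("POST", None)
    # Exact name match only; similarly named templates are ignored.
    matches = [t for t in existing if t.get("name") == name]
    if not matches:
        return ("POST", None)
    # Several templates may share a name; update the most recently modified.
    latest = max(matches, key=lambda t: t["lastModified"])
    return ("PUT", latest["templateId"])
```

Separating the decision from the HTTP call keeps the upsert rule testable against static data.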
### 7. Report (`report_builder.py`)
A `MigrationReport` is built per template and appended to `migration-output/.history.json`. Each record contains:
- template name, Adobe ID, DocuSign ID
- status (`success`, `dry_run`, `skipped`, `error`)
- blockers, warnings, field_issues
- PDF checksum (SHA-256)
- timestamp
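As a rough sketch of the checksum and history-append steps, the following uses only the standard library. The helper names, the `sha256:` prefix on the stored value, and the assumption that `.history.json` holds a single JSON array are all illustrative rather than confirmed by the project.

```python
import hashlib
import json
from pathlib import Path


def pdf_checksum(pdf_bytes: bytes) -> str:
    """SHA-256 of the PDF content, rendered as 'sha256:<hex digest>'."""
    return "sha256:" + hashlib.sha256(pdf_bytes).hexdigest()


def append_history(history_path: Path, record: dict) -> int:
    """Append one record to a JSON-array history file; return the new length."""
    if history_path.exists():
        history = json.loads(history_path.read_text(encoding="utf-8"))
    else:
        history = []
    history.append(record)
    history_path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return len(history)
```

Rewriting the whole file on each append is simple but not safe under concurrent writers, which fits the single-worker deployment described elsewhere in these docs.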
---
## Web Layer
### FastAPI App (`web/app.py`)
- Mounts all routers under `/api/`
- Serves the SPA shell from `web/static/index.html`
- Installs `SanitizingFilter` on the root logger at startup (redacts tokens and secrets from all log output)
- Logs a warning at startup if `SESSION_SECRET_KEY` is the default development value
### Routers
| Router | Prefix | Responsibility |
|--------|--------|---------------|
| `auth.py` | `/api/auth` | Adobe Sign + DocuSign OAuth flows, session status |
| `templates.py` | `/api/templates` | Adobe template listing; migration status per template |
| `migrate.py` | `/api/migrate` | Single and batch migration; history; job polling |
| `verify.py` | `/api/verify` | Send test envelopes; poll status; void |
| `audit.py` | `/api/audit` | Audit log access + CSV export |
| `admin.py` | `/api/admin` | Admin-only operations (admin_emails gating) |
### Session Lifecycle
```
Browser makes first request
→ middleware generates UUID session_id
→ signed cookie set (itsdangerous, SESSION_SECRET_KEY)
→ session file created at .session-store/<session_id>.json
User connects Adobe Sign / DocuSign
→ OAuth tokens written to session file (never to .env)
→ session file updated on every token refresh
User disconnects or session file deleted
→ next request gets a fresh session_id and new file
→ old file can be deleted manually to force re-auth
```
Session files are plain JSON. Delete all files in `.session-store/` to reset all user sessions. Set `SESSION_STORE_DIR` in `.env` to change the location.
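The signing step in the lifecycle can be illustrated with a standard-library sketch. The project uses `itsdangerous`; the HMAC code below is a stand-in that shows the same idea, and the function names and cookie layout (`<session_id>.<hex mac>`) are assumptions.

```python
import hashlib
import hmac
import uuid
from typing import Optional


def sign_session_id(session_id: str, secret: str) -> str:
    """Return '<session_id>.<hex HMAC-SHA256>' as the cookie value."""
    mac = hmac.new(secret.encode(), session_id.encode(), hashlib.sha256).hexdigest()
    return f"{session_id}.{mac}"


def verify_cookie(cookie: str, secret: str) -> Optional[str]:
    """Return the session_id if the signature is valid, otherwise None."""
    session_id, sep, mac = cookie.rpartition(".")
    if not sep:
        return None  # no signature part present
    expected = hmac.new(secret.encode(), session_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking match position through timing.
    return session_id if hmac.compare_digest(mac, expected) else None


session_id = str(uuid.uuid4())
cookie = sign_session_id(session_id, "old-secret")
```

Verifying `cookie` with a different secret returns `None`, which is why rotating `SESSION_SECRET_KEY` logs every user out.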
### Multi-Account DocuSign Support
When a DocuSign user belongs to multiple accounts, the web UI:
1. Fetches `/oauth/userinfo` after the OAuth callback
2. Sorts available accounts alphabetically
3. Prompts the user to pick one account for the session
4. Stores `docusign_account_id` in the session alongside the tokens
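A small sketch of the sorting step follows. It assumes the `/oauth/userinfo` response carries an `accounts` list whose entries have `account_id` and `account_name` keys, and that sorting is case-insensitive; `list_account_choices` is a hypothetical helper name.

```python
def list_account_choices(userinfo: dict) -> list[dict]:
    """Return the user's accounts sorted alphabetically by name."""
    accounts = userinfo.get("accounts", [])
    # casefold() gives a case-insensitive ordering for display purposes.
    return sorted(accounts, key=lambda a: a.get("account_name", "").casefold())


userinfo = {
    "accounts": [
        {"account_id": "2", "account_name": "zeta Corp"},
        {"account_id": "1", "account_name": "Acme"},
        {"account_id": "3", "account_name": "beta LLC"},
    ]
}
choices = list_account_choices(userinfo)
```

The `account_id` of whichever entry the user picks is what would be stored in the session as `docusign_account_id`.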
### Batch Job State
Batch migrations are tracked in an in-memory dict (`_batch_jobs`) in `web/routers/migrate.py`. Job state is lost on server restart — any in-flight batch becomes unrecoverable. This is a known limitation appropriate for single-operator deployments. Production deployments requiring durability should persist job state to a database or file store.
### Audit Log
`web/audit.py` writes one JSONL record per migration event to `AUDIT_LOG_FILE` (default: `.audit-log.jsonl`). Each record:
```json
{
"timestamp": "2026-04-23T12:00:00Z",
"session_id": "abc123",
"user_email": "user@example.com",
"action": "migrate",
"template_name": "Sales Agreement",
"adobe_template_id": "3AAA...",
"docusign_template_id": "uuid",
"status": "success",
"field_issues_count": 2,
"pdf_checksum": "sha256:abcdef..."
}
```
The `/api/audit` endpoints expose this log with filtering and CSV export. Sensitive fields (tokens, secrets) are never written — the `SanitizingFilter` on the root logger ensures they are redacted before hitting any output.
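Appending and filtering JSONL records of this shape can be sketched as follows. This is not the code in `web/audit.py`; the function names and the `status` filter are illustrative assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def append_audit_record(path: str, **fields) -> dict:
    """Append one JSON object per line, stamped with a UTC timestamp."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        **fields,
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def read_audit_log(path: str, status: Optional[str] = None) -> list[dict]:
    """Read all records, optionally keeping only those with a given status."""
    with open(path, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    if status is not None:
        records = [r for r in records if r.get("status") == status]
    return records
```

One record per line means the file can be appended to without rewriting it and can be inspected with line-oriented tools.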
---
## Frontend SPA
Single-page app in `web/static/`. No build step — plain HTML + ES modules.
| File | Responsibility |
|------|---------------|
| `index.html` | Shell, left nav, top bar, router outlet |
| `js/router.js` | Hash-based routing (`#/templates`, `#/results`, etc.) |
| `js/state.js` | Global pub/sub state store |
| `js/api.js` | Typed fetch wrappers for all backend endpoints |
| `js/auth.js` | Auth chip UI, OAuth flow, toast notifications |
| `js/templates.js` | Templates view + detail tabs (overview / issues / history) |
| `js/migration.js` | Migration modal, progress polling, results view |
| `js/issues.js` | Issues & Warnings view |
| `js/verification.js` | Verification view (send / poll / void envelopes) |
| `js/history.js` | History & Audit view |
| `js/settings.js` | Settings view |
| `js/project.js` | Per-customer project context (localStorage) |
| `js/utils.js` | `escHtml`, `formatDate`, `renderFieldIssues`, etc. |
CSS uses DocuSign 2024 brand design tokens defined in `css/tokens.css`.
---
## Security Design
| Concern | Mechanism |
|---------|----------|
| Token leakage in logs | `SanitizingFilter` installed on root logger at startup; redacts Bearer tokens, JWTs, long base64 strings, and key=value assignments for known secret fields |
| Session integrity | Sessions signed with `SESSION_SECRET_KEY` via `itsdangerous`; secret must be set in `.env` |
| Secret exposure at startup | Warning logged if `SESSION_SECRET_KEY` is the default value |
| PDF integrity | SHA-256 checksum computed before upload and stored in history |
| Credential storage | OAuth tokens stored in server-side session files, never in browser localStorage or logs |
---
## Utilities
### `src/utils/retry.py`
`retry_with_backoff` and `async_retry_with_backoff` decorators implement exponential backoff (configurable max retries, base delay, max delay). They target HTTP 429 / 5xx transient errors. These decorators are defined and tested but are not yet applied to API call sites — adding `@retry_with_backoff()` to functions in `adobe_api.py` and `upload_docusign_template.py` is the recommended next step for production hardening.
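For readers unfamiliar with the pattern, a minimal synchronous sketch of an exponential-backoff decorator is shown here. It is not the implementation in `src/utils/retry.py`: the parameter names, the `HTTPStatusError` exception, and the injectable `sleep` argument are assumptions made for illustration.

```python
import functools
import time


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def retry_with_backoff(max_retries=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Retry on 429 and 5xx, doubling the delay each attempt up to max_delay."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except HTTPStatusError as exc:
                    retryable = exc.status == 429 or 500 <= exc.status < 600
                    # Give up on non-transient errors or when retries run out.
                    if not retryable or attempt == max_retries:
                        raise
                    sleep(min(max_delay, base_delay * 2 ** attempt))
        return wrapper
    return decorator
```

Passing `sleep` as a parameter keeps the decorator testable without real waiting; a production version would often add random jitter to the delay as well.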
### `src/utils/log_sanitizer.py`
`install_sanitizing_filter()` attaches a `logging.Filter` to the root logger. The filter runs `redact()` on every log record's message and args, replacing Bearer tokens, JWTs, long base64 strings, and key=value secret assignments with `[REDACTED]`.
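The filter mechanism can be sketched with the standard `logging` module. The two regular expressions below are simplified assumptions covering only Bearer tokens and a few key=value secrets; the project's real patterns (including JWT and base64 detection) are not reproduced here.

```python
import logging
import re

# Simplified, illustrative patterns; not the project's actual rules.
_PATTERNS = [
    (re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*"), "Bearer [REDACTED]"),
    (
        re.compile(r"\b(access_token|refresh_token|client_secret)=([^\s&]+)", re.IGNORECASE),
        r"\1=[REDACTED]",
    ),
]


def redact(text: str) -> str:
    """Replace anything matching a secret pattern with [REDACTED]."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class SanitizingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format msg % args first so secrets passed as args are covered too.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # never drop the record, only rewrite it
```

One caveat worth knowing: in the standard library, a filter attached to a logger is not applied to records propagated up from child loggers, so attaching the filter to handlers is a common alternative.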
---
## Known Limitations
| Limitation | Impact | Mitigation |
|-----------|--------|-----------|
| Batch job state is in-memory | Lost on restart | Acceptable for CLI/single-operator; add DB persistence for multi-operator prod |
| Adobe shard configured via full base URL only | Changing shard requires `.env` update | Set `ADOBE_SIGN_BASE_URL` in `.env` |
| Retry decorators not applied to API calls | 429/5xx errors propagate immediately | Apply `@retry_with_backoff()` to `adobe_api.py` + `upload_docusign_template.py` |
| Regression tests require real fixture data | CI cannot run regression tests without downloaded templates | Check in anonymised fixtures or generate synthetic ones |
*Updated 2026-04-23 — reflects v2 web UI, session lifecycle, audit log schema, multi-account support, batch job state, security design.*


@ -80,6 +80,32 @@ Tab types that do not merge (only first location used or handled specially):
`radioGroupTabs` — each location is one radio button within the group `radioGroupTabs` — each location is one radio button within the group
`signerAttachmentTabs` — each location is an independent attachment request `signerAttachmentTabs` — each location is an independent attachment request
## Multi-Document Templates
Adobe Sign library documents can contain multiple documents (PDFs) stacked into one template. DocuSign templates also support multiple documents — each document gets a unique `documentId` starting from 1.
### How it works
The compose pipeline assigns a `documentId` to each document in the order returned by the Adobe Sign `documents.json` list. All form fields reference their page position within the document they belong to (`pageNumber` is 1-based within the document's own page sequence, not the overall template page count).
```
Adobe Sign template with 2 docs:
  doc[0]: "Contract.pdf" (3 pages) → documentId: 1
  doc[1]: "Exhibit-A.pdf" (2 pages) → documentId: 2
A field on page 2 of Exhibit-A.pdf:
  adobe_location.pageNumber = 2 (within the exhibit)
  compose emits: documentId=2, pageNumber=2
```
DocuSign uses `(documentId, pageNumber)` together to locate every tab. If only one document exists, `documentId` is always `1`.
### Known limitation
Adobe Sign form fields store `pageNumber` as a sequential page number across the **entire** template (all documents concatenated). If a template has two 3-page documents, fields on document 2 have `pageNumber` 4 to 6. The compose pipeline does not currently rebase page numbers per document: it passes Adobe's page numbers through as-is and sets `documentId` based on field assignment.
**Impact**: For single-document templates this is correct. For multi-document templates, verify field placement visually in DocuSign after migration if the template spans more than one PDF.
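For reference, the rebasing that the compose pipeline does not currently perform could look roughly like the sketch below. It assumes the page count of each document is known and that Adobe's `pageNumber` is 1-based across the concatenated template; `rebase_page` is a hypothetical helper, not existing project code.

```python
def rebase_page(global_page: int, page_counts: list[int]) -> tuple[int, int]:
    """Map a template-wide page number to (documentId, pageNumber).

    page_counts[i] is the number of pages in document i (0-based index);
    documentId values start at 1, as do page numbers within a document.
    """
    if global_page < 1:
        raise ValueError("page numbers are 1-based")
    remaining = global_page
    for index, count in enumerate(page_counts):
        if remaining <= count:
            return index + 1, remaining
        remaining -= count  # skip past this document's pages
    raise ValueError("page number is beyond the last document")
```

With two 3-page documents, `rebase_page(5, [3, 3])` yields `(2, 2)`: the second page of the second document.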
## Conditional Logic Mapping ## Conditional Logic Mapping
Adobe Sign `conditionalAction` → DocuSign `conditionalParentLabel` + `conditionalParentValue` on the dependent tab. Adobe Sign `conditionalAction` → DocuSign `conditionalParentLabel` + `conditionalParentValue` on the dependent tab.