From 447a89923a69a97ce0eab2e5ba0b7d5844b793f3 Mon Sep 17 00:00:00 2001
From: Paul Huliganga
Date: Thu, 23 Apr 2026 09:51:38 -0400
Subject: [PATCH] docs: comprehensive project documentation update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

architecture.md — full rewrite to reflect current v2 state:
- Accurate component map and pipeline stages
- Session lifecycle (server-side files, cookie signing, rotation)
- Multi-account DocuSign support flow
- Audit log record schema
- Batch job in-memory state caveat documented
- Security design table (log sanitizer, session signing, PDF checksums)
- Known limitations table (retry gaps, shard config, CI fixtures)

PRODUCT-SPEC.md — remove phantom migration_service.py and pdf_coords.py that were in the original spec but never implemented; document where pipeline orchestration actually lives

README.md — add Production deployment section covering:
- Reverse proxy / HTTPS requirement for OAuth callbacks
- Required env vars table
- SESSION_SECRET_KEY rotation procedure
- Adobe shard configuration (EU2 / NA1 / others via ADOBE_SIGN_BASE_URL)
- DocuSign sandbox-to-production switch
- Session store maintenance (stale file cleanup)

field-mapping.md — add Multi-Document Templates section explaining documentId assignment, page number behaviour, and the known limitation for multi-doc templates where page numbers are not rebased per document

Co-Authored-By: Claude Sonnet 4.6
---
 PRODUCT-SPEC.md      |  22 +--
 README.md            |  73 ++++++++++
 docs/architecture.md | 324 +++++++++++++++++++++++++++++++++----------
 field-mapping.md     |  26 ++++
 4 files changed, 366 insertions(+), 79 deletions(-)

diff --git a/PRODUCT-SPEC.md b/PRODUCT-SPEC.md
index 0bcdf18..8f2fa5d 100644
--- a/PRODUCT-SPEC.md
+++ b/PRODUCT-SPEC.md
@@ -20,13 +20,13 @@ Develop an agent/toolkit that can programmatically extract template data and fie
 #### Components
 - **Adobe Sign Client** (`src/adobe_api.py`) — authenticated API calls, template
listing/download -- **DocuSign Client** (`src/upload_docusign_template.py`, `src/docusign_auth.py`) — JWT auth, template upsert +- **DocuSign Client** (`src/upload_docusign_template.py`, `src/docusign_auth.py`) — OAuth auth, template upsert - **Normalized Schema Model** (`src/models/normalized_template.py`) — platform-agnostic intermediate representation -- **Mapping Service** (`src/services/mapping_service.py`) — field type, recipient role, coordinate translation -- **Validation Service** (`src/services/validation_service.py`) — field count comparison, recipient checks, missing role detection -- **Migration Service** (`src/services/migration_service.py`) — orchestrates download → normalize → validate → compose → upload -- **Report Builder** (`src/reports/report_builder.py`) — structured success/warning/error output -- **Web API** (`web/`) — FastAPI endpoints for browser-based orchestration +- **Mapping Service** (`src/services/mapping_service.py`) — field type, recipient role, coordinate translation; produces `NormalizedTemplate` +- **Validation Service** (`src/services/validation_service.py`) — blocker and warning checks on the normalized schema +- **Compose** (`src/compose_docusign_template.py`) — converts `NormalizedTemplate` → DocuSign `envelopeTemplate` JSON; emits `FieldIssue` objects for partial/dropped features +- **Report Builder** (`src/reports/report_builder.py`) — structured success/warning/error output per template +- **Web API** (`web/`) — FastAPI endpoints for browser-based orchestration; full pipeline orchestration lives in `web/routers/migrate.py` - **Frontend** (`web/static/`) — side-by-side template browser, migration UI #### Service Separation @@ -34,16 +34,22 @@ Develop an agent/toolkit that can programmatically extract template data and fie src/ models/ normalized_template.py # intermediate schema + field_issue.py # structured field-issue model + issue codes services/ - migration_service.py # pipeline orchestration mapping_service.py # 
field/role/coord transformations validation_service.py # pre/post migration checks reports/ report_builder.py # structured report output utils/ - pdf_coords.py # coordinate normalization helpers + retry.py # exponential backoff retry helpers + log_sanitizer.py # secret redaction from logs ``` +> Note: pipeline orchestration (download → normalize → validate → compose → upload → report) is +> implemented inline in `web/routers/migrate.py` (`_migrate_one()`) for the web layer and in +> `src/migrate_template.py` for the CLI. There is no shared `migration_service.py` orchestration +> layer — this is a known divergence from the original spec that is acceptable for the current scope. + --- ### High-Level Migration Flow diff --git a/README.md b/README.md index f373315..c95132a 100644 --- a/README.md +++ b/README.md @@ -177,6 +177,79 @@ Create one project per customer to keep history and settings separate. --- +## Production deployment + +The web UI is designed for local or private-network use during a migration engagement. If you do expose it more broadly, follow these steps: + +### Run behind a reverse proxy (HTTPS required for OAuth) + +OAuth callbacks from both Adobe Sign and DocuSign require HTTPS. Use nginx, Caddy, or a cloud load balancer to terminate TLS and proxy to uvicorn: + +``` +# nginx example +location / { + proxy_pass http://127.0.0.1:8000; + proxy_set_header Host $host; + proxy_set_header X-Forwarded-Proto https; +} +``` + +Start uvicorn without `--reload` in production: +```bash +uvicorn web.app:app --host 127.0.0.1 --port 8000 --workers 1 +``` + +> Use `--workers 1` — batch job state is in-memory and not safe to share across workers. + +### Required environment variables + +| Variable | Description | +|----------|-------------| +| `SESSION_SECRET_KEY` | Random secret for signing session cookies. 
Generate one with `python3 -c "import secrets; print(secrets.token_hex(32))"` | +| `SESSION_STORE_DIR` | Absolute path for server-side session files (default: `.session-store/` in project root) | +| `AUDIT_LOG_FILE` | Absolute path for the JSONL audit log (default: `.audit-log.jsonl` in project root) | +| `ADOBE_REDIRECT_URI` | Must match the callback URL registered in your Adobe Sign app (e.g. `https://migrator.example.com/api/auth/adobe/callback`) | +| `DOCUSIGN_REDIRECT_URI` | Must match the callback URL registered in your DocuSign app (e.g. `https://migrator.example.com/api/auth/docusign/callback`) | + +### Rotating SESSION_SECRET_KEY + +Changing `SESSION_SECRET_KEY` invalidates all existing browser sessions — every user will be logged out and must reconnect their Adobe Sign and DocuSign accounts. There is no migration path for existing session files. To rotate: + +1. Update `SESSION_SECRET_KEY` in `.env` +2. Delete all files in `SESSION_STORE_DIR` +3. Restart the server + +### Shard configuration + +By default the app targets the Adobe Sign **EU2** shard. To target a different shard, set `ADOBE_SIGN_BASE_URL` in `.env`: + +``` +# NA1 shard +ADOBE_SIGN_BASE_URL=https://api.na1.adobesign.com/api/rest/v6 + +# EU2 shard (default) +ADOBE_SIGN_BASE_URL=https://api.eu2.adobesign.com/api/rest/v6 +``` + +Also update `ADOBE_REDIRECT_URI` and the OAuth app registration to match your shard's auth server if it differs. + +For DocuSign, switch from sandbox to production by updating: +``` +DOCUSIGN_AUTH_SERVER=account.docusign.com +DOCUSIGN_BASE_URL=https://na3.docusign.net/restapi # your account's base URL +``` + +### Session store maintenance + +Session files accumulate in `SESSION_STORE_DIR` — one file per browser session. 
Delete stale files periodically: + +```bash +# Delete session files older than 7 days +find .session-store/ -name "*.json" -mtime +7 -delete +``` + +--- + ## Running tests ```bash diff --git a/docs/architecture.md b/docs/architecture.md index d665d49..047c3c6 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,81 +1,263 @@ -# Architecture & Design Overview +# Architecture & Design — Adobe Sign → DocuSign Migrator -## System Components -- **Extraction Layer**: Handles authentication, API calls, and raw data retrieval from Adobe Sign. Input: .env credentials. Output: JSON metadata + field data. -- **Mapping/Transform Layer**: Pure logic between raw Adobe template objects and canonical DocuSign template model. Handles all 1:1, many:1, and lossy mappings. Logging of ambiguities. -- **DocuSign Ingest Layer**: Authenticates, creates/updates templates in DocuSign using mapped objects. Handles feedback, errors, and reporting. -- **Validation/QA Layer**: Compares final artifacts, runs coverage and correctness checks, supports dry-run/test modes. -- **Testing/Scenario Folder**: Sample templates and responses (see `/sample-templates/`) and mapping/transform test cases. - -## Data Flow - -```mermaid -graph TD - A[Adobe Sign API] -->|Extract| B[Raw JSON] - B -->|Transform/Map| C[Canonical Model] - C -->|Ingest| D[DocuSign API] - D -->|Validate| E[QA/Reporting] - E -->|Feedback| B -``` - -1. Extract Adobe template (metadata, fields, roles, workflows) -2. Pass to transform/mapping functions (per field/role/conditional) -3. Generate canonical model; attempt creation in DocuSign -4. Log result; pull DocuSign result and validate against input -5. 
Drop all validated or problematic test scenarios in `/sample-templates/` or a new `tests/` folder for regression & future QA - -## Key Design Decisions & Logger -- Focus on batch/parallelization via pipelined scripts/modules -- Use local cache of all raw API payloads for traceability -- Mapping module must be testable with static samples (no account needed at first) -- Agent harness structure for project traceability, autonomous improvement -- **Decision Log** (expand as project runs): - - [2026-04-14] Start with static JSON tests and pure transforms before integrating live API. Document all lossy mappings inline in mapping functions & doc. - - [2026-04-14] Capture all feature-mapping challenges (fields, roles) as they appear in real-world test cases and update this doc. - -## Extensibility -- Designed for: new field types, more templates, transform plugins -- Support “mapping hints” or forced overrides for ambiguous/complex field cases +*Last updated: 2026-04-23* --- -## v2 Architecture — Web UI (2026-04-17) +## System Overview -The pipeline is extended with a FastAPI web layer that wraps all existing src/ modules. 
+The migrator is a Python toolkit with two interfaces that share the same core pipeline: -```mermaid -graph TD - Browser -->|HTTP| FastAPI - FastAPI -->|OAuth| AdobeSign[Adobe Sign API] - FastAPI -->|OAuth| DocuSign[DocuSign API] - FastAPI -->|calls| Compose[compose_docusign_template.py] - FastAPI -->|calls| Upload[upload_docusign_template.py] - Upload -->|upsert| DocuSign - FastAPI -->|reads/writes| History[migration-output/.history.json] -``` +- **CLI** (`src/`) — shell scripts for one-off or scripted migrations +- **Web UI** (`web/`) — FastAPI + vanilla JS SPA for browser-based, multi-user migrations -**New layers:** -- `web/routers/auth.py` — browser-initiated OAuth for Adobe Sign and DocuSign -- `web/routers/templates.py` — template listing + migration status computation -- `web/routers/migrate.py` — triggers pipeline; records history -- `web/static/` — vanilla HTML/JS SPA (no build step) - -**Template issue status:** -`GET /api/templates/status` drives the Templates and Issues & Warnings pages. -Its summary status combines pre-migration validation and DocuSign composition -analysis: - -- `blockers`: validation failures that stop migration. -- `warnings`: validation warnings that allow migration but need review. -- `field_issues`: field mapping caveats emitted by composition, such as skipped - field types or unsupported conditional logic. - -The list-level "Clean" label should only appear when all three collections are -empty, so summary rows match the template detail and migration result views. - -**Idempotent Upload (v2):** -`upload_docusign_template.py` now searches for an existing DocuSign template by exact name match and updates the most recently modified one (PUT). Falls back to create (POST) if no match. `--force-create` flag bypasses upsert. +Both interfaces execute the same sequence: authenticate → download → normalize → validate → compose → upload → report. --- -*Update as architecture/requirements change. Generated by Cleo (2026-04-14). 
Updated 2026-04-17.* +## Component Map + +``` +Browser / CLI + │ + ▼ +┌─────────────────────────────────────────────────┐ +│ web/app.py (FastAPI) OR src/migrate_*.py │ +│ – session management (web only) │ +│ – OAuth orchestration (web only) │ +│ – batch job queue (in-memory dict, web only) │ +└──────────────┬──────────────────────────────────┘ + │ calls + ┌──────────┴──────────┐ + ▼ ▼ +src/adobe_api.py src/upload_docusign_template.py +(Adobe Sign REST) (DocuSign REST — upsert) + │ ▲ + │ raw JSON │ DocuSign JSON + ▼ │ +src/services/mapping_service.py + └─► src/models/normalized_template.py + │ NormalizedTemplate + ▼ +src/services/validation_service.py + │ blockers / warnings + ▼ +src/compose_docusign_template.py + └─► src/models/field_issue.py + │ (template_dict, warnings, field_issues) + │ + ▼ +src/reports/report_builder.py + └─► MigrationReport written to migration-output/.history.json +``` + +--- + +## Pipeline Stages + +### 1. Authentication + +| Surface | Adobe Sign | DocuSign | +|---------|-----------|---------| +| CLI | OAuth Auth Code via `adobe_auth.py`; tokens stored in `.env` | OAuth Auth Code via `docusign_auth.py`; tokens stored in `.env` | +| Web | OAuth Auth Code via `/api/auth/adobe/callback`; tokens in server-side session file | OAuth Auth Code via `/api/auth/docusign/callback`; tokens in server-side session file | + +The web UI never stores OAuth tokens in `.env` — each browser session carries its own tokens in a signed server-side session file under `.session-store/`. Sessions are identified by a cookie (`session_id`) signed with `SESSION_SECRET_KEY`. + +### 2. Download (Adobe Sign) + +`src/adobe_api.py` fetches from the Adobe Sign REST v6 API. Shard is configured via `ADOBE_SIGN_BASE_URL` (default: `https://api.eu2.adobesign.com/api/rest/v6`). 
+ +For each template, four artifacts are written to `downloads/__/`: + +| File | Content | +|------|---------| +| `metadata.json` | Template metadata (name, status, creator, dates) | +| `form_fields.json` | Full form field list with locations, conditions, validations | +| `documents.json` | Document list metadata | +| `.pdf` | Binary PDF (base64 decoded) | + +### 3. Normalize (`mapping_service.py`) + +`MappingService.from_folder(path)` reads the three JSON files and produces a `NormalizedTemplate` (Pydantic model). This platform-agnostic intermediate schema decouples Adobe-specific field names from the DocuSign composition step. + +Key transformations at this stage: +- Participant sets → typed role list (`SIGN`, `APPROVE`, `CC`) +- Field locations expanded into flat list (multi-location fields produce N entries) +- Conditional action references converted to normalized `ConditionalRule` objects + +### 4. Validate (`validation_service.py`) + +Runs pre-migration checks and returns `(blockers: list[str], warnings: list[str])`. + +| Check | Result on failure | +|-------|-----------------| +| No recipients | Blocker | +| No documents | Blocker | +| No signature fields | Warning | +| Unassigned fields | Warning | +| Unsupported feature detected | Warning | + +Blockers halt migration. Warnings are stored in the history and surfaced in the UI but do not stop the pipeline. + +### 5. Compose (`compose_docusign_template.py`) + +Converts `NormalizedTemplate` → DocuSign `envelopeTemplate` JSON. Returns a 3-tuple: + +```python +(template_dict: dict, warnings: list[str], field_issues: list[dict]) +``` + +`field_issues` are structured `FieldIssue` objects (see `src/models/field_issue.py`) emitted when a field migrates successfully but something was silently dropped or approximated. Each issue has a machine-readable `code` (e.g. `CROSS_RECIPIENT_CONDITIONAL`, `HIDE_ACTION`, `FIELD_TYPE_SKIPPED`). See [field-mapping.md](../field-mapping.md) for the full list. + +### 6. 
Upload (`upload_docusign_template.py`) + +Upsert pattern: +1. Search DocuSign for an existing template with the same name +2. If found: `PUT /templates/{id}` (update the most recently modified match) +3. If not found: `POST /templates` (create new) +4. `--force-create` flag bypasses the search and always creates + +### 7. Report (`report_builder.py`) + +A `MigrationReport` is built per template and appended to `migration-output/.history.json`. Each record contains: +- template name, Adobe ID, DocuSign ID +- status (`success`, `dry_run`, `skipped`, `error`) +- blockers, warnings, field_issues +- PDF checksum (SHA-256) +- timestamp + +--- + +## Web Layer + +### FastAPI App (`web/app.py`) + +- Mounts all routers under `/api/` +- Serves the SPA shell from `web/static/index.html` +- Installs `SanitizingFilter` on the root logger at startup (redacts tokens and secrets from all log output) +- Logs a warning at startup if `SESSION_SECRET_KEY` is the default development value + +### Routers + +| Router | Prefix | Responsibility | +|--------|--------|---------------| +| `auth.py` | `/api/auth` | Adobe Sign + DocuSign OAuth flows, session status | +| `templates.py` | `/api/templates` | Adobe template listing; migration status per template | +| `migrate.py` | `/api/migrate` | Single and batch migration; history; job polling | +| `verify.py` | `/api/verify` | Send test envelopes; poll status; void | +| `audit.py` | `/api/audit` | Audit log access + CSV export | +| `admin.py` | `/api/admin` | Admin-only operations (admin_emails gating) | + +### Session Lifecycle + +``` +Browser makes first request + → middleware generates UUID session_id + → signed cookie set (itsdangerous, SESSION_SECRET_KEY) + → session file created at .session-store/.json + +User connects Adobe Sign / DocuSign + → OAuth tokens written to session file (never to .env) + → session file updated on every token refresh + +User disconnects or session file deleted + → next request gets a fresh session_id and new file 
+ → old file can be deleted manually to force re-auth +``` + +Session files are plain JSON. Delete all files in `.session-store/` to reset all user sessions. Set `SESSION_STORE_DIR` in `.env` to change the location. + +### Multi-Account DocuSign Support + +When a DocuSign user belongs to multiple accounts, the web UI: +1. Fetches `/oauth/userinfo` after the OAuth callback +2. Sorts available accounts alphabetically +3. Prompts the user to pick one account for the session +4. Stores `docusign_account_id` in the session alongside the tokens + +### Batch Job State + +Batch migrations are tracked in an in-memory dict (`_batch_jobs`) in `web/routers/migrate.py`. Job state is lost on server restart — any in-flight batch becomes unrecoverable. This is a known limitation appropriate for single-operator deployments. Production deployments requiring durability should persist job state to a database or file store. + +### Audit Log + +`web/audit.py` writes one JSONL record per migration event to `AUDIT_LOG_FILE` (default: `.audit-log.jsonl`). Each record: + +```json +{ + "timestamp": "2026-04-23T12:00:00Z", + "session_id": "abc123", + "user_email": "user@example.com", + "action": "migrate", + "template_name": "Sales Agreement", + "adobe_template_id": "3AAA...", + "docusign_template_id": "uuid", + "status": "success", + "field_issues_count": 2, + "pdf_checksum": "sha256:abcdef..." +} +``` + +The `/api/audit` endpoints expose this log with filtering and CSV export. Sensitive fields (tokens, secrets) are never written — the `SanitizingFilter` on the root logger ensures they are redacted before hitting any output. + +--- + +## Frontend SPA + +Single-page app in `web/static/`. No build step — plain HTML + ES modules. + +| File | Responsibility | +|------|---------------| +| `index.html` | Shell, left nav, top bar, router outlet | +| `js/router.js` | Hash-based routing (`#/templates`, `#/results`, etc.) 
| +| `js/state.js` | Global pub/sub state store | +| `js/api.js` | Typed fetch wrappers for all backend endpoints | +| `js/auth.js` | Auth chip UI, OAuth flow, toast notifications | +| `js/templates.js` | Templates view + detail tabs (overview / issues / history) | +| `js/migration.js` | Migration modal, progress polling, results view | +| `js/issues.js` | Issues & Warnings view | +| `js/verification.js` | Verification view (send / poll / void envelopes) | +| `js/history.js` | History & Audit view | +| `js/settings.js` | Settings view | +| `js/project.js` | Per-customer project context (localStorage) | +| `js/utils.js` | `escHtml`, `formatDate`, `renderFieldIssues`, etc. | + +CSS uses DocuSign 2024 brand design tokens defined in `css/tokens.css`. + +--- + +## Security Design + +| Concern | Mechanism | +|---------|----------| +| Token leakage in logs | `SanitizingFilter` installed on root logger at startup; redacts Bearer tokens, JWTs, long base64 strings, and key=value assignments for known secret fields | +| Session integrity | Sessions signed with `SESSION_SECRET_KEY` via `itsdangerous`; secret must be set in `.env` | +| Secret exposure at startup | Warning logged if `SESSION_SECRET_KEY` is the default value | +| PDF integrity | SHA-256 checksum computed before upload and stored in history | +| Credential storage | OAuth tokens stored in server-side session files, never in browser localStorage or logs | + +--- + +## Utilities + +### `src/utils/retry.py` + +`retry_with_backoff` and `async_retry_with_backoff` decorators implement exponential backoff (configurable max retries, base delay, max delay). They target HTTP 429 / 5xx transient errors. These decorators are defined and tested but are not yet applied to API call sites — adding `@retry_with_backoff()` to functions in `adobe_api.py` and `upload_docusign_template.py` is the recommended next step for production hardening. 
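For reviewers unfamiliar with the pattern, here is a minimal sketch of an exponential-backoff decorator of the kind the `src/utils/retry.py` paragraph describes. The real `retry_with_backoff` is not shown in this patch, so everything below is an assumption made for illustration: the name `retry_with_backoff_sketch`, the parameters `max_retries`, `base_delay`, `max_delay` and `sleep`, and the `TransientHTTPError` class.

```python
import functools
import time


class TransientHTTPError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_transient(status_code):
    # 429 (rate limited) and 5xx (server error) are treated as retryable.
    return status_code == 429 or 500 <= status_code < 600


def retry_with_backoff_sketch(max_retries=3, base_delay=0.5, max_delay=8.0,
                              sleep=time.sleep):
    """Illustrative decorator; parameter names are assumptions, not the project's API."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except TransientHTTPError as exc:
                    # Re-raise on non-retryable statuses or once retries are exhausted.
                    if not is_transient(exc.status_code) or attempt == max_retries:
                        raise
                    # The delay doubles on each attempt and is capped at max_delay.
                    sleep(min(max_delay, base_delay * (2 ** attempt)))

        return wrapper

    return decorator
```

The `sleep` parameter is injectable so the behaviour can be unit-tested without real waiting. A production version would likely add random jitter to the delay so that parallel batch workers do not retry in lockstep.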
+ +### `src/utils/log_sanitizer.py` + +`install_sanitizing_filter()` attaches a `logging.Filter` to the root logger. The filter runs `redact()` on every log record's message and args, replacing Bearer tokens, JWTs, long base64 strings, and key=value secret assignments with `[REDACTED]`. + +--- + +## Known Limitations + +| Limitation | Impact | Mitigation | +|-----------|--------|-----------| +| Batch job state is in-memory | Lost on restart | Acceptable for CLI/single-operator; add DB persistence for multi-operator prod | +| Adobe shard configured via full base URL only | Changing shard requires `.env` update | Set `ADOBE_SIGN_BASE_URL` in `.env` | +| Retry decorators not applied to API calls | 429/5xx errors propagate immediately | Apply `@retry_with_backoff()` to `adobe_api.py` + `upload_docusign_template.py` | +| Regression tests require real fixture data | CI cannot run regression tests without downloaded templates | Check in anonymised fixtures or generate synthetic ones | + +*Updated 2026-04-23 — reflects v2 web UI, session lifecycle, audit log schema, multi-account support, batch job state, security design.* diff --git a/field-mapping.md b/field-mapping.md index e97b946..bcecc82 100644 --- a/field-mapping.md +++ b/field-mapping.md @@ -80,6 +80,32 @@ Tab types that do not merge (only first location used or handled specially): `radioGroupTabs` — each location is one radio button within the group `signerAttachmentTabs` — each location is an independent attachment request +## Multi-Document Templates + +Adobe Sign library documents can contain multiple documents (PDFs) stacked into one template. DocuSign templates also support multiple documents — each document gets a unique `documentId` starting from 1. + +### How it works + +The compose pipeline assigns a `documentId` to each document in the order returned by the Adobe Sign `documents.json` list. 
All form fields reference their page position within the document they belong to (`pageNumber` is 1-based within the document's own page sequence, not the overall template page count). + +``` +Adobe Sign template with 2 docs: + doc[0]: "Contract.pdf" (3 pages) → documentId: 1 + doc[1]: "Exhibit-A.pdf" (2 pages) → documentId: 2 + +A field on page 2 of Exhibit-A.pdf: + adobe_location.pageNumber = 2 (within the exhibit) + compose emits: documentId=2, pageNumber=2 +``` + +DocuSign uses `(documentId, pageNumber)` together to locate every tab. If only one document exists, `documentId` is always `1`. + +### Known limitation + +Adobe Sign form fields store `pageNumber` as a sequential page number across the **entire** template (all documents concatenated). If a template has two 3-page documents, fields on document 2 have `pageNumber` 4–6. The compose pipeline does not currently rebase page numbers per document — it passes Adobe's page numbers through as-is and sets `documentId` based on field assignment. + +**Impact**: For single-document templates this is correct. For multi-document templates, verify field placement visually in DocuSign after migration if the template spans more than one PDF. + ## Conditional Logic Mapping Adobe Sign `conditionalAction` → DocuSign `conditionalParentLabel` + `conditionalParentValue` on the dependent tab.
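Returning to the "Known limitation" under Multi-Document Templates: if Adobe Sign page numbers really are counted across the whole template as that section states, rebasing them per document is a small calculation. The sketch below is not part of the compose pipeline. The function name `rebase_page_number` and the `pages_per_document` input (a list of page counts in document order, which the pipeline would have to obtain from the PDFs or document metadata) are hypothetical.

```python
def rebase_page_number(global_page, pages_per_document):
    """Map a 1-based page number counted across all documents to a
    (documentId, pageNumber) pair, both 1-based.

    pages_per_document: page counts in document order (assumed input).
    """
    if global_page < 1:
        raise ValueError("page numbers are 1-based")
    offset = 0  # pages consumed by the documents already skipped
    for index, page_count in enumerate(pages_per_document):
        if global_page <= offset + page_count:
            return index + 1, global_page - offset
        offset += page_count
    raise ValueError(f"page {global_page} is beyond the last document")


# Two 3-page documents: template-wide page 5 is page 2 of the second document.
print(rebase_page_number(5, [3, 3]))  # (2, 2)
```

For a single-document template the function returns the page number unchanged with `documentId` 1, so applying it unconditionally would not alter the behaviour described as correct above.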