# Architecture & Design — Adobe Sign → DocuSign Migrator
*Last updated: 2026-04-23*

---
## System Overview
The migrator is a Python toolkit with two interfaces that share the same core pipeline:

- **CLI** (`src/`): command-line scripts (`src/migrate_*.py`) for one-off or scripted migrations
- **Web UI** (`web/`): FastAPI + vanilla JS SPA for browser-based, multi-user migrations

Both interfaces execute the same sequence: authenticate → download → normalize → validate → compose → upload → report.

---
## Component Map
```
Browser / CLI
┌──────────────────────────────────────────────────┐
│ web/app.py (FastAPI)  OR  src/migrate_*.py       │
│   session management (web only)                  │
│   OAuth orchestration (web only)                 │
│   batch job queue (in-memory dict, web only)     │
└──────────────┬───────────────────────────────────┘
               │ calls
    ┌──────────┴───────────────────────────┐
    ▼                                      ▼
src/adobe_api.py                 src/upload_docusign_template.py
(Adobe Sign REST)                (DocuSign REST, upsert)
    │                                      ▲
    │ raw JSON                             │ DocuSign JSON
    ▼                                      │
src/services/mapping_service.py            │
    ├─► src/models/normalized_template.py  │
    │ NormalizedTemplate                   │
    ▼                                      │
src/services/validation_service.py         │
    │ blockers / warnings                  │
    ▼                                      │
src/compose_docusign_template.py ──────────┘
    ├─► src/models/field_issue.py
    │ (template_dict, warnings, field_issues)
    ▼
src/reports/report_builder.py
    └─► MigrationReport written to migration-output/.history.json
```
---
## Pipeline Stages
### 1. Authentication
| Surface | Adobe Sign | DocuSign |
|---------|-----------|---------|
| CLI | OAuth Auth Code via `adobe_auth.py`; tokens stored in `.env` | OAuth Auth Code via `docusign_auth.py`; tokens stored in `.env` |
| Web | OAuth Auth Code via `/api/auth/adobe/callback`; tokens in server-side session file | OAuth Auth Code via `/api/auth/docusign/callback`; tokens in server-side session file |

The web UI never stores OAuth tokens in `.env`. Each browser session keeps its own tokens in a server-side session file under `.session-store/`, and the session is identified by a `session_id` cookie signed with `SESSION_SECRET_KEY`.
### 2. Download (Adobe Sign)
`src/adobe_api.py` fetches from the Adobe Sign REST v6 API. The shard is selected through the full base URL in `ADOBE_SIGN_BASE_URL` (default: `https://api.eu2.adobesign.com/api/rest/v6`, i.e. the `eu2` shard).
For each template, three JSON files and the PDF are written to `downloads/<template-name>__<id>/`:
| File | Content |
|------|---------|
| `metadata.json` | Template metadata (name, status, creator, dates) |
| `form_fields.json` | Full form field list with locations, conditions, validations |
| `documents.json` | Document list metadata |
| `<name>.pdf` | Binary PDF (base64 decoded) |
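
The layout above lends itself to a quick completeness check before normalization. The following is a minimal sketch. It assumes the folder name is exactly `<template-name>__<id>`, with no sanitization of the name, and that the PDF is the only `.pdf` file in the folder. `template_folder` and `missing_artifacts` are hypothetical helpers, not functions from `adobe_api.py`.

```python
from pathlib import Path

# The three JSON artifacts listed in the table above
REQUIRED_JSON = ("metadata.json", "form_fields.json", "documents.json")


def template_folder(root: Path, template_name: str, template_id: str) -> Path:
    """Build downloads/<template-name>__<id>/ (hypothetical; no name sanitization)."""
    return root / f"{template_name}__{template_id}"


def missing_artifacts(folder: Path) -> list[str]:
    """Return the expected artifacts that are absent from a template folder."""
    missing = [name for name in REQUIRED_JSON if not (folder / name).is_file()]
    # The PDF is named after the document, so only check that one exists
    if not any(folder.glob("*.pdf")):
        missing.append("<name>.pdf")
    return missing
```

Returning the missing names instead of raising leaves it to the caller whether an incomplete download should count as a blocker.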
### 3. Normalize (`mapping_service.py`)
`MappingService.from_folder(path)` reads the three JSON files and produces a `NormalizedTemplate` (Pydantic model). This platform-agnostic intermediate schema decouples Adobe-specific field names from the DocuSign composition step.
Key transformations at this stage:
- Participant sets → typed role list (`SIGN`, `APPROVE`, `CC`)
- Field locations expanded into a flat list (a field with N locations produces N entries)
- Conditional action references converted to normalized `ConditionalRule` objects
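
The multi-location expansion can be sketched as follows. This assumes each entry in `form_fields.json` has a `name` and a `locations` list whose items carry `pageNumber`, `left`, `top`, `width` and `height`. `FlatField` is a hypothetical dataclass standing in for the Pydantic model in `normalized_template.py`, and roles and conditions are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatField:
    """One placement of a field on one page (hypothetical stand-in for the normalized model)."""
    name: str
    page: int
    left: float
    top: float
    width: float
    height: float


def flatten_locations(form_fields: list[dict]) -> list[FlatField]:
    """Expand each Adobe form field into one FlatField per entry in its `locations` list."""
    flat: list[FlatField] = []
    for field in form_fields:
        for loc in field.get("locations", []):
            flat.append(
                FlatField(
                    name=field["name"],
                    page=loc["pageNumber"],
                    left=loc["left"],
                    top=loc["top"],
                    width=loc["width"],
                    height=loc["height"],
                )
            )
    return flat
```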
### 4. Validate (`validation_service.py`)
Runs pre-migration checks and returns `(blockers: list[str], warnings: list[str])`.
| Check | Result on failure |
|-------|-----------------|
| No recipients | Blocker |
| No documents | Blocker |
| No signature fields | Warning |
| Unassigned fields | Warning |
| Unsupported feature detected | Warning |

Blockers halt migration. Warnings are stored in the history and surfaced in the UI but do not stop the pipeline.
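
The check table maps naturally onto a function returning the `(blockers, warnings)` pair. This is a minimal sketch over a hypothetical dict shape: `recipients`, `documents`, `fields` with `type`/`role`, and an `unsupported_features` list. The real service works on `NormalizedTemplate`, and its messages may differ.

```python
def validate(template: dict) -> tuple[list[str], list[str]]:
    """Return (blockers, warnings) following the check table above (hypothetical input shape)."""
    blockers: list[str] = []
    warnings: list[str] = []

    # Blockers: nothing can be migrated without recipients and documents
    if not template.get("recipients"):
        blockers.append("No recipients")
    if not template.get("documents"):
        blockers.append("No documents")

    # Warnings: migration proceeds, but the result may need review
    fields = template.get("fields", [])
    if not any(f.get("type") == "SIGNATURE" for f in fields):
        warnings.append("No signature fields")
    unassigned = [f["name"] for f in fields if not f.get("role")]
    if unassigned:
        warnings.append(f"Unassigned fields: {', '.join(unassigned)}")
    for feature in template.get("unsupported_features", []):
        warnings.append(f"Unsupported feature detected: {feature}")

    return blockers, warnings
```

A caller stops when `blockers` is non-empty and otherwise carries `warnings` into the report. This matches the halt/continue rule described above.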
### 5. Compose (`compose_docusign_template.py`)
Converts `NormalizedTemplate` → DocuSign `envelopeTemplate` JSON. Returns a 3-tuple:
```python
(template_dict: dict, warnings: list[str], field_issues: list[dict])
```
Each entry in `field_issues` is a structured issue record built from the `FieldIssue` model (see `src/models/field_issue.py`). An issue is emitted when a field migrates successfully but something was silently dropped or approximated. Each issue has a machine-readable `code` (e.g. `CROSS_RECIPIENT_CONDITIONAL`, `HIDE_ACTION`, `FIELD_TYPE_SKIPPED`). See [field-mapping.md](../field-mapping.md) for the full list.
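
One way to reconcile structured issues with the `list[dict]` slot in the return tuple is a small record type that serializes itself. The sketch below is hypothetical. Only the `code` attribute and the three code values come from this document; `field_name`, `message` and `to_dict` are illustrative.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class IssueCode(str, Enum):
    # Codes quoted in this document; field-mapping.md has the full list
    CROSS_RECIPIENT_CONDITIONAL = "CROSS_RECIPIENT_CONDITIONAL"
    HIDE_ACTION = "HIDE_ACTION"
    FIELD_TYPE_SKIPPED = "FIELD_TYPE_SKIPPED"


@dataclass(frozen=True)
class FieldIssue:
    """Hypothetical shape of one field issue; attribute names besides `code` are illustrative."""
    code: IssueCode
    field_name: str
    message: str

    def to_dict(self) -> dict:
        # Replace the enum member with its plain string value for JSON output
        data = asdict(self)
        data["code"] = self.code.value
        return data
```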
### 6. Upload (`upload_docusign_template.py`)
Upsert pattern:
1. Search DocuSign for an existing template with the same name
2. If found: `PUT /templates/{id}` (update the most recently modified match)
3. If not found: `POST /templates` (create new)
4. `--force-create` flag bypasses the search and always creates
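
The decision part of the upsert can be kept free of network calls. This is a minimal sketch. It assumes search results are dicts with `templateId`, `name` and a `lastModified` timestamp that `datetime.fromisoformat` can parse. `plan_upsert` is a hypothetical helper, and the paths are shown in the same shorthand as the list above.

```python
from datetime import datetime


def plan_upsert(name: str, matches: list[dict], force_create: bool = False) -> tuple[str, str]:
    """Return (HTTP method, path) for the upsert step described above."""
    # Re-check the name in case the search returned broader matches
    same_name = [t for t in matches if t.get("name") == name]
    if force_create or not same_name:
        return "POST", "/templates"
    # Update the most recently modified template with the same name
    newest = max(same_name, key=lambda t: datetime.fromisoformat(t["lastModified"]))
    return "PUT", f"/templates/{newest['templateId']}"
```

Keeping the function pure makes the `--force-create` path and the most-recent-match rule easy to test.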
### 7. Report (`report_builder.py`)
A `MigrationReport` is built per template and appended to `migration-output/.history.json`. Each record contains:
- template name, Adobe ID, DocuSign ID
- status (`success`, `dry_run`, `skipped`, `error`)
- blockers, warnings, field_issues
- PDF checksum (SHA-256)
- timestamp
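
The checksum and the history append could look like the sketch below. The history file format is not specified here, so the sketch assumes `.history.json` holds a single JSON list. It uses the `sha256:` prefix shown in the audit-log example, and both function names are hypothetical. The PDF is hashed in chunks so large documents are not read into memory at once.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def pdf_checksum(pdf_path: Path) -> str:
    """SHA-256 of the PDF bytes, formatted as "sha256:<hex>"."""
    digest = hashlib.sha256()
    with pdf_path.open("rb") as fh:
        # Read 64 KiB at a time until EOF
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def append_history(history_file: Path, record: dict) -> None:
    """Append one report record with a UTC timestamp, assuming the file holds a JSON list."""
    records = json.loads(history_file.read_text()) if history_file.exists() else []
    records.append({**record, "timestamp": datetime.now(timezone.utc).isoformat()})
    history_file.parent.mkdir(parents=True, exist_ok=True)
    history_file.write_text(json.dumps(records, indent=2))
```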
---
## Web Layer
### FastAPI App (`web/app.py`)
- Mounts all routers under `/api/`
- Serves the SPA shell from `web/static/index.html`
- Installs `SanitizingFilter` on the root logger at startup (redacts tokens and secrets from all log output)
- Logs a warning at startup if `SESSION_SECRET_KEY` is the default development value
### Routers
| Router | Prefix | Responsibility |
|--------|--------|---------------|
| `auth.py` | `/api/auth` | Adobe Sign + DocuSign OAuth flows, session status |
| `templates.py` | `/api/templates` | Adobe template listing; migration status per template |
| `migrate.py` | `/api/migrate` | Single and batch migration; history; job polling |
| `verify.py` | `/api/verify` | Send test envelopes; poll status; void |
| `audit.py` | `/api/audit` | Audit log access + CSV export |
| `admin.py` | `/api/admin` | Admin-only operations (gated by `admin_emails`) |
### Session Lifecycle
```
Browser makes first request
  → middleware generates UUID session_id
  → signed cookie set (itsdangerous, SESSION_SECRET_KEY)
  → session file created at .session-store/<session_id>.json
User connects Adobe Sign / DocuSign
  → OAuth tokens written to session file (never to .env)
  → session file updated on every token refresh
User disconnects or session file deleted
  → next request gets a fresh session_id and new file
  → old file can be deleted manually to force re-auth
```
Session files are plain JSON. Delete all files in `.session-store/` to reset all user sessions. Set `SESSION_STORE_DIR` in `.env` to change the location.
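
The signed-cookie step relies on `itsdangerous`. To show the idea without that dependency, this sketch uses the standard library's `hmac` module instead. It is not the app's code, and the cookie format `<session_id>.<signature>` is an assumption.

```python
import hashlib
import hmac
import uuid
from typing import Optional


def new_session_id() -> str:
    """Random session identifier (hex form of a UUID4)."""
    return uuid.uuid4().hex


def _mac(session_id: str, secret_key: str) -> str:
    return hmac.new(secret_key.encode(), session_id.encode(), hashlib.sha256).hexdigest()


def sign_session_id(session_id: str, secret_key: str) -> str:
    """Cookie value of the form "<session_id>.<hex HMAC-SHA256>"."""
    return f"{session_id}.{_mac(session_id, secret_key)}"


def verify_session_cookie(cookie: str, secret_key: str) -> Optional[str]:
    """Return the session_id if the signature is valid, else None (caller starts a new session)."""
    session_id, sep, mac = cookie.rpartition(".")
    if not sep or not session_id:
        return None
    # compare_digest avoids content-based short-circuiting, which resists timing attacks
    if not hmac.compare_digest(mac, _mac(session_id, secret_key)):
        return None
    return session_id
```

Checking the signature before the ID is used to build `.session-store/<session_id>.json` means a tampered cookie cannot choose which session file is loaded. An invalid cookie simply leads to a fresh session, matching the lifecycle above.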
### Multi-Account DocuSign Support
When a DocuSign user belongs to multiple accounts, the web UI:
1. Fetches `/oauth/userinfo` after the OAuth callback
2. Sorts available accounts alphabetically
3. Prompts the user to pick one account for the session
4. Stores `docusign_account_id` in the session alongside the tokens
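
Steps 1 to 3 can be sketched as below. This assumes the `/oauth/userinfo` response has an `accounts` list whose entries carry `account_id` and `account_name`. Auto-selecting when there is only one account is also an assumption; the text above only describes the multi-account prompt.

```python
from typing import Optional


def account_choices(userinfo: dict) -> list[dict]:
    """Return the user's accounts sorted alphabetically by name (case-insensitive)."""
    accounts = userinfo.get("accounts", [])
    return sorted(accounts, key=lambda a: a.get("account_name", "").casefold())


def auto_select_account(userinfo: dict) -> Optional[str]:
    """Return the only account_id when there is exactly one; None means prompt the user."""
    choices = account_choices(userinfo)
    return choices[0]["account_id"] if len(choices) == 1 else None
```

Each account entry in the userinfo response also includes a `base_uri` identifying the host that serves that account's REST API. That makes it a natural companion to `docusign_account_id` in the session.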
### Batch Job State
Batch migrations are tracked in an in-memory dict (`_batch_jobs`) in `web/routers/migrate.py`. Job state is lost on server restart, and any in-flight batch becomes unrecoverable. This is an accepted limitation for single-operator deployments; production deployments that need durability should persist job state to a database or file store.
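
As a minimal illustration of the file-store option, the hypothetical `FileJobStore` below persists the job dict to JSON. It swaps the file in atomically with `os.replace`, so a crash mid-write cannot leave a truncated file. It is a sketch, not a drop-in replacement for `_batch_jobs`.

```python
import json
import os
import tempfile
from pathlib import Path


class FileJobStore:
    """Hypothetical file-backed store for batch job state."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict[str, dict]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def save_job(self, job_id: str, state: dict) -> None:
        jobs = self.load()
        jobs[job_id] = state
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then atomically replace the target
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as fh:
            json.dump(jobs, fh)
        os.replace(tmp, self.path)
```

This survives restarts for a single server process. Several worker processes writing at once would still need locking or a database.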
### Audit Log
`web/audit.py` writes one JSONL record per migration event to `AUDIT_LOG_FILE` (default: `.audit-log.jsonl`). Example record:
```json
{
"timestamp": "2026-04-23T12:00:00Z",
"session_id": "abc123",
"user_email": "user@example.com",
"action": "migrate",
"template_name": "Sales Agreement",
"adobe_template_id": "3AAA...",
"docusign_template_id": "uuid",
"status": "success",
"field_issues_count": 2,
"pdf_checksum": "sha256:abcdef..."
}
```
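
The write and export paths can be sketched with the standard library alone. This assumes a single process appends to the log. `append_audit_record`, `audit_to_csv` and its `action` filter are hypothetical, and the real `/api/audit` endpoints may filter on other fields.

```python
import csv
import io
import json
from pathlib import Path
from typing import Optional


def append_audit_record(log_file: Path, record: dict) -> None:
    """Append one JSON object per line (JSONL)."""
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def audit_to_csv(log_file: Path, action: Optional[str] = None) -> str:
    """Read the JSONL log, optionally keep one action type, and render CSV text."""
    rows = []
    with log_file.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                rows.append(json.loads(line))
    if action is not None:
        rows = [r for r in rows if r.get("action") == action]
    if not rows:
        return ""
    # Union of keys in first-seen order, so records with extra fields still export
    fieldnames = list(dict.fromkeys(key for r in rows for key in r))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```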
The `/api/audit` endpoints expose this log with filtering and CSV export. Sensitive fields (tokens, secrets) are never included in audit records; separately, the `SanitizingFilter` on the root logger redacts them from log output.

---
## Frontend SPA
Single-page app in `web/static/`. No build step — plain HTML + ES modules.
| File | Responsibility |
|------|---------------|
| `index.html` | Shell, left nav, top bar, router outlet |
| `js/router.js` | Hash-based routing (`#/templates`, `#/results`, etc.) |
| `js/state.js` | Global pub/sub state store |
| `js/api.js` | Typed fetch wrappers for all backend endpoints |
| `js/auth.js` | Auth chip UI, OAuth flow, toast notifications |
| `js/templates.js` | Templates view + detail tabs (overview / issues / history) |
| `js/migration.js` | Migration modal, progress polling, results view |
| `js/issues.js` | Issues & Warnings view |
| `js/verification.js` | Verification view (send / poll / void envelopes) |
| `js/history.js` | History & Audit view |
| `js/settings.js` | Settings view |
| `js/project.js` | Per-customer project context (localStorage) |
| `js/utils.js` | `escHtml`, `formatDate`, `renderFieldIssues`, etc. |

CSS uses DocuSign 2024 brand design tokens defined in `css/tokens.css`.
### Template Issue Summary
The Templates and Issues & Warnings pages use `/api/templates/status`. A
template is shown as `Clean` only when all of these are empty:
- validation `blockers`
- validation `warnings`
- composition `field_issues`

On the web server, migration downloads are temporary. If no persistent `downloads/` folder exists for re-analysis, `/api/templates/status` falls back to the `migration-output/.history.json` records from the current browser session. Field issues discovered during migration therefore still appear in the Templates summary.
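
The `Clean` rule and the history fallback can be sketched together. This assumes history records are ordered oldest first and keyed by `adobe_template_id` (the key used in the audit log), and that the newest matching record wins. Returning `None` when neither source knows a template keeps it from being reported as `Clean`. All names are hypothetical.

```python
from typing import Optional

ISSUE_KEYS = ("blockers", "warnings", "field_issues")


def is_clean(summary: dict) -> bool:
    """`Clean` only when blockers, warnings and field_issues are all empty."""
    return not any(summary.get(key) for key in ISSUE_KEYS)


def issue_summary(adobe_id: str, reanalysis: Optional[dict], history: list[dict]) -> Optional[dict]:
    """Prefer a fresh re-analysis; otherwise fall back to the newest matching history record.

    `reanalysis` is None when no persistent downloads/ folder exists.
    """
    if reanalysis is not None:
        return {key: reanalysis.get(key, []) for key in ISSUE_KEYS}
    for record in reversed(history):
        if record.get("adobe_template_id") == adobe_id:
            return {key: record.get(key, []) for key in ISSUE_KEYS}
    # Unknown template: no data, so do not report it as Clean
    return None
```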

---
## Security Design
| Concern | Mechanism |
|---------|----------|
| Token leakage in logs | `SanitizingFilter` installed on root logger at startup; redacts Bearer tokens, JWTs, long base64 strings, and key=value assignments for known secret fields |
| Session integrity | `session_id` cookie signed with `SESSION_SECRET_KEY` via `itsdangerous`; the key must be set in `.env` |
| Secret exposure at startup | Warning logged if `SESSION_SECRET_KEY` is the default value |
| PDF integrity | SHA-256 checksum computed before upload and stored in history |
| Credential storage | OAuth tokens stored in server-side session files, never in browser localStorage or logs |
---
## Utilities
### `src/utils/retry.py`
`retry_with_backoff` and `async_retry_with_backoff` decorators implement exponential backoff (configurable max retries, base delay, max delay). They target HTTP 429 / 5xx transient errors. These decorators are defined and tested but are not yet applied to API call sites — adding `@retry_with_backoff()` to functions in `adobe_api.py` and `upload_docusign_template.py` is the recommended next step for production hardening.
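
The source describes the decorators but not their signatures, so the following is a minimal sketch rather than the contents of `src/utils/retry.py`. Parameter names mirror the description (max retries, base delay, max delay). `TransientHTTPError` is a hypothetical stand-in for whatever error the HTTP client raises.

```python
import functools
import time


class TransientHTTPError(Exception):
    """Hypothetical stand-in for the HTTP client's error type; carries the status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(status_code: int) -> bool:
    """429 (rate limited) and any 5xx are treated as transient."""
    return status_code == 429 or 500 <= status_code <= 599


def retry_with_backoff(max_retries: int = 3, base_delay: float = 1.0, max_delay: float = 30.0):
    """Retry transient failures, sleeping base_delay * 2**attempt (capped at max_delay)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except TransientHTTPError as exc:
                    # Give up on non-transient errors or once the retry budget is spent
                    if not is_retryable(exc.status_code) or attempt >= max_retries:
                        raise
                    time.sleep(min(max_delay, base_delay * 2 ** attempt))
                    attempt += 1

        return wrapper

    return decorator
```

Retrying `POST /templates` after a 5xx could create a duplicate if the first request did succeed on the server. The search and `PUT` calls are therefore safer first targets than the create call.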
### `src/utils/log_sanitizer.py`
`install_sanitizing_filter()` attaches a `logging.Filter` to the root logger. The filter runs `redact()` on every log record's message and args, replacing Bearer tokens, JWTs, long base64 strings, and key=value secret assignments with `[REDACTED]`.
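
A minimal sketch of `redact()` and the filter follows; the patterns are illustrative, not the project's list. One difference is deliberate. Python's `logging` documentation notes that a filter attached to a logger is not consulted for records that propagate up from descendant loggers, so this sketch attaches the filter to the root logger's handlers instead. Whether the project's `install_sanitizing_filter()` accounts for that is not stated here. The filter formats the message with its args before redacting, so secrets passed as `%s` arguments are caught too.

```python
import logging
import re

# Illustrative patterns, applied in order
_PATTERNS = [
    # Bearer tokens, e.g. in Authorization headers
    (re.compile(r"(Bearer\s+)[A-Za-z0-9\-._~+/]+=*", re.IGNORECASE), r"\1[REDACTED]"),
    # JWTs: three base64url segments, header starting with "eyJ"
    (re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED]"),
    # key=value assignments for known secret field names
    (re.compile(r"\b(access_token|refresh_token|client_secret|password)=([^\s&]+)", re.IGNORECASE),
     r"\1=[REDACTED]"),
    # Long base64-looking runs (40+ characters)
    (re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"), "[REDACTED]"),
]


def redact(text: str) -> str:
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class SanitizingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Merge args into the message first, then redact the final text
        record.msg = redact(record.getMessage())
        record.args = None
        return True


def install_sanitizing_filter() -> None:
    # Handler-level filters also see records propagated from child loggers
    for handler in logging.getLogger().handlers:
        handler.addFilter(SanitizingFilter())
```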

---
## Known Limitations
| Limitation | Impact | Mitigation |
|-----------|--------|-----------|
| Batch job state is in-memory | Lost on restart | Acceptable for single-operator web deployments; add DB persistence for multi-operator production |
| Adobe shard configured via full base URL only | Changing shard requires `.env` update | Set `ADOBE_SIGN_BASE_URL` in `.env` |
| Retry decorators not applied to API calls | 429/5xx errors propagate immediately | Apply `@retry_with_backoff()` to `adobe_api.py` + `upload_docusign_template.py` |
| Regression tests require real fixture data | CI cannot run regression tests without downloaded templates | Check in anonymised fixtures or generate synthetic ones |

*Updated 2026-04-23: reflects v2 web UI, session lifecycle, audit log schema, multi-account support, batch job state, security design.*