Architecture & Design — Adobe Sign → DocuSign Migrator

Last updated: 2026-04-23


System Overview

The migrator is a Python toolkit with two interfaces that share the same core pipeline:

  • CLI (src/): command-line Python scripts (src/migrate_*.py) for one-off or scripted migrations
  • Web UI (web/): FastAPI + vanilla JS SPA for browser-based, multi-user migrations

Both interfaces execute the same sequence: authenticate → download → normalize → validate → compose → upload → report.


Component Map

Browser / CLI
     │
     ▼
┌─────────────────────────────────────────────────┐
│  web/app.py  (FastAPI)  OR  src/migrate_*.py    │
│   session management (web only)                │
│   OAuth orchestration (web only)               │
│   batch job queue (in-memory dict, web only)   │
└──────────────┬──────────────────────────────────┘
               │ calls
    ┌──────────┴──────────┐
    ▼                     ▼
src/adobe_api.py      src/upload_docusign_template.py
(Adobe Sign REST)     (DocuSign REST — upsert)
    │                     ▲
    │ raw JSON             │ DocuSign JSON
    ▼                     │
src/services/mapping_service.py
  └─► src/models/normalized_template.py
          │ NormalizedTemplate
          ▼
src/services/validation_service.py
          │ blockers / warnings
          ▼
src/compose_docusign_template.py
  └─► src/models/field_issue.py
          │ (template_dict, warnings, field_issues)
          │
          ▼
src/reports/report_builder.py
  └─► MigrationReport written to migration-output/.history.json

Pipeline Stages

1. Authentication

| Surface | Adobe Sign | DocuSign |
|---------|------------|----------|
| CLI | OAuth Auth Code via adobe_auth.py; tokens stored in .env | OAuth Auth Code via docusign_auth.py; tokens stored in .env |
| Web | OAuth Auth Code via /api/auth/adobe/callback; tokens in server-side session file | OAuth Auth Code via /api/auth/docusign/callback; tokens in server-side session file |

The web UI never stores OAuth tokens in .env. Each browser session keeps its own tokens in a server-side session file under .session-store/. The session is identified by a session_id cookie, and it is the cookie (not the file) that is signed with SESSION_SECRET_KEY.

2. Download (Adobe Sign)

src/adobe_api.py fetches from the Adobe Sign REST v6 API. Shard is configured via ADOBE_SIGN_BASE_URL (default: https://api.eu2.adobesign.com/api/rest/v6).

For each template, four artifacts (three JSON files plus the PDF) are written to downloads/<template-name>__<id>/:

| File | Content |
|------|---------|
| metadata.json | Template metadata (name, status, creator, dates) |
| form_fields.json | Full form field list with locations, conditions, validations |
| documents.json | Document list metadata |
| <name>.pdf | Binary PDF (base64 decoded) |

3. Normalize (mapping_service.py)

MappingService.from_folder(path) reads the three JSON files and produces a NormalizedTemplate (Pydantic model). This platform-agnostic intermediate schema decouples Adobe-specific field names from the DocuSign composition step.

Key transformations at this stage:

  • Participant sets → typed role list (SIGN, APPROVE, CC)
  • Field locations expanded into flat list (multi-location fields produce N entries)
  • Conditional action references converted to normalized ConditionalRule objects
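As an illustration of the second transformation, the sketch below expands one multi-location field into N flat entries. It is a simplified stand-in, not the actual mapping_service.py code: the NormalizedField dataclass is hypothetical (the real model is Pydantic), and the input keys (name, locations, pageNumber, left, top) are assumptions about the shape of form_fields.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedField:
    # Hypothetical, simplified stand-in for one field entry of a NormalizedTemplate.
    name: str
    page: int
    left: float
    top: float


def flatten_locations(raw_field: dict) -> list[NormalizedField]:
    """Expand one raw field with N locations into N flat entries."""
    return [
        NormalizedField(
            name=raw_field["name"],
            page=loc["pageNumber"],  # assumed key names
            left=loc["left"],
            top=loc["top"],
        )
        for loc in raw_field.get("locations", [])
    ]
```

A field placed on two pages therefore yields two entries that share a name but differ in position, which lets the compose step treat every placement uniformly.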

4. Validate (validation_service.py)

Runs pre-migration checks and returns (blockers: list[str], warnings: list[str]).

| Check | Result on failure |
|-------|-------------------|
| No recipients | Blocker |
| No documents | Blocker |
| No signature fields | Warning |
| Unassigned fields | Warning |
| Unsupported feature detected | Warning |

Blockers halt migration. Warnings are stored in the history and surfaced in the UI but do not stop the pipeline.
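The checks in the table can be sketched as a single function that returns the (blockers, warnings) pair. This is a minimal sketch operating on a plain dict; the key names (recipients, documents, fields, type, recipient, unsupported_features) are assumptions and do not reflect the real NormalizedTemplate schema.

```python
def validate(template: dict) -> tuple[list[str], list[str]]:
    """Return (blockers, warnings) for a simplified template dict."""
    blockers: list[str] = []
    warnings: list[str] = []

    # Blockers: the migration cannot proceed without these.
    if not template.get("recipients"):
        blockers.append("No recipients")
    if not template.get("documents"):
        blockers.append("No documents")

    # Warnings: recorded and surfaced, but the pipeline continues.
    fields = template.get("fields", [])
    if not any(f.get("type") == "signature" for f in fields):
        warnings.append("No signature fields")
    if any(not f.get("recipient") for f in fields):
        warnings.append("Unassigned fields")
    for feature in template.get("unsupported_features", []):
        warnings.append(f"Unsupported feature detected: {feature}")

    return blockers, warnings
```

A caller would stop when blockers is non-empty and otherwise carry warnings forward into the report.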

5. Compose (compose_docusign_template.py)

Converts NormalizedTemplate → DocuSign envelopeTemplate JSON. Returns a 3-tuple:

(template_dict: dict, warnings: list[str], field_issues: list[dict])

field_issues holds structured FieldIssue records (see src/models/field_issue.py). One is emitted whenever a field migrates successfully but part of its behavior was dropped or approximated, so the loss is recorded rather than silent. Each issue has a machine-readable code (e.g. CROSS_RECIPIENT_CONDITIONAL, HIDE_ACTION, FIELD_TYPE_SKIPPED). See field-mapping.md for the full list.
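Because every issue carries a machine-readable code, callers can aggregate issues without parsing messages. The helper below is a hypothetical consumer-side sketch; it assumes each issue dict exposes its code under the key "code", which the source does not confirm.

```python
from collections import Counter


def summarize_field_issues(field_issues: list[dict]) -> dict[str, int]:
    """Count field issues per machine-readable code (assumed key: "code")."""
    return dict(Counter(issue["code"] for issue in field_issues))
```

For example, a template with two HIDE_ACTION issues and one FIELD_TYPE_SKIPPED issue would summarize to a dict with counts 2 and 1 for those codes.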

6. Upload (upload_docusign_template.py)

Upsert pattern:

  1. Search DocuSign for an existing template with the same name
  2. If found: PUT /templates/{id} (update the most recently modified match)
  3. If not found: POST /templates (create new)
  4. --force-create flag bypasses the search and always creates
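The four steps above can be sketched with the HTTP calls injected as plain callables, which keeps the decision logic testable without a DocuSign account. The function and parameter names are hypothetical, and the templateId / lastModified keys (with ISO-8601 timestamps that sort lexicographically) are assumptions about the search response.

```python
from typing import Callable


def upsert_template(
    name: str,
    payload: dict,
    search: Callable[[str], list[dict]],   # wraps the template search call
    update: Callable[[str, dict], str],    # wraps PUT /templates/{id}
    create: Callable[[dict], str],         # wraps POST /templates
    force_create: bool = False,
) -> tuple[str, str]:
    """Return (action, template_id); action is "updated" or "created"."""
    if not force_create:
        # Keep exact name matches only; a search may return partial matches.
        matches = [t for t in search(name) if t.get("name") == name]
        if matches:
            # Update the most recently modified match.
            newest = max(matches, key=lambda t: t["lastModified"])
            return "updated", update(newest["templateId"], payload)
    return "created", create(payload)
```

Passing force_create=True skips the search entirely, mirroring the --force-create flag.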

7. Report (report_builder.py)

A MigrationReport is built per template and appended to migration-output/.history.json. Each record contains:

  • template name, Adobe ID, DocuSign ID
  • status (success, dry_run, skipped, error)
  • blockers, warnings, field_issues
  • PDF checksum (SHA-256)
  • timestamp
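A minimal sketch of the checksum and append steps follows. It assumes .history.json holds a single JSON array and that checksums use the sha256: prefix seen in the audit log example; the function names are hypothetical and the real report_builder.py may differ.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def pdf_checksum(pdf_bytes: bytes) -> str:
    """SHA-256 of the PDF bytes, prefixed with the algorithm name."""
    return "sha256:" + hashlib.sha256(pdf_bytes).hexdigest()


def append_history(history_path: Path, record: dict) -> list[dict]:
    """Append one timestamped record to the history file and return the full list."""
    # Assumption: the file contains a JSON array of records.
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    history.append({**record, "timestamp": datetime.now(timezone.utc).isoformat()})
    history_path.parent.mkdir(parents=True, exist_ok=True)
    history_path.write_text(json.dumps(history, indent=2))
    return history
```

Rewriting the whole file on each append is simple but not safe under concurrent writers; a multi-user deployment would need locking or an append-only format.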

Web Layer

FastAPI App (web/app.py)

  • Mounts all routers under /api/
  • Serves the SPA shell from web/static/index.html
  • Installs SanitizingFilter on the root logger at startup (redacts tokens and secrets from all log output)
  • Logs a warning at startup if SESSION_SECRET_KEY is the default development value

Routers

| Router | Prefix | Responsibility |
|--------|--------|----------------|
| auth.py | /api/auth | Adobe Sign + DocuSign OAuth flows, session status |
| templates.py | /api/templates | Adobe template listing; migration status per template |
| migrate.py | /api/migrate | Single and batch migration; history; job polling |
| verify.py | /api/verify | Send test envelopes; poll status; void |
| audit.py | /api/audit | Audit log access + CSV export |
| admin.py | /api/admin | Admin-only operations (admin_emails gating) |

Session Lifecycle

Browser makes first request
  → middleware generates UUID session_id
  → signed cookie set (itsdangerous, SESSION_SECRET_KEY)
  → session file created at .session-store/<session_id>.json

User connects Adobe Sign / DocuSign
  → OAuth tokens written to session file (never to .env)
  → session file updated on every token refresh

User disconnects or session file deleted
  → next request gets a fresh session_id and new file
  → old file can be deleted manually to force re-auth

Session files are plain JSON. Delete all files in .session-store/ to reset all user sessions. Set SESSION_STORE_DIR in .env to change the location.
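To show what signing the session_id cookie protects against, here is a sketch that uses the standard library's hmac module as a stand-in for itsdangerous. The cookie format and function names are assumptions for illustration only; the application itself relies on itsdangerous.

```python
import hashlib
import hmac
import uuid
from typing import Optional


def sign_session_id(session_id: str, secret_key: str) -> str:
    """Return "<session_id>.<hex HMAC-SHA256>" (illustrative format)."""
    mac = hmac.new(secret_key.encode(), session_id.encode(), hashlib.sha256).hexdigest()
    return f"{session_id}.{mac}"


def verify_cookie(cookie: str, secret_key: str) -> Optional[str]:
    """Return the session_id if the signature is valid, else None."""
    session_id, sep, mac = cookie.rpartition(".")
    if not sep:
        return None
    expected = sign_session_id(session_id, secret_key).rpartition(".")[2]
    # Constant-time comparison avoids leaking signature bytes via timing.
    return session_id if hmac.compare_digest(mac, expected) else None


cookie = sign_session_id(str(uuid.uuid4()), "dev-secret")
```

A client that tampers with the session_id, or a server that rotates SESSION_SECRET_KEY, causes verification to fail, and the next request is treated as a new session.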

Multi-Account DocuSign Support

When a DocuSign user belongs to multiple accounts, the web UI:

  1. Fetches /oauth/userinfo after the OAuth callback
  2. Sorts available accounts alphabetically
  3. Prompts the user to pick one account for the session
  4. Stores docusign_account_id in the session alongside the tokens
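Steps 2 to 4 can be sketched as follows. The accounts, account_id, and account_name keys follow the DocuSign userinfo response; the case-insensitive ordering, the function names, and the plain-dict session are assumptions made for this sketch.

```python
def list_account_choices(userinfo: dict) -> list[dict]:
    """Return the user's accounts sorted alphabetically by name."""
    accounts = userinfo.get("accounts", [])
    # Assumption: sorting ignores case so "acme" and "Acme" group together.
    return sorted(accounts, key=lambda a: a["account_name"].casefold())


def select_account(session: dict, userinfo: dict, account_id: str) -> None:
    """Store the chosen account in the session after checking it is valid."""
    valid_ids = {a["account_id"] for a in userinfo.get("accounts", [])}
    if account_id not in valid_ids:
        raise ValueError(f"account {account_id!r} is not available to this user")
    session["docusign_account_id"] = account_id
```

Validating the submitted account_id against the userinfo response prevents a client from selecting an account the user does not belong to.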

Batch Job State

Batch migrations are tracked in an in-memory dict (_batch_jobs) in web/routers/migrate.py. Job state is lost on server restart, so any in-flight batch becomes unrecoverable. This is a known limitation that is acceptable for single-operator deployments. Production deployments that require durability should persist job state to a database or file store.

Audit Log

web/audit.py writes one JSONL record per migration event to AUDIT_LOG_FILE (default: .audit-log.jsonl). Each record:

{
  "timestamp": "2026-04-23T12:00:00Z",
  "session_id": "abc123",
  "user_email": "user@example.com",
  "action": "migrate",
  "template_name": "Sales Agreement",
  "adobe_template_id": "3AAA...",
  "docusign_template_id": "uuid",
  "status": "success",
  "field_issues_count": 2,
  "pdf_checksum": "sha256:abcdef..."
}

The /api/audit endpoints expose this log with filtering and CSV export. Sensitive fields (tokens, secrets) are never written — the SanitizingFilter on the root logger ensures they are redacted before hitting any output.
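A minimal sketch of the JSONL write path and a CSV export over the same file is shown here. The function names are hypothetical and the actual web/audit.py may structure this differently.

```python
import csv
import io
import json
from pathlib import Path


def append_audit_record(log_path: Path, record: dict) -> None:
    """Append one JSON object per line (JSONL)."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def export_csv(log_path: Path, columns: list[str]) -> str:
    """Render selected columns of the audit log as CSV text."""
    out = io.StringIO()
    # extrasaction="ignore" drops keys that are not in the requested columns.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                writer.writerow(json.loads(line))
    return out.getvalue()
```

JSONL suits an append-only log because each event is a single write and a partially written final line does not corrupt earlier records.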


Frontend SPA

Single-page app in web/static/. No build step — plain HTML + ES modules.

| File | Responsibility |
|------|----------------|
| index.html | Shell, left nav, top bar, router outlet |
| js/router.js | Hash-based routing (#/templates, #/results, etc.) |
| js/state.js | Global pub/sub state store |
| js/api.js | Typed fetch wrappers for all backend endpoints |
| js/auth.js | Auth chip UI, OAuth flow, toast notifications |
| js/templates.js | Templates view + detail tabs (overview / issues / history) |
| js/migration.js | Migration modal, progress polling, results view |
| js/issues.js | Issues & Warnings view |
| js/verification.js | Verification view (send / poll / void envelopes) |
| js/history.js | History & Audit view |
| js/settings.js | Settings view |
| js/project.js | Per-customer project context (localStorage) |
| js/utils.js | escHtml, formatDate, renderFieldIssues, etc. |

CSS uses DocuSign 2024 brand design tokens defined in css/tokens.css.

Template Issue Summary

The Templates and Issues & Warnings pages use /api/templates/status. A template is shown as Clean only when all of these are empty:

  • validation blockers
  • validation warnings
  • composition field_issues

On the web server, migration downloads are temporary. If no persistent downloads/ folder exists for re-analysis, /api/templates/status falls back to the current browser session's migration-output/.history.json records so field issues discovered during migration still appear in the Templates summary.


Security Design

| Concern | Mechanism |
|---------|-----------|
| Token leakage in logs | SanitizingFilter installed on root logger at startup; redacts Bearer tokens, JWTs, long base64 strings, and key=value assignments for known secret fields |
| Session integrity | Sessions signed with SESSION_SECRET_KEY via itsdangerous; secret must be set in .env |
| Secret exposure at startup | Warning logged if SESSION_SECRET_KEY is the default value |
| PDF integrity | SHA-256 checksum computed before upload and stored in history |
| Credential storage | OAuth tokens stored in server-side session files, never in browser localStorage or logs |

Utilities

src/utils/retry.py

retry_with_backoff and async_retry_with_backoff decorators implement exponential backoff (configurable max retries, base delay, max delay). They target HTTP 429 / 5xx transient errors. These decorators are defined and tested but are not yet applied to API call sites — adding @retry_with_backoff() to functions in adobe_api.py and upload_docusign_template.py is the recommended next step for production hardening.
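For orientation, a synchronous decorator of this kind might look like the sketch below. It is not the code in src/utils/retry.py: the parameter names, the TransientHTTPError exception, and the injectable sleep argument are assumptions chosen to make the example self-contained and testable.

```python
import functools
import time


class TransientHTTPError(Exception):
    """Hypothetical error type carrying the HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def retry_with_backoff(max_retries=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Retry on HTTP 429 / 5xx with exponentially growing, capped delays."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except TransientHTTPError as exc:
                    retryable = exc.status_code == 429 or 500 <= exc.status_code < 600
                    if not retryable or attempt == max_retries:
                        raise
                    # Delay doubles each attempt: base, 2*base, 4*base, capped at max_delay.
                    sleep(min(max_delay, base_delay * 2 ** attempt))

        return wrapper

    return decorator
```

A production version would usually add random jitter to the delay and honor a Retry-After header when the server sends one; both are omitted here to keep the sketch deterministic.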

src/utils/log_sanitizer.py

install_sanitizing_filter() attaches a logging.Filter to the root logger. The filter runs redact() on every log record's message and args, replacing Bearer tokens, JWTs, long base64 strings, and key=value secret assignments with [REDACTED].
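The following sketch shows how a logging.Filter can redact secrets before any handler sees them. It covers only a subset of the patterns listed above (Bearer tokens, JWT-shaped strings, and a few key=value names); the regular expressions and the list of secret field names are assumptions, not the contents of log_sanitizer.py.

```python
import logging
import re

_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT-shaped
]
# Assumed list of secret field names.
_SECRET_ASSIGNMENT = re.compile(r"\b(access_token|refresh_token|client_secret)=\S+")


def redact(text: str) -> str:
    """Replace token-like substrings with [REDACTED]."""
    for pattern in _PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return _SECRET_ASSIGNMENT.sub(r"\1=[REDACTED]", text)


class SanitizingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format message and args first, then redact the final string, so
        # secrets passed as %-style arguments are caught as well.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # never drop the record, only rewrite it


def install_sanitizing_filter() -> None:
    logging.getLogger().addFilter(SanitizingFilter())
```

Note that a filter attached to a logger applies only to records created through that logger, not to records propagated from child loggers, so an implementation that must cover all output may prefer to attach the filter to each handler.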


Known Limitations

| Limitation | Impact | Mitigation |
|------------|--------|------------|
| Batch job state is in-memory | Lost on restart | Acceptable for CLI/single-operator; add DB persistence for multi-operator prod |
| Adobe shard configured via full base URL only | Changing shard requires .env update | Set ADOBE_SIGN_BASE_URL in .env |
| Retry decorators not applied to API calls | 429/5xx errors propagate immediately | Apply @retry_with_backoff() to adobe_api.py + upload_docusign_template.py |
| Regression tests require real fixture data | CI cannot run regression tests without downloaded templates | Check in anonymised fixtures or generate synthetic ones |

Updated 2026-04-23 — reflects v2 web UI, session lifecycle, audit log schema, multi-account support, batch job state, security design.