Recipe Manager — Image Import & Backend Hardening Execution Board

Created: 2026-03-27
Owner: Main Orchestrator
Status: READY

Objective

Ship a reliable URL import pipeline that returns high-quality image_url values and harden backend safety/perf around migrations, search, and TypeScript correctness.

Current Repo Reality (Baseline)

  • POST /api/import/url is currently a stub returning { draft_recipe: null } (src/backend/routes/import.ts).
  • URL fetch + JSON-LD extraction foundation exists (src/backend/services/UrlImportService.ts) but is not wired to route.
  • Schema.org/heuristic parser services are minimal and weakly typed (heavy use of any).
  • image_url exists in schema/repository/routes/seed (schema.sql, RecipeRepository, recipes.ts), but no explicit image validation/fallback policy.
  • Runtime migration helper exists (applyRuntimeMigrations) with no direct migration tests.
  • Search currently uses LIKE '%term%' matching with joins across recipes/ingredients/tags; indexes are limited and the query plan is unverified.
  • TS strict mode is on, but any and contract drift remain in backend/frontend types.

Release Gate (Done Definition)

  • /api/import/url performs real fetch/parse flow and returns non-empty draft_recipe for supported pages.
  • Image extraction ranking + URL validation policy is implemented and tested.
  • Runtime migration behavior is covered by tests for pre/post image_url schemas.
  • Search performance/query behavior validated with index and query-shape improvements.
  • TS hygiene pass removes prioritized unsafe any in import/search pathways.

Task Backlog

T01 — Import Route Orchestration (Wire Real Pipeline)

Priority: P0
Owner: agent-import-core
Dependencies: none

Deliverables:

  • Replace stub in src/backend/routes/import.ts with real pipeline:
    • validate URL input via zod
    • call UrlImportService.fetchFromUrl
    • parse JSON-LD blocks and select recipe candidate
    • fallback to heuristic parser when schema parse fails
    • return structured UrlImportResult payload expected by frontend (draft_recipe, source_url, parse metadata)
  • Error mapping from UrlImportError to stable HTTP/API errors for UI (timeout, network, unsupported content, parse failure).
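
The error-mapping deliverable above could be sketched as follows. This is a minimal illustration, not the existing code: the kind names, HTTP status codes, and code strings are assumptions chosen for the example, and the real UrlImportError in src/backend/services/UrlImportService.ts may be shaped differently.

```typescript
// Hypothetical failure kinds; the real UrlImportError may expose different fields.
type UrlImportErrorKind = "timeout" | "network" | "unsupported_content" | "parse_failure";

interface UrlImportErrorLike {
  kind: UrlImportErrorKind;
  message: string;
}

interface ApiError {
  status: number;
  code: string;
  message: string;
}

// Assumed mapping; the status codes are one reasonable choice, not a confirmed policy.
const ERROR_MAP: Record<UrlImportErrorKind, { status: number; code: string }> = {
  timeout: { status: 504, code: "IMPORT_TIMEOUT" },
  network: { status: 502, code: "IMPORT_NETWORK" },
  unsupported_content: { status: 415, code: "IMPORT_UNSUPPORTED_CONTENT" },
  parse_failure: { status: 422, code: "IMPORT_PARSE_FAILED" },
};

// Structural guard so the route never trusts the shape of a thrown value.
function isUrlImportError(err: unknown): err is UrlImportErrorLike {
  if (typeof err !== "object" || err === null) return false;
  const candidate = err as { kind?: unknown; message?: unknown };
  return (
    typeof candidate.kind === "string" &&
    Object.prototype.hasOwnProperty.call(ERROR_MAP, candidate.kind) &&
    typeof candidate.message === "string"
  );
}

// Known import failures get a stable code; anything else becomes a generic 500.
function toApiError(err: unknown): ApiError {
  if (isUrlImportError(err)) {
    const mapped = ERROR_MAP[err.kind];
    return { status: mapped.status, code: mapped.code, message: err.message };
  }
  return { status: 500, code: "IMPORT_INTERNAL", message: "Unexpected import failure" };
}
```

Keeping the mapping in one table means the UI can switch on code while the route handler stays a thin try/catch around the pipeline.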

Acceptance:

  • Import route no longer returns hardcoded null draft.
  • src/backend/tests/import.test.ts includes success + invalid URL + timeout/network/content-type failures.
  • Frontend import page can reach “review” stage from at least one fixture HTML sample.

T02 — Schema.org Image Extraction Quality Pass

Priority: P0
Owner: agent-import-parser
Dependencies: T01

Deliverables:

  • Refactor SchemaOrgRecipeParserService to support common image variants:
    • string URL
    • array of URLs
    • array/object entries with url, contentUrl, thumbnailUrl
    • @graph recipe object selection when JSON-LD block is graph-shaped
  • Add image candidate ranking (prefer HTTPS, largest/default image over tiny thumbnails when width/height available).
  • Return normalized draft with stable image_url candidate.
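
The image variants and ranking rule above can be sketched roughly as below. All function and type names are hypothetical, and the ranking is a simplification: HTTPS first, then larger known pixel area, with unknown dimensions treated as area 0 and ties resolved by document order.

```typescript
interface ImageCandidate {
  url: string;
  width?: number;
  height?: number;
}

// JSON-LD dimensions may arrive as numbers or numeric strings; anything else is ignored.
function toPositiveNumber(value: unknown): number | undefined {
  const n = typeof value === "string" ? Number(value) : value;
  return typeof n === "number" && isFinite(n) && n > 0 ? n : undefined;
}

// Flattens string, array, and object image shapes into candidates without throwing.
function collectImageCandidates(image: unknown): ImageCandidate[] {
  if (typeof image === "string") {
    const url = image.trim();
    return url ? [{ url }] : [];
  }
  if (Array.isArray(image)) {
    let out: ImageCandidate[] = [];
    for (const entry of image) out = out.concat(collectImageCandidates(entry));
    return out;
  }
  if (typeof image === "object" && image !== null) {
    const obj = image as { [key: string]: unknown };
    for (const key of ["url", "contentUrl", "thumbnailUrl"]) {
      const value = obj[key];
      if (typeof value === "string" && value.trim() !== "") {
        return [{
          url: value.trim(),
          width: toPositiveNumber(obj["width"]),
          height: toPositiveNumber(obj["height"]),
        }];
      }
    }
  }
  return []; // malformed or partial fields degrade to "no candidate"
}

function isHttps(c: ImageCandidate): number {
  return /^https:\/\//i.test(c.url) ? 1 : 0;
}

function area(c: ImageCandidate): number {
  return c.width !== undefined && c.height !== undefined ? c.width * c.height : 0;
}

// Returns the top-ranked image URL, or null when nothing usable was found.
function pickBestImage(image: unknown): string | null {
  const candidates = collectImageCandidates(image);
  if (candidates.length === 0) return null;
  const ranked = candidates
    .map((c, index) => ({ c, index }))
    .sort((a, b) =>
      isHttps(b.c) - isHttps(a.c) || area(b.c) - area(a.c) || a.index - b.index
    );
  return ranked[0].c.url;
}
```

Each fixture in the parser tests could then assert pickBestImage(fixture.image) against the expected top URL.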

Acceptance:

  • Parser tests added with fixtures for at least 5 JSON-LD image shapes.
  • For each fixture, expected top image_url is asserted.
  • No parser path throws on malformed/partial image fields; degrades gracefully.

T03 — Image URL Validation & Fallback Policy

Priority: P0
Owner: agent-import-hardening
Dependencies: T02

Deliverables:

  • Implement central image URL sanitizer/validator utility (new service module):
    • allow http/https only (or strict https with optional downgrade rule)
    • reject data:, javascript:, blob:, and other non-web URLs
    • trim + normalize empty values to null
    • optional host allow/deny controls (documented default policy)
  • Integrate validator in:
    • import parsing output
    • recipe create/update flows (routes/recipes.ts + repository normalization layer)
  • Define fallback order for import draft image:
    1. validated schema.org image
    2. validated heuristic image
    3. null (no placeholder persisted)
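
A minimal sketch of the sanitizer and fallback order above, assuming the permissive http/https variant of the policy and no host allow/deny list. The names sanitizeImageUrl and resolveDraftImage are hypothetical; the sketch relies only on the standard URL constructor.

```typescript
// Returns a normalized http(s) URL, or null for anything empty, relative,
// unparseable, or using a non-web scheme (data:, javascript:, blob:, ...).
function sanitizeImageUrl(raw: unknown): string | null {
  if (typeof raw !== "string") return null;
  const trimmed = raw.trim();
  if (trimmed === "") return null;

  let parsed: URL;
  try {
    parsed = new URL(trimmed); // throws on relative or malformed input
  } catch {
    return null;
  }

  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  return parsed.href;
}

// Fallback order: validated schema.org image, then validated heuristic image, then null.
function resolveDraftImage(schemaOrgImage: unknown, heuristicImage: unknown): string | null {
  return sanitizeImageUrl(schemaOrgImage) ?? sanitizeImageUrl(heuristicImage);
}
```

Because the sanitizer returns null rather than throwing, the same function can back both the "null it" behavior in import output and a "reject it" check in the recipe create/update routes, depending on which policy is chosen.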

Acceptance:

  • Unit tests cover allow/reject matrix for image URLs.
  • Creating/updating recipe with invalid image URL is predictably rejected or nulled per policy.
  • Policy documented in docs/api.md (import + recipe payload behavior).

T04 — Migration Coverage for recipes.image_url

Priority: P1
Owner: agent-db-safety
Dependencies: none (can run parallel to T02/T03)

Deliverables:

  • Add migration tests for applyRuntimeMigrations:
    • DB with no recipes table (no-op)
    • DB with recipes but no image_url (column added)
    • DB already containing image_url (idempotent)
  • Add integration check around migrate.ts/database.ts startup path to ensure migration executes before repository usage.
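
The three migration cases above can be illustrated against a tiny in-memory fake. This is only a sketch of the test shape: MigrationDb, FakeDb, and ensureImageUrlColumn are hypothetical stand-ins, not the real applyRuntimeMigrations or its database handle. It assumes SQLite semantics, where PRAGMA table_info returns no rows for a missing table.

```typescript
interface ColumnInfo {
  name: string;
}

// Hypothetical narrow view of the database used by the migration step.
interface MigrationDb {
  tableInfo(table: string): ColumnInfo[]; // stands in for PRAGMA table_info(<table>)
  exec(sql: string): void;
}

type MigrationOutcome = "no_table" | "added" | "already_present";

// Adds recipes.image_url only when the table exists and the column is missing.
function ensureImageUrlColumn(db: MigrationDb): MigrationOutcome {
  const columns = db.tableInfo("recipes");
  if (columns.length === 0) return "no_table";
  for (const col of columns) {
    if (col.name === "image_url") return "already_present";
  }
  db.exec("ALTER TABLE recipes ADD COLUMN image_url TEXT");
  return "added";
}

// In-memory fake that only understands the single ALTER statement used above.
class FakeDb implements MigrationDb {
  executed: string[] = [];
  private tables: { [name: string]: string[] };

  constructor(tables: { [name: string]: string[] }) {
    this.tables = tables;
  }

  tableInfo(table: string): ColumnInfo[] {
    const columns = this.tables[table] || [];
    return columns.map((name) => ({ name }));
  }

  exec(sql: string): void {
    const match = /^ALTER TABLE (\w+) ADD COLUMN (\w+)/.exec(sql);
    if (match === null || this.tables[match[1]] === undefined) {
      throw new Error("unsupported SQL in fake: " + sql);
    }
    this.tables[match[1]].push(match[2]);
    this.executed.push(sql);
  }
}
```

Running the step twice against the same FakeDb should execute exactly one ALTER statement, which is the idempotence property the real tests would assert through PRAGMA table_info(recipes).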

Acceptance:

  • Dedicated test file exists under src/backend/tests (or src/backend/db/__tests__).
  • Tests assert column presence via PRAGMA table_info(recipes).
  • Re-running migration tests proves idempotence.

T05 — Import Contract Alignment (Frontend/Backend Types)

Priority: P1
Owner: agent-contracts
Dependencies: T01

Deliverables:

  • Define shared import response contract (backend + frontend) for:
    • draft_recipe
    • source_url
    • parse provenance (schema_org_used, heuristic_used, warning list)
  • Align frontend UrlImportResult and backend route payload to avoid required-field mismatch.
  • Ensure image_url is represented consistently in frontend recipe interfaces where used by UI components.
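
One possible shape for the shared contract above, with a runtime guard the frontend could apply before rendering. Only draft_recipe, source_url, schema_org_used, heuristic_used, and image_url come from this board; the nested provenance object, the warnings field name, and the title field on the draft are assumptions for illustration.

```typescript
// Hypothetical minimal draft; the real draft recipe carries more fields.
interface DraftRecipe {
  title: string;
  image_url: string | null;
}

interface UrlImportResult {
  draft_recipe: DraftRecipe | null;
  source_url: string;
  provenance: {
    schema_org_used: boolean;
    heuristic_used: boolean;
    warnings: string[];
  };
}

function isRecord(v: unknown): v is { [key: string]: unknown } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((item) => typeof item === "string");
}

// Narrows an untyped response body to UrlImportResult without any casts at call sites.
function isUrlImportResult(value: unknown): value is UrlImportResult {
  if (!isRecord(value)) return false;

  const draft = value["draft_recipe"];
  const draftOk =
    draft === null ||
    (isRecord(draft) &&
      typeof draft["title"] === "string" &&
      (draft["image_url"] === null || typeof draft["image_url"] === "string"));

  const prov = value["provenance"];
  return (
    draftOk &&
    typeof value["source_url"] === "string" &&
    isRecord(prov) &&
    typeof prov["schema_org_used"] === "boolean" &&
    typeof prov["heuristic_used"] === "boolean" &&
    isStringArray(prov["warnings"])
  );
}
```

Note that the current stub payload, { draft_recipe: null } with no source_url or provenance, would fail this guard, which is the kind of required-field mismatch this task is meant to surface.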

Acceptance:

  • TypeScript build/test passes without contract casts for import payload.
  • Import UI displays parse metadata without runtime undefined errors.
  • frontend/src/types/* and backend response shape are synchronized.

T06 — Search Query + Index Optimization

Priority: P1
Owner: agent-data-perf
Dependencies: none

Deliverables:

  • Review/adjust recipe search query shape in RecipeRepository.findAll/count to reduce costly wide joins where possible.
  • Add/validate indexes to support current search/filter path (evaluate at minimum):
    • recipes(created_at) for default order
    • recipe_tags(recipe_id) complementing existing recipe_tags(tag_id)
    • optional composite/index refinements based on EXPLAIN QUERY PLAN
  • Capture query plan before/after on seeded dataset.

Acceptance:

  • Query plan evidence saved under docs/perf/search-query-plan.md.
  • Search + tag filter behavior unchanged functionally (existing tests still pass).
  • Measured improvement or justified no-op documented.

T07 — TS Hygiene Pass (Import/Data Path First)

Priority: P2
Owner: agent-ts-hygiene
Dependencies: T01, T05

Deliverables:

  • Remove high-risk any from import/parser/repository hot paths:
    • SchemaOrgRecipeParserService
    • RecipeRepository filter destructuring
    • import-related tests/types
  • Introduce narrow helper types/guards for JSON-LD blocks instead of raw any.
  • Keep broader orchestrator generics untouched unless directly impacted.
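
As a rough illustration of the narrow helper types and guards mentioned above, the sketch below replaces raw any with unknown plus type guards when locating a recipe node, including graph-shaped blocks. JsonLdNode, isJsonLdNode, hasRecipeType, and findRecipeNode are hypothetical names, not existing code in SchemaOrgRecipeParserService.

```typescript
// A parsed JSON-LD object whose property values are not yet trusted.
type JsonLdNode = { [key: string]: unknown };

function isJsonLdNode(value: unknown): value is JsonLdNode {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// @type may be a single string or an array of type names.
function hasRecipeType(node: JsonLdNode): boolean {
  const type = node["@type"];
  if (type === "Recipe") return true;
  return Array.isArray(type) && type.indexOf("Recipe") !== -1;
}

// Accepts a single block, an array of blocks, or a block with an @graph array.
// Returns the first recipe node found, or null; never throws on odd shapes.
function findRecipeNode(block: unknown): JsonLdNode | null {
  const roots: unknown[] = Array.isArray(block) ? block : [block];
  for (const root of roots) {
    if (!isJsonLdNode(root)) continue;
    if (hasRecipeType(root)) return root;

    const graph = root["@graph"];
    if (Array.isArray(graph)) {
      for (const node of graph) {
        if (isJsonLdNode(node) && hasRecipeType(node)) return node;
      }
    }
  }
  return null;
}
```

Callers still have to narrow each property they read from the returned node, which is what keeps any from creeping back into the parser path.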

Acceptance:

  • No any remains in import route + parser services.
  • Repository filter casts no longer use filters as any.
  • npm run build passes with strict mode unchanged.

T08 — Docs + Runbook Truth Sync

Priority: P2
Owner: agent-docs-sync
Dependencies: T01, T03, T05

Deliverables:

  • Update stale API/import docs to match actual payloads and behavior (docs/api.md, relevant README sections).
  • Document known import limits and expected failure messages.
  • Add “how to add parser fixture” note for future maintainers.

Acceptance:

  • Docs describe real import endpoint behavior (not aspirational).
  • Example request/response includes image_url handling rules.
  • No references to removed/incorrect field names in import examples.

Wave Plan (Execution Order)

Wave 1 — Restore Functional Import Core

  • T01 (route orchestration)

Wave 2 — Image Quality + Safety

  • T02 (schema image extraction)
  • T03 (validation/fallback policy)
  • T05 (contract alignment)

Wave 3 — Backend Hardening in Parallel

  • T04 (migration coverage)
  • T06 (search/index optimization)

Wave 4 — Cleanup + Stability

  • T07 (TS hygiene)
  • T08 (docs truth sync)

Dependency Notes

  • T03 depends on T02 because sanitizer policy should apply to ranked image candidates.
  • T05 depends on T01 because real response shape must exist before contract lock.
  • T07 should start after T01/T05 to avoid churn from contract refactors.

Execution Sequence

  1. T01 — unblock real import behavior and expose true integration issues.
  2. T02 — improve image extraction quality where import currently underperforms.
  3. T03 — enforce URL safety/normalization policy.
  4. T05 — lock backend/frontend import contract and image fields.
  5. Parallel: T04 + T06.
  6. T07 then T08.

First Task to Launch

Launch T01 — Import Route Orchestration (Wire Real Pipeline).
Reason: it converts a stubbed endpoint into executable behavior and creates the integration baseline needed by every downstream image-quality and hardening task.

Reporting Protocol (for each task)

  1. task id
  2. files changed
  3. tests added/updated + command output
  4. blockers/risks
  5. ready-for-review flag