# Recipe Manager — Image Import & Backend Hardening Execution Board
Created: 2026-03-27
Owner: Main Orchestrator
Status: READY
## Objective
Ship a reliable URL import pipeline that returns high-quality `image_url` values and harden backend safety/perf around migrations, search, and TypeScript correctness.
## Current Repo Reality (Baseline)
- `POST /api/import/url` is currently a stub returning `{ draft_recipe: null }` (`src/backend/routes/import.ts`).
- URL fetch + JSON-LD extraction foundation exists (`src/backend/services/UrlImportService.ts`) but is not wired to the route.
- Schema.org/heuristic parser services are minimal and weakly typed (heavy use of `any`).
- `image_url` exists in schema/repository/routes/seed (`schema.sql`, `RecipeRepository`, `recipes.ts`), but there is no explicit image validation/fallback policy.
- Runtime migration helper exists (`applyRuntimeMigrations`) but has no direct migration tests.
- Search currently uses `LIKE '%term%'` matching across joined recipes/ingredients/tags tables; indexes are limited and the query plan is unverified.
- TS strict mode is on, but `any` usage and contract drift remain in backend/frontend types.
## Release Gate (Done Definition)
- [ ] `/api/import/url` performs real fetch/parse flow and returns non-empty `draft_recipe` for supported pages.
- [ ] Image extraction ranking + URL validation policy is implemented and tested.
- [ ] Runtime migration behavior is covered by tests for pre/post `image_url` schemas.
- [ ] Search performance/query behavior validated with index and query-shape improvements.
- [ ] TS hygiene pass removes prioritized unsafe `any` in import/search pathways.
---
## Task Backlog
### T01 — Import Route Orchestration (Wire Real Pipeline)
Priority: P0
Owner: agent-import-core
Dependencies: none
Deliverables:
- Replace stub in `src/backend/routes/import.ts` with real pipeline:
  - validate URL input via zod
  - call `UrlImportService.fetchFromUrl`
  - parse JSON-LD blocks and select recipe candidate
  - fall back to heuristic parser when schema parse fails
  - return structured `UrlImportResult` payload expected by frontend (`draft_recipe`, `source_url`, parse metadata)
- Error mapping from `UrlImportError` to stable HTTP/API errors for UI (`timeout`, `network`, `unsupported content`, parse failure).
Acceptance:
- [ ] Import route no longer returns hardcoded null draft.
- [ ] `src/backend/tests/import.test.ts` includes success + invalid URL + timeout/network/content-type failures.
- [ ] Frontend import page can reach “review” stage from at least one fixture HTML sample.
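
The error-mapping deliverable above could look roughly like the sketch below. The error kinds, the `kind` field, the `ApiError` shape, and the chosen status codes are all assumptions for illustration; the real `UrlImportError` in `UrlImportService.ts` may expose different fields. Unknown errors collapse to a generic 500 so internal details never reach the UI.

```typescript
// Hypothetical error kinds; align these with the real UrlImportError before use.
type UrlImportErrorKind = "timeout" | "network" | "unsupported_content" | "parse_failure";

class UrlImportError extends Error {
  readonly kind: UrlImportErrorKind;

  constructor(kind: UrlImportErrorKind, message: string) {
    super(message);
    this.name = "UrlImportError";
    this.kind = kind;
  }
}

// Stable error payload the UI can switch on (shape is an assumption).
interface ApiError {
  status: number;
  code: string;
  message: string;
}

// One plausible status mapping; adjust to the API's existing conventions.
const STATUS_BY_KIND: Record<UrlImportErrorKind, number> = {
  timeout: 504,
  network: 502,
  unsupported_content: 415,
  parse_failure: 422,
};

function toApiError(err: unknown): ApiError {
  if (err instanceof UrlImportError) {
    return { status: STATUS_BY_KIND[err.kind], code: err.kind, message: err.message };
  }
  // Anything unexpected becomes a generic 500 without leaking details.
  return { status: 500, code: "internal_error", message: "Unexpected import failure" };
}
```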
---
### T02 — Schema.org Image Extraction Quality Pass
Priority: P0
Owner: agent-import-parser
Dependencies: T01
Deliverables:
- Refactor `SchemaOrgRecipeParserService` to support common image variants:
  - string URL
  - array of URLs
  - array/object entries with `url`, `contentUrl`, `thumbnailUrl`
  - `@graph` recipe object selection when JSON-LD block is graph-shaped
- Add image candidate ranking (prefer HTTPS, largest/default image over tiny thumbnails when width/height available).
- Return normalized draft with stable `image_url` candidate.
Acceptance:
- [ ] Parser tests added with fixtures for at least 5 JSON-LD image shapes.
- [ ] For each fixture, expected top `image_url` is asserted.
- [ ] No parser path throws on malformed/partial image fields; degrades gracefully.
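
A minimal sketch of the image normalization and ranking described above. All names (`ImageCandidate`, `collectImageCandidates`, `pickBestImage`) are hypothetical, and the precedence used here (HTTPS first, then non-thumbnail, then larger area) is only one reading of the ranking rule; the board does not fix the order. Malformed input yields `null` instead of throwing.

```typescript
interface ImageCandidate {
  url: string;
  isThumbnail: boolean;
  width?: number;
  height?: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accept positive finite numbers, also when given as numeric strings.
function toDimension(value: unknown): number | undefined {
  const n = typeof value === "string" ? Number(value) : value;
  return typeof n === "number" && Number.isFinite(n) && n > 0 ? n : undefined;
}

// Flatten a JSON-LD `image` value (string, object, or array of either).
function collectImageCandidates(image: unknown): ImageCandidate[] {
  const out: ImageCandidate[] = [];
  if (typeof image === "string") {
    const url = image.trim();
    if (url) out.push({ url, isThumbnail: false });
  } else if (Array.isArray(image)) {
    for (const entry of image) out.push(...collectImageCandidates(entry));
  } else if (isRecord(image)) {
    const width = toDimension(image["width"]);
    const height = toDimension(image["height"]);
    for (const key of ["url", "contentUrl", "thumbnailUrl"]) {
      const value = image[key];
      if (typeof value === "string" && value.trim()) {
        out.push({ url: value.trim(), isThumbnail: key === "thumbnailUrl", width, height });
      }
    }
  }
  return out;
}

function area(c: ImageCandidate): number {
  return (c.width ?? 0) * (c.height ?? 0);
}

// Assumed precedence: HTTPS, then non-thumbnail, then larger area.
function compareCandidates(a: ImageCandidate, b: ImageCandidate): number {
  const https = Number(/^https:\/\//i.test(b.url)) - Number(/^https:\/\//i.test(a.url));
  if (https !== 0) return https;
  const thumb = Number(a.isThumbnail) - Number(b.isThumbnail);
  if (thumb !== 0) return thumb;
  return area(b) - area(a);
}

function pickBestImage(image: unknown): string | null {
  const ranked = collectImageCandidates(image).sort(compareCandidates);
  return ranked.length > 0 ? ranked[0].url : null;
}
```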
---
### T03 — Image URL Validation & Fallback Policy
Priority: P0
Owner: agent-import-hardening
Dependencies: T02
Deliverables:
- Implement central image URL sanitizer/validator utility (new service module):
  - allow `http/https` only (or strict `https` with optional downgrade rule)
  - reject `data:`, `javascript:`, blob/non-web URLs
  - trim + normalize empty values to `null`
  - optional host allow/deny controls (documented default policy)
- Integrate validator in:
  - import parsing output
  - recipe create/update flows (`routes/recipes.ts` + repository normalization layer)
- Define fallback order for import draft image:
  1) validated schema.org image
  2) validated heuristic image
  3) null (no placeholder persisted)
Acceptance:
- [ ] Unit tests cover allow/reject matrix for image URLs.
- [ ] Creating/updating recipe with invalid image URL is predictably rejected or nulled per policy.
- [ ] Policy documented in `docs/api.md` (import + recipe payload behavior).
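
The sanitizer and fallback order above can be sketched as follows, assuming the permissive `http`/`https` variant of the policy and no host allow/deny list. `sanitizeImageUrl`, `pickDraftImage`, and the `reason` strings are hypothetical names. The sketch relies on the standard `URL` constructor, which throws on relative or unparseable input.

```typescript
type ImageUrlResult =
  | { ok: true; value: string | null }
  | { ok: false; reason: string };

function sanitizeImageUrl(input: unknown): ImageUrlResult {
  // Missing or blank values normalize to null instead of being rejected.
  if (input === null || input === undefined) return { ok: true, value: null };
  if (typeof input !== "string") return { ok: false, reason: "not_a_string" };
  const trimmed = input.trim();
  if (trimmed === "") return { ok: true, value: null };

  let parsed: URL;
  try {
    parsed = new URL(trimmed); // throws on relative or malformed URLs
  } catch {
    return { ok: false, reason: "unparseable" };
  }

  // Rejects data:, javascript:, blob:, file:, and any other non-web scheme.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: "scheme_not_allowed" };
  }
  return { ok: true, value: parsed.href };
}

// Fallback order: schema.org image, then heuristic image, then null.
function pickDraftImage(schemaImage: unknown, heuristicImage: unknown): string | null {
  for (const candidate of [schemaImage, heuristicImage]) {
    const result = sanitizeImageUrl(candidate);
    if (result.ok && result.value !== null) return result.value;
  }
  return null;
}
```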
---
### T04 — Migration Coverage for `recipes.image_url`
Priority: P1
Owner: agent-db-safety
Dependencies: none (can run parallel to T02/T03)
Deliverables:
- Add migration tests for `applyRuntimeMigrations`:
  - DB with no `recipes` table (no-op)
  - DB with `recipes` but no `image_url` (column added)
  - DB already containing `image_url` (idempotent)
- Add integration check around `migrate.ts`/`database.ts` startup path to ensure migration executes before repository usage.
Acceptance:
- [ ] Dedicated test file exists under `src/backend/tests` (or `src/backend/db/__tests__`).
- [ ] Tests assert column presence via `PRAGMA table_info(recipes)`.
- [ ] Re-running migration tests proves idempotence.
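
The three migration scenarios above reduce to one decision over the rows returned by `PRAGMA table_info(recipes)`, which is empty when the table does not exist. The sketch below isolates that decision as a pure function so it can be tested without a database. `planImageUrlMigration` is a hypothetical helper, not the existing `applyRuntimeMigrations`, and the `TEXT` column type is an assumption.

```typescript
// Row shape returned by SQLite's `PRAGMA table_info(<table>)`.
interface TableInfoRow {
  cid: number;
  name: string;
  type: string;
  notnull: number;
  dflt_value: string | null;
  pk: number;
}

// Returns the SQL statements still needed; an empty list means no-op.
function planImageUrlMigration(columns: readonly TableInfoRow[]): string[] {
  // No rows: the recipes table does not exist yet, so there is nothing to alter.
  if (columns.length === 0) return [];
  // Column already present: re-running must not try to add it again.
  if (columns.some((column) => column.name === "image_url")) return [];
  // Column type is assumed; match it to schema.sql.
  return ["ALTER TABLE recipes ADD COLUMN image_url TEXT"];
}
```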
---
### T05 — Import Contract Alignment (Frontend/Backend Types)
Priority: P1
Owner: agent-contracts
Dependencies: T01
Deliverables:
- Define shared import response contract (backend + frontend) for:
  - `draft_recipe`
  - `source_url`
  - parse provenance (`schema_org_used`, `heuristic_used`, warning list)
- Align frontend `UrlImportResult` and backend route payload to avoid required-field mismatch.
- Ensure `image_url` is represented consistently in frontend recipe interfaces where used by UI components.
Acceptance:
- [ ] TypeScript build/test passes without contract casts for import payload.
- [ ] Import UI displays parse metadata without runtime undefined errors.
- [ ] `frontend/src/types/*` and backend response shape are synchronized.
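
One possible shape for the shared contract, with a runtime guard the frontend could use instead of casting the response. The field names `draft_recipe`, `source_url`, `schema_org_used`, and `heuristic_used` come from the deliverables above; the `warnings` field name and the reduced `DraftRecipe` fields are assumptions made to keep the sketch short.

```typescript
// Reduced draft shape for illustration; the real draft has more fields.
interface DraftRecipe {
  title: string;
  image_url: string | null;
}

interface UrlImportResult {
  draft_recipe: DraftRecipe | null;
  source_url: string;
  schema_org_used: boolean;
  heuristic_used: boolean;
  warnings: string[]; // assumed name for the "warning list"
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isDraftRecipe(value: unknown): value is DraftRecipe {
  return (
    isPlainObject(value) &&
    typeof value["title"] === "string" &&
    (value["image_url"] === null || typeof value["image_url"] === "string")
  );
}

// Runtime check so the UI never reads undefined parse metadata.
function isUrlImportResult(value: unknown): value is UrlImportResult {
  if (!isPlainObject(value)) return false;
  const warnings = value["warnings"];
  return (
    (value["draft_recipe"] === null || isDraftRecipe(value["draft_recipe"])) &&
    typeof value["source_url"] === "string" &&
    typeof value["schema_org_used"] === "boolean" &&
    typeof value["heuristic_used"] === "boolean" &&
    Array.isArray(warnings) &&
    warnings.every((warning: unknown) => typeof warning === "string")
  );
}
```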
---
### T06 — Search Query + Index Optimization
Priority: P1
Owner: agent-data-perf
Dependencies: none
Deliverables:
- Review/adjust recipe search query shape in `RecipeRepository.findAll/count` to reduce costly wide joins where possible.
- Add/validate indexes to support current search/filter path (evaluate at minimum):
  - `recipes(created_at)` for default order
  - `recipe_tags(recipe_id)` complementing existing `recipe_tags(tag_id)`
  - optional composite/index refinements based on `EXPLAIN QUERY PLAN`
- Capture query plan before/after on seeded dataset.
Acceptance:
- [ ] Query plan evidence saved under `docs/perf/search-query-plan.md`.
- [ ] Search + tag filter behavior unchanged functionally (existing tests still pass).
- [ ] Measured improvement or justified no-op documented.
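
To make the before/after comparison repeatable, the `detail` column of `EXPLAIN QUERY PLAN` output can be checked for full table scans. The sketch below assumes the detail strings have already been read from the database; the index names and the `findFullScans` helper are hypothetical. SQLite reports unindexed access as `SCAN ...` (with or without the word `TABLE`, depending on version) and indexed lookups as `SEARCH ... USING INDEX ...`. Note that a `LIKE` pattern with a leading wildcard cannot use an ordinary B-tree index, so some scans may remain and would belong under the "justified no-op" case.

```typescript
// Candidate DDL for the indexes listed above (index names are assumptions).
const CANDIDATE_INDEXES: readonly string[] = [
  "CREATE INDEX IF NOT EXISTS idx_recipes_created_at ON recipes(created_at)",
  "CREATE INDEX IF NOT EXISTS idx_recipe_tags_recipe_id ON recipe_tags(recipe_id)",
];

// Returns plan lines that scan a table without using any index.
function findFullScans(planDetails: readonly string[]): string[] {
  return planDetails.filter((detail) => {
    const line = detail.trim();
    return /^SCAN\b/.test(line) && !/USING (COVERING )?INDEX/.test(line);
  });
}
```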
---
### T07 — TS Hygiene Pass (Import/Data Path First)
Priority: P2
Owner: agent-ts-hygiene
Dependencies: T01, T05
Deliverables:
- Remove high-risk `any` from import/parser/repository hot paths:
  - `SchemaOrgRecipeParserService`
  - `RecipeRepository` filter destructuring
  - import-related tests/types
- Introduce narrow helper types/guards for JSON-LD blocks instead of raw `any`.
- Keep broader orchestrator generics untouched unless directly impacted.
Acceptance:
- [ ] No `any` remains in import route + parser services.
- [ ] Repository filter handling no longer relies on `filters as any` casts.
- [ ] `npm run build` passes with strict mode unchanged.
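
As an illustration of the narrow JSON-LD guards mentioned above, the sketch below takes parsed JSON as `unknown` and narrows it step by step, including graph-shaped blocks. `JsonLdNode`, `isJsonLdNode`, `hasType`, and `findRecipeNode` are hypothetical names; the handling of `@type` as either a string or an array of strings follows common JSON-LD usage.

```typescript
type JsonLdNode = Record<string, unknown>;

function isJsonLdNode(value: unknown): value is JsonLdNode {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// `@type` may be a single string or an array of strings.
function hasType(node: JsonLdNode, type: string): boolean {
  const declared = node["@type"];
  if (typeof declared === "string") return declared === type;
  return Array.isArray(declared) && declared.indexOf(type) !== -1;
}

// Finds the first Recipe node in a block, a top-level array, or an `@graph`.
function findRecipeNode(block: unknown): JsonLdNode | null {
  if (Array.isArray(block)) {
    for (const entry of block) {
      const found = findRecipeNode(entry);
      if (found) return found;
    }
    return null;
  }
  if (!isJsonLdNode(block)) return null;
  if (hasType(block, "Recipe")) return block;
  // Graph-shaped blocks keep their nodes under `@graph`; anything else yields null.
  return findRecipeNode(block["@graph"]);
}
```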
---
### T08 — Docs + Runbook Truth Sync
Priority: P2
Owner: agent-docs-sync
Dependencies: T01, T03, T05
Deliverables:
- Update stale API/import docs to match actual payloads and behavior (`docs/api.md`, relevant README sections).
- Document known import limits and expected failure messages.
- Add “how to add parser fixture” note for future maintainers.
Acceptance:
- [ ] Docs describe real import endpoint behavior (not aspirational).
- [ ] Example request/response includes `image_url` handling rules.
- [ ] No references to removed/incorrect field names in import examples.
---
## Wave Plan (Execution Order)
### Wave 1 — Restore Functional Import Core
- T01 (route orchestration)
### Wave 2 — Image Quality + Safety
- T02 (schema image extraction)
- T03 (validation/fallback policy)
- T05 (contract alignment)
### Wave 3 — Backend Hardening in Parallel
- T04 (migration coverage)
- T06 (search/index optimization)
### Wave 4 — Cleanup + Stability
- T07 (TS hygiene)
- T08 (docs truth sync)
## Dependency Notes
- T03 depends on T02 because sanitizer policy should apply to ranked image candidates.
- T05 depends on T01 because real response shape must exist before contract lock.
- T07 should start after T01/T05 to avoid churn from contract refactors.
## Recommended Starting Order (Concrete)
1. **T01** — unblock real import behavior and expose true integration issues.
2. T02 — improve image extraction quality where import currently underperforms.
3. T03 — enforce URL safety/normalization policy.
4. T05 — lock backend/frontend import contract and image fields.
5. Parallel: T04 + T06.
6. T07 then T08.
## First Task to Launch
**Launch T01 — Import Route Orchestration (Wire Real Pipeline).**
Reason: it converts a stubbed endpoint into executable behavior and creates the integration baseline needed by every downstream image-quality and hardening task.
## Reporting Protocol (for each task)
1) task id
2) files changed
3) tests added/updated + command output
4) blockers/risks
5) ready-for-review flag