recipe-manager/.harness/fts5-task1-design.md

564 lines
21 KiB
Markdown

# FTS5 Virtual Table Schema & Trigger Plan — Design Spec
**Task:** T1: Design FTS5 virtual table schema + trigger plan for Recipe Manager
**Created:** 2026-03-30
**Target:** Implementation blueprint for T2 (migration)
**Schema baseline:** `src/backend/db/schema.sql` (2026-03-28 MVP normalized)
---
## 1. FTS5 Virtual Table Definition
### Table Name
`recipes_fts` — virtual table using FTS5 extension.
### Columns (for full-text indexing)
Each column corresponds to a searchable text field from the recipe domain:
| Column | Source | Description |
|--------|--------|-------------|
| `title` | `recipes.title` | Recipe title (primary identifier) |
| `description` | `recipes.description` | Recipe description/brief |
| `ingredient_item` | `ingredients.item` aggregated | Ingredient names (e.g., "chicken", "flour") |
| `ingredient_notes` | `ingredients.notes` aggregated | Ingredient-specific notes (e.g., " diced", "to taste") |
| `step_instruction` | `steps.instruction` aggregated | Cooking step instructions |
| `tag_name` | `tags.name` aggregated via `recipe_tags` | Tag names associated with the recipe |
### Tokenizer Configuration
```sql
CREATE VIRTUAL TABLE recipes_fts
USING fts5(
title,
description,
ingredient_item,
ingredient_notes,
step_instruction,
tag_name,
tokenize='porter unicode61'
);
```
**Rationale:**
- `porter` stemming: reduces words to root forms (e.g., "chopped" → "chop", "chicken" stays "chicken", "sautéed" → "saut"). Improves recall for cooking terminology.
- `unicode61` tokenizer: handles international characters and emoji (common in recipes) robustly.
- Combined as `porter unicode61` → apply porter stemming with Unicode support.
### Rowid Mapping Strategy
- The FTS table uses `rowid` to reference the corresponding `recipes.id`.
- We will store `recipes.id` as the `rowid` directly: `INSERT INTO recipes_fts(rowid, ...) VALUES(recipe_id, ...)`.
- This enables efficient joins: `SELECT r.* FROM recipes r JOIN recipes_fts fts ON r.id = fts.rowid WHERE fts MATCH ?`.
---
## 2. Trigger Design
We need triggers on all base tables to keep `recipes_fts` synchronized with aggregated data. The FTS table rows must be kept up-to-date whenever any underlying data changes.
### General Pattern
For any recipe, its FTS row contains a **snapshot** of aggregated ingredient items/notes, step instructions, and tag names at that moment. On any change to related data, we must recalculate the aggregated content and upsert into `recipes_fts`.
### Helper SQL Functions
We recommend creating SQL functions to encapsulate the aggregation logic:
```sql
-- Aggregate ingredient text for a recipe
CREATE OR REPLACE FUNCTION fts_aggregate_ingredients(recipe_id INTEGER)
RETURNS TEXT DETERMINISTIC
BEGIN
SELECT GROUP_CONCAT(item || COALESCE(' ' || notes, ''), ' ')
FROM ingredients
WHERE recipe_id = recipe_id;
END;
-- Aggregate step instructions for a recipe
CREATE OR REPLACE FUNCTION fts_aggregate_steps(recipe_id INTEGER)
RETURNS TEXT DETERMINISTIC
BEGIN
SELECT GROUP_CONCAT(instruction, ' ')
FROM steps
WHERE recipe_id = recipe_id
ORDER BY position;
END;
-- Aggregate tag names for a recipe
CREATE OR REPLACE FUNCTION fts_aggregate_tags(recipe_id INTEGER)
RETURNS TEXT DETERMINISTIC
BEGIN
SELECT GROUP_CONCAT(t.name, ' ')
FROM tags t
JOIN recipe_tags rt ON t.id = rt.tag_id
WHERE rt.recipe_id = recipe_id;
END;
```
These functions can be called from triggers to reconstruct the aggregated content efficiently.
### Trigger Definitions
#### 2.1 Triggers on `recipes`
**Purpose:** Update FTS row when recipe's title or description changes.
- `recipes_after_insert`: Insert initial FTS row for new recipe.
- `recipes_after_update`: Update FTS row if title/description changed.
- `recipes_after_delete`: Delete FTS row when recipe is removed.
```sql
-- AFTER INSERT on recipes
CREATE TRIGGER recipes_after_insert
AFTER INSERT ON recipes
BEGIN
INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name)
VALUES (
NEW.id,
NEW.title,
NEW.description,
fts_aggregate_ingredients(NEW.id),
fts_aggregate_ingredients(NEW.id), -- same aggregation, we'll separate items/notes below
fts_aggregate_steps(NEW.id),
fts_aggregate_tags(NEW.id)
);
END;
-- AFTER UPDATE on recipes (only when title/description change)
CREATE TRIGGER recipes_after_update
AFTER UPDATE OF title, description ON recipes
WHEN OLD.title != NEW.title OR OLD.description != NEW.description
BEGIN
UPDATE recipes_fts
SET
title = NEW.title,
description = NEW.description
WHERE rowid = OLD.id;
END;
-- AFTER DELETE on recipes
CREATE TRIGGER recipes_after_delete
AFTER DELETE ON recipes
BEGIN
DELETE FROM recipes_fts WHERE rowid = OLD.id;
END;
```
**Note:** The `ingredient_item`, `ingredient_notes`, `step_instruction`, and `tag_name` fields will be maintained by other triggers on their respective tables; we only need to insert them initially here.
#### 2.2 Triggers on `ingredients`
**Purpose:** Recalculate aggregated ingredient content for the associated recipe whenever ingredients change.
- INSERT: update recipe's FTS row
- UPDATE: if recipe_id changes, update both old and new recipes
- DELETE: update recipe's FTS row
We need to aggregate `item` and `notes` into separate columns (`ingredient_item` and `ingredient_notes`) but with space linkage (`item || ' ' || notes` for `ingredient_notes`? Actually spec says `ingredient_notes` column should contain notes only? Let's review:
The execution board specifies columns: `ingredient_item`, `ingredient_notes`, `step_instruction`, `tag_name`.
Interpretation:
- `ingredient_item`: concatenated list of ingredient item names (without notes)
- `ingredient_notes`: concatenated list of ingredient notes (only the notes text)
But for search relevance, it's more useful to combine both: search for "diced" should find recipes with "1 cup diced onions". If we separate them, the note "diced" may not match the item column. However, we can index both separately; the query will search across all columns. So we can store pure items in `ingredient_item` and pure notes in `ingredient_notes`. The FTS query `MATCH` searches across all columns by default if you query a single column or use `recipes_fts MATCH ?` without column spec. But we might want to weight columns differently? That's an advanced optimization not required now.
Given the spec, I'll implement:
- `ingredient_item`: `GROUP_CONCAT(item, ' ')`
- `ingredient_notes`: `GROUP_CONCAT(notes, ' ')` (excluding nulls)
```sql
-- Helper to get aggregated ingredient item text
CREATE OR REPLACE FUNCTION fts_aggregate_ingredient_items(recipe_id INTEGER)
RETURNS TEXT DETERMINISTIC
BEGIN
SELECT GROUP_CONCAT(item, ' ')
FROM ingredients
WHERE recipe_id = recipe_id;
END;
-- Helper to get aggregated ingredient notes text
CREATE OR REPLACE FUNCTION fts_aggregate_ingredient_notes(recipe_id INTEGER)
RETURNS TEXT DETERMINISTIC
BEGIN
SELECT GROUP_CONCAT(notes, ' ')
FROM ingredients
WHERE recipe_id = recipe_id AND notes IS NOT NULL AND notes != '';
END;
-- AFTER INSERT on ingredients
CREATE TRIGGER ingredients_after_insert
AFTER INSERT ON ingredients
BEGIN
UPDATE recipes_fts
SET
ingredient_item = fts_aggregate_ingredient_items(NEW.recipe_id),
ingredient_notes = fts_aggregate_ingredient_notes(NEW.recipe_id)
WHERE rowid = NEW.recipe_id;
-- If recipe doesn't exist in FTS yet (e.g., recipe added earlier without trigger), insert it
INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name)
SELECT
r.id,
r.title,
r.description,
fts_aggregate_ingredient_items(r.id),
fts_aggregate_ingredient_notes(r.id),
fts_aggregate_steps(r.id),
fts_aggregate_tags(r.id)
FROM recipes r
WHERE r.id = NEW.recipe_id AND NOT EXISTS (SELECT 1 FROM recipes_fts WHERE rowid = r.id);
END;
-- AFTER UPDATE on ingredients
CREATE TRIGGER ingredients_after_update
AFTER UPDATE ON ingredients
BEGIN
-- If recipe_id changed, update both old and new recipe's FTS rows
UPDATE recipes_fts
SET
ingredient_item = fts_aggregate_ingredient_items(NEW.recipe_id),
ingredient_notes = fts_aggregate_ingredient_notes(NEW.recipe_id)
WHERE rowid = NEW.recipe_id;
UPDATE recipes_fts
SET
ingredient_item = fts_aggregate_ingredient_items(OLD.recipe_id),
ingredient_notes = fts_aggregate_ingredient_notes(OLD.recipe_id)
WHERE rowid = OLD.recipe_id AND OLD.recipe_id != NEW.recipe_id;
END;
-- AFTER DELETE on ingredients
CREATE TRIGGER ingredients_after_delete
AFTER DELETE ON ingredients
BEGIN
UPDATE recipes_fts
SET
ingredient_item = fts_aggregate_ingredient_items(OLD.recipe_id),
ingredient_notes = fts_aggregate_ingredient_notes(OLD.recipe_id)
WHERE rowid = OLD.recipe_id;
-- If recipe has no ingredients now, we still keep the row (other columns may exist)
END;
```
#### 2.3 Triggers on `steps`
```sql
-- AFTER INSERT on steps
CREATE TRIGGER steps_after_insert
AFTER INSERT ON steps
BEGIN
UPDATE recipes_fts
SET step_instruction = fts_aggregate_steps(NEW.recipe_id)
WHERE rowid = NEW.recipe_id;
INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name)
SELECT
r.id,
r.title,
r.description,
fts_aggregate_ingredient_items(r.id),
fts_aggregate_ingredient_notes(r.id),
fts_aggregate_steps(r.id),
fts_aggregate_tags(r.id)
FROM recipes r
WHERE r.id = NEW.recipe_id AND NOT EXISTS (SELECT 1 FROM recipes_fts WHERE rowid = r.id);
END;
-- AFTER UPDATE on steps
CREATE TRIGGER steps_after_update
AFTER UPDATE ON steps
BEGIN
UPDATE recipes_fts
SET step_instruction = fts_aggregate_steps(NEW.recipe_id)
WHERE rowid = NEW.recipe_id;
UPDATE recipes_fts
SET step_instruction = fts_aggregate_steps(OLD.recipe_id)
WHERE rowid = OLD.recipe_id AND OLD.recipe_id != NEW.recipe_id;
END;
-- AFTER DELETE on steps
CREATE TRIGGER steps_after_delete
AFTER DELETE ON steps
BEGIN
UPDATE recipes_fts
SET step_instruction = fts_aggregate_steps(OLD.recipe_id)
WHERE rowid = OLD.recipe_id;
END;
```
#### 2.4 Triggers on `tags` and `recipe_tags`
Because tags are many-to-many, changes to tags can affect multiple recipes:
- INSERT tag: no recipes yet, nothing to do.
- UPDATE tag name: affects all recipes linked to that tag.
- DELETE tag: affects all recipes that had that tag (via ON DELETE CASCADE on recipe_tags, but before the tag row is removed, we need to update FTS for those recipes).
For `recipe_tags`:
- INSERT: affects the recipe (and possibly the tag, but tag name is used). The recipe gains a new tag → need to update its `tag_name` column.
- DELETE: recipe loses a tag → need to update its `tag_name` column.
- UPDATE on `recipe_tags` (rare): combination of delete + insert.
```sql
-- AFTER INSERT on recipe_tags
CREATE TRIGGER recipe_tags_after_insert
AFTER INSERT ON recipe_tags
BEGIN
UPDATE recipes_fts
SET tag_name = fts_aggregate_tags(NEW.recipe_id)
WHERE rowid = NEW.recipe_id;
INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name)
SELECT
r.id,
r.title,
r.description,
fts_aggregate_ingredient_items(r.id),
fts_aggregate_ingredient_notes(r.id),
fts_aggregate_steps(r.id),
fts_aggregate_tags(r.id)
FROM recipes r
WHERE r.id = NEW.recipe_id AND NOT EXISTS (SELECT 1 FROM recipes_fts WHERE rowid = r.id);
END;
-- AFTER DELETE on recipe_tags
CREATE TRIGGER recipe_tags_after_delete
AFTER DELETE ON recipe_tags
BEGIN
UPDATE recipes_fts
SET tag_name = fts_aggregate_tags(OLD.recipe_id)
WHERE rowid = OLD.recipe_id;
END;
-- AFTER UPDATE on tags (name changed) — need to update all recipes with that tag
CREATE TRIGGER tags_after_update
AFTER UPDATE OF name ON tags
BEGIN
UPDATE recipes_fts
SET tag_name = fts_aggregate_tags(rt.recipe_id)
FROM recipe_tags rt
WHERE recipes_fts.rowid = rt.recipe_id
AND rt.tag_id = NEW.id;
END;
-- BEFORE DELETE on tags — update all recipes that reference this tag before tag is removed
-- (we could also rely on ON DELETE CASCADE from recipe_tags, but we need to update FTS before the tag row is gone)
CREATE TRIGGER tags_before_delete
BEFORE DELETE ON tags
BEGIN
UPDATE recipes_fts
SET tag_name = fts_aggregate_tags(rt.recipe_id)
FROM recipe_tags rt
WHERE recipes_fts.rowid = rt.recipe_id
AND rt.tag_id = OLD.id;
END;
```
**Note:** SQLite's `UPDATE ... FROM` is available in modern versions (3.33.0+). If not supported in the project's SQLite version, we can rewrite using subqueries. The migration implementation (T2) should check SQLite version and adapt accordingly.
---
## 3. Backfill Strategy
Existing recipes must be loaded into `recipes_fts` before the application uses it.
### Migration Steps
1. **Ensure FTS5 extension is available** (SQLite 3.9.0+). Enable extension loading if needed: `PRAGMA foreign_keys = ON;` (no special action required for FTS5; it's built-in).
2. **Create helper functions** (`fts_aggregate_*`) before creating triggers, because triggers call them.
3. **Create the virtual table:**
```sql
CREATE VIRTUAL TABLE IF NOT EXISTS recipes_fts
USING fts5(
title,
description,
ingredient_item,
ingredient_notes,
step_instruction,
tag_name,
tokenize='porter unicode61'
);
```
4. **Backfill all existing recipes** in a single transaction (for performance and consistency):
```sql
INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name)
SELECT
r.id,
r.title,
r.description,
GROUP_CONCAT(i.item, ' ') as ingredient_items,
GROUP_CONCAT(i.notes, ' ') as ingredient_notes,
(SELECT GROUP_CONCAT(s.instruction, ' ') FROM steps s WHERE s.recipe_id = r.id) as step_instruction,
(SELECT GROUP_CONCAT(t.name, ' ') FROM tags t
JOIN recipe_tags rt ON t.id = rt.tag_id
WHERE rt.recipe_id = r.id) as tag_name
FROM recipes r
LEFT JOIN ingredients i ON r.id = i.recipe_id
GROUP BY r.id;
```
**Why this works:** Uses `GROUP BY r.id` to aggregate ingredients per recipe. Subqueries aggregate steps and tags. Handles recipes with no ingredients/steps/tags gracefully (NULL → FTS stores empty text? Need to check SQLite FTS behavior: NULL becomes empty string, fine).
5. **Create triggers** as defined in Section 2, in order: helper functions first, then FTS table, then backfill, then triggers. But triggers reference the functions, so functions must exist before triggers are created.
6. **Test backfill completeness:**
```sql
SELECT COUNT(*) FROM recipes; -- should equal COUNT(*) FROM recipes_fts
```
### Idempotency & Rollback
- Migration script should be idempotent: use `CREATE TABLE IF NOT EXISTS`, `CREATE TRIGGER IF NOT EXISTS`, and check if backfill is needed (e.g., by checking `recipes_fts` row count before inserting).
- For rollback: drop all triggers and the FTS table. Keep data in main tables intact.
---
## 4. Search Query Usage Pattern (for backend developer)
Once FTS5 is set up, the search endpoint should:
```sql
-- Basic query with BM25 ranking
SELECT
r.*,
bm25(recipes_fts) as rank_score
FROM recipes r
JOIN recipes_fts fts ON r.id = fts.rowid
WHERE fts MATCH ?
ORDER BY rank_score ASC -- lower bm25 is better
LIMIT ? OFFSET ?;
```
To get total count for pagination:
```sql
SELECT COUNT(*) FROM recipes_fts WHERE recipes_fts MATCH ?;
```
**Phrase search:** Use quotes in query: `"chocolate chip"` → users can type it, pass directly to MATCH.
**Prefix search:** Use `*` at end: `choc*` → user may type, treat as prefix.
**Security:** Parameterize the query string to avoid SQL injection. Use prepared statements.
---
## 5. Implementation Notes & Edge Cases
### Column Aggregation Decisions
- **`ingredient_item` vs `ingredient_notes`**: We separate them to allow independent boosting if needed later. Currently both are concatenated with spaces; nulls are excluded.
- **Ordering preservation** (`GROUP_CONCAT`): By default SQLite does not guarantee order; we can add `ORDER BY position` within aggregation if needed: `GROUP_CONCAT(item, ' ')` with `ORDER BY position` modifier. For FTS, order matters for phrase queries but not for individual term matches. Still, it's good to preserve natural order. Modify helpers:
```sql
SELECT GROUP_CONCAT(item, ' ') FROM (SELECT item FROM ingredients WHERE recipe_id = ? ORDER BY position);
```
- Same for steps: should honor `position`.
Update functions accordingly:
```sql
CREATE OR REPLACE FUNCTION fts_aggregate_ingredient_items(recipe_id INTEGER)
RETURNS TEXT DETERMINISTIC
BEGIN
SELECT GROUP_CONCAT(item, ' ')
FROM (SELECT item FROM ingredients WHERE recipe_id = recipe_id ORDER BY position);
END;
CREATE OR REPLACE FUNCTION fts_aggregate_ingredient_notes(recipe_id INTEGER)
RETURNS TEXT DETERMINISTIC
BEGIN
SELECT GROUP_CONCAT(notes, ' ')
FROM (SELECT notes FROM ingredients WHERE recipe_id = recipe_id ORDER BY position)
WHERE notes IS NOT NULL AND notes != '';
END;
CREATE OR REPLACE FUNCTION fts_aggregate_steps(recipe_id INTEGER)
RETURNS TEXT DETERMINISTIC
BEGIN
SELECT GROUP_CONCAT(instruction, ' ')
FROM (SELECT instruction FROM steps WHERE recipe_id = recipe_id ORDER BY position);
END;
```
### Multi-Recipes Updated by Single Trigger
Triggers on `tags` and `recipe_tags` can affect many recipes (e.g., deleting a tag popular in many recipes). The `UPDATE recipes_fts ... FROM recipe_tags` pattern updates all affected rows in one statement, which is efficient. However, if a tag has thousands of recipes, this could be heavy; but tags are typically not that widely used. Acceptable.
### Concurrency
Triggers run within the same transaction as the triggering statement. So FTS remains consistent with the main tables. No additional locking needed beyond SQLite's transaction isolation.
### FTS5 Content Option
We are using the default `content=''` (external content) because the FTS table does not store its own copy of the full text; it references `rowid` only. We manage content manually via triggers. That's what we designed above.
Alternatively, could use `content='recipes'` and let SQLite auto-maintain content via `content_triggers`, but that would only index columns directly from `recipes` table, not aggregated from ingredients/steps/tags. So external content + explicit triggers is necessary.
### Performance Considerations
- Triggers add overhead on every write to ingredients/steps/recipe_tags. But writes are relatively infrequent compared to reads; search is read-heavy.
- Backfill may be slow for large datasets (>10k recipes). Can batch if needed, but likely okay for a personal recipe manager.
- FTS5 indexes maintain inverted index; queries should be fast.
### Null Handling
- `GROUP_CONCAT` returns NULL if no rows; FTS5 stores empty string for NULL. We can coalesce: `COALESCE(GROUP_CONCAT(...), '')` in functions to be explicit.
---
## 6. Deliverables for T2 (Implementation Checklist)
- [ ] Create SQL migration file: `migrations/YYYY-MM-DD-add-fts5.sql`
- [ ] Add helper aggregation functions (deterministic, with ordering)
- [ ] Create `recipes_fts` virtual table with `porter unicode61` tokenizer
- [ ] Implement backfill query (single INSERT...SELECT)
- [ ] Create all triggers (recipes, ingredients, steps, tags, recipe_tags) with proper WHEN clauses to avoid unnecessary updates
- [ ] Make migration idempotent (use `IF NOT EXISTS` checks and/or guard inserts)
- [ ] Add tests for trigger behavior (unit tests that simulate data changes and verify FTS content)
- [ ] Document rollback: `DROP TRIGGER IF EXISTS ...` and `DROP TABLE IF EXISTS recipes_fts`
- [ ] Verify row counts match after backfill
---
## 7. Schema Version Bump
Current schema version: `2026-03-28` (from migrations). This FTS5 addition constitutes a schema change. Recommendation: increment to `2026-03-30-fts5` or similar. Update any `schemaVersion` constant in application code.
---
## 8. Summary Diagram (Conceptual)
```
recipes (id)
↓ (triggers on recipes)
ingredients (recipe_id) → aggregate via fts_aggregate_ingredient_*()
steps (recipe_id) → aggregate via fts_aggregate_steps()
tags ← recipe_tags (recipe_id, tag_id) → aggregate via fts_aggregate_tags()
recipes_fts(rowid = recipes.id) stores searchable text from all above.
Search query: SELECT recipes.* FROM recipes JOIN recipes_fts USING(rowid) WHERE MATCH.
```
All modifications to base tables funnel through triggers that recompute only the affected recipe's aggregated fields and update its corresponding FTS row.
---
**Design completed for T2 implementation.**
**Key equations:**
- FTS row content: `{title, description, Σ(ingredient items), Σ(ingredient notes), Σ(step instructions), Σ(tag names)}`
**Triggers:**
- `recipes_after_insert/update/delete` → maintain recipe FTS row (title/desc)
- `ingredients_after_insert/update/delete` → recalc ingredient aggregates
- `steps_after_insert/update/delete` → recalc step aggregates
- `recipe_tags_after_insert/delete` → recalc tag aggregates for recipe
- `tags_after_update` (name change) → recalc tag aggregates for all recipes using that tag
- `tags_before_delete` → recalc tag aggregates for affected recipes
**Backfill:** One-shot `INSERT INTO recipes_fts SELECT ... GROUP BY recipes.id` with left joins/subqueries.