# FTS5 Virtual Table Schema & Trigger Plan — Design Spec **Task:** T1: Design FTS5 virtual table schema + trigger plan for Recipe Manager **Created:** 2026-03-30 **Target:** Implementation blueprint for T2 (migration) **Schema baseline:** `src/backend/db/schema.sql` (2026-03-28 MVP normalized) --- ## 1. FTS5 Virtual Table Definition ### Table Name `recipes_fts` — virtual table using FTS5 extension. ### Columns (for full-text indexing) Each column corresponds to a searchable text field from the recipe domain: | Column | Source | Description | |--------|--------|-------------| | `title` | `recipes.title` | Recipe title (primary identifier) | | `description` | `recipes.description` | Recipe description/brief | | `ingredient_item` | `ingredients.item` aggregated | Ingredient names (e.g., "chicken", "flour") | | `ingredient_notes` | `ingredients.notes` aggregated | Ingredient-specific notes (e.g., " diced", "to taste") | | `step_instruction` | `steps.instruction` aggregated | Cooking step instructions | | `tag_name` | `tags.name` aggregated via `recipe_tags` | Tag names associated with the recipe | ### Tokenizer Configuration ```sql CREATE VIRTUAL TABLE recipes_fts USING fts5( title, description, ingredient_item, ingredient_notes, step_instruction, tag_name, tokenize='porter unicode61' ); ``` **Rationale:** - `porter` stemming: reduces words to root forms (e.g., "chopped" → "chop", "chicken" stays "chicken", "sautéed" → "saut"). Improves recall for cooking terminology. - `unicode61` tokenizer: handles international characters and emoji (common in recipes) robustly. - Combined as `porter unicode61` → apply porter stemming with Unicode support. ### Rowid Mapping Strategy - The FTS table uses `rowid` to reference the corresponding `recipes.id`. - We will store `recipes.id` as the `rowid` directly: `INSERT INTO recipes_fts(rowid, ...) VALUES(recipe_id, ...)`. - This enables efficient joins: `SELECT r.* FROM recipes r JOIN recipes_fts fts ON r.id = fts.rowid WHERE fts MATCH ?`. --- ## 2. Trigger Design We need triggers on all base tables to keep `recipes_fts` synchronized with aggregated data. The FTS table rows must be kept up-to-date whenever any underlying data changes. ### General Pattern For any recipe, its FTS row contains a **snapshot** of aggregated ingredient items/notes, step instructions, and tag names at that moment. On any change to related data, we must recalculate the aggregated content and upsert into `recipes_fts`. ### Helper SQL Functions We recommend creating SQL functions to encapsulate the aggregation logic: ```sql -- Aggregate ingredient text for a recipe CREATE OR REPLACE FUNCTION fts_aggregate_ingredients(recipe_id INTEGER) RETURNS TEXT DETERMINISTIC BEGIN SELECT GROUP_CONCAT(item || COALESCE(' ' || notes, ''), ' ') FROM ingredients WHERE recipe_id = recipe_id; END; -- Aggregate step instructions for a recipe CREATE OR REPLACE FUNCTION fts_aggregate_steps(recipe_id INTEGER) RETURNS TEXT DETERMINISTIC BEGIN SELECT GROUP_CONCAT(instruction, ' ') FROM steps WHERE recipe_id = recipe_id ORDER BY position; END; -- Aggregate tag names for a recipe CREATE OR REPLACE FUNCTION fts_aggregate_tags(recipe_id INTEGER) RETURNS TEXT DETERMINISTIC BEGIN SELECT GROUP_CONCAT(t.name, ' ') FROM tags t JOIN recipe_tags rt ON t.id = rt.tag_id WHERE rt.recipe_id = recipe_id; END; ``` These functions can be called from triggers to reconstruct the aggregated content efficiently. ### Trigger Definitions #### 2.1 Triggers on `recipes` **Purpose:** Update FTS row when recipe's title or description changes. - `recipes_after_insert`: Insert initial FTS row for new recipe. - `recipes_after_update`: Update FTS row if title/description changed. - `recipes_after_delete`: Delete FTS row when recipe is removed. ```sql -- AFTER INSERT on recipes CREATE TRIGGER recipes_after_insert AFTER INSERT ON recipes BEGIN INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name) VALUES ( NEW.id, NEW.title, NEW.description, fts_aggregate_ingredients(NEW.id), fts_aggregate_ingredients(NEW.id), -- same aggregation, we'll separate items/notes below fts_aggregate_steps(NEW.id), fts_aggregate_tags(NEW.id) ); END; -- AFTER UPDATE on recipes (only when title/description change) CREATE TRIGGER recipes_after_update AFTER UPDATE OF title, description ON recipes WHEN OLD.title != NEW.title OR OLD.description != NEW.description BEGIN UPDATE recipes_fts SET title = NEW.title, description = NEW.description WHERE rowid = OLD.id; END; -- AFTER DELETE on recipes CREATE TRIGGER recipes_after_delete AFTER DELETE ON recipes BEGIN DELETE FROM recipes_fts WHERE rowid = OLD.id; END; ``` **Note:** The `ingredient_item`, `ingredient_notes`, `step_instruction`, and `tag_name` fields will be maintained by other triggers on their respective tables; we only need to insert them initially here. #### 2.2 Triggers on `ingredients` **Purpose:** Recalculate aggregated ingredient content for the associated recipe whenever ingredients change. - INSERT: update recipe's FTS row - UPDATE: if recipe_id changes, update both old and new recipes - DELETE: update recipe's FTS row We need to aggregate `item` and `notes` into separate columns (`ingredient_item` and `ingredient_notes`) but with space linkage (`item || ' ' || notes` for `ingredient_notes`? Actually spec says `ingredient_notes` column should contain notes only? Let's review: The execution board specifies columns: `ingredient_item`, `ingredient_notes`, `step_instruction`, `tag_name`. Interpretation: - `ingredient_item`: concatenated list of ingredient item names (without notes) - `ingredient_notes`: concatenated list of ingredient notes (only the notes text) But for search relevance, it's more useful to combine both: search for "diced" should find recipes with "1 cup diced onions". If we separate them, the note "diced" may not match the item column. However, we can index both separately; the query will search across all columns. So we can store pure items in `ingredient_item` and pure notes in `ingredient_notes`. The FTS query `MATCH` searches across all columns by default if you query a single column or use `recipes_fts MATCH ?` without column spec. But we might want to weight columns differently? That's an advanced optimization not required now. Given the spec, I'll implement: - `ingredient_item`: `GROUP_CONCAT(item, ' ')` - `ingredient_notes`: `GROUP_CONCAT(notes, ' ')` (excluding nulls) ```sql -- Helper to get aggregated ingredient item text CREATE OR REPLACE FUNCTION fts_aggregate_ingredient_items(recipe_id INTEGER) RETURNS TEXT DETERMINISTIC BEGIN SELECT GROUP_CONCAT(item, ' ') FROM ingredients WHERE recipe_id = recipe_id; END; -- Helper to get aggregated ingredient notes text CREATE OR REPLACE FUNCTION fts_aggregate_ingredient_notes(recipe_id INTEGER) RETURNS TEXT DETERMINISTIC BEGIN SELECT GROUP_CONCAT(notes, ' ') FROM ingredients WHERE recipe_id = recipe_id AND notes IS NOT NULL AND notes != ''; END; -- AFTER INSERT on ingredients CREATE TRIGGER ingredients_after_insert AFTER INSERT ON ingredients BEGIN UPDATE recipes_fts SET ingredient_item = fts_aggregate_ingredient_items(NEW.recipe_id), ingredient_notes = fts_aggregate_ingredient_notes(NEW.recipe_id) WHERE rowid = NEW.recipe_id; -- If recipe doesn't exist in FTS yet (e.g., recipe added earlier without trigger), insert it INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name) SELECT r.id, r.title, r.description, fts_aggregate_ingredient_items(r.id), fts_aggregate_ingredient_notes(r.id), fts_aggregate_steps(r.id), fts_aggregate_tags(r.id) FROM recipes r WHERE r.id = NEW.recipe_id AND NOT EXISTS (SELECT 1 FROM recipes_fts WHERE rowid = r.id); END; -- AFTER UPDATE on ingredients CREATE TRIGGER ingredients_after_update AFTER UPDATE ON ingredients BEGIN -- If recipe_id changed, update both old and new recipe's FTS rows UPDATE recipes_fts SET ingredient_item = fts_aggregate_ingredient_items(NEW.recipe_id), ingredient_notes = fts_aggregate_ingredient_notes(NEW.recipe_id) WHERE rowid = NEW.recipe_id; UPDATE recipes_fts SET ingredient_item = fts_aggregate_ingredient_items(OLD.recipe_id), ingredient_notes = fts_aggregate_ingredient_notes(OLD.recipe_id) WHERE rowid = OLD.recipe_id AND OLD.recipe_id != NEW.recipe_id; END; -- AFTER DELETE on ingredients CREATE TRIGGER ingredients_after_delete AFTER DELETE ON ingredients BEGIN UPDATE recipes_fts SET ingredient_item = fts_aggregate_ingredient_items(OLD.recipe_id), ingredient_notes = fts_aggregate_ingredient_notes(OLD.recipe_id) WHERE rowid = OLD.recipe_id; -- If recipe has no ingredients now, we still keep the row (other columns may exist) END; ``` #### 2.3 Triggers on `steps` ```sql -- AFTER INSERT on steps CREATE TRIGGER steps_after_insert AFTER INSERT ON steps BEGIN UPDATE recipes_fts SET step_instruction = fts_aggregate_steps(NEW.recipe_id) WHERE rowid = NEW.recipe_id; INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name) SELECT r.id, r.title, r.description, fts_aggregate_ingredient_items(r.id), fts_aggregate_ingredient_notes(r.id), fts_aggregate_steps(r.id), fts_aggregate_tags(r.id) FROM recipes r WHERE r.id = NEW.recipe_id AND NOT EXISTS (SELECT 1 FROM recipes_fts WHERE rowid = r.id); END; -- AFTER UPDATE on steps CREATE TRIGGER steps_after_update AFTER UPDATE ON steps BEGIN UPDATE recipes_fts SET step_instruction = fts_aggregate_steps(NEW.recipe_id) WHERE rowid = NEW.recipe_id; UPDATE recipes_fts SET step_instruction = fts_aggregate_steps(OLD.recipe_id) WHERE rowid = OLD.recipe_id AND OLD.recipe_id != NEW.recipe_id; END; -- AFTER DELETE on steps CREATE TRIGGER steps_after_delete AFTER DELETE ON steps BEGIN UPDATE recipes_fts SET step_instruction = fts_aggregate_steps(OLD.recipe_id) WHERE rowid = OLD.recipe_id; END; ``` #### 2.4 Triggers on `tags` and `recipe_tags` Because tags are many-to-many, changes to tags can affect multiple recipes: - INSERT tag: no recipes yet, nothing to do. - UPDATE tag name: affects all recipes linked to that tag. - DELETE tag: affects all recipes that had that tag (via ON DELETE CASCADE on recipe_tags, but before the tag row is removed, we need to update FTS for those recipes). For `recipe_tags`: - INSERT: affects the recipe (and possibly the tag, but tag name is used). The recipe gains a new tag → need to update its `tag_name` column. - DELETE: recipe loses a tag → need to update its `tag_name` column. - UPDATE on `recipe_tags` (rare): combination of delete + insert. ```sql -- AFTER INSERT on recipe_tags CREATE TRIGGER recipe_tags_after_insert AFTER INSERT ON recipe_tags BEGIN UPDATE recipes_fts SET tag_name = fts_aggregate_tags(NEW.recipe_id) WHERE rowid = NEW.recipe_id; INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name) SELECT r.id, r.title, r.description, fts_aggregate_ingredient_items(r.id), fts_aggregate_ingredient_notes(r.id), fts_aggregate_steps(r.id), fts_aggregate_tags(r.id) FROM recipes r WHERE r.id = NEW.recipe_id AND NOT EXISTS (SELECT 1 FROM recipes_fts WHERE rowid = r.id); END; -- AFTER DELETE on recipe_tags CREATE TRIGGER recipe_tags_after_delete AFTER DELETE ON recipe_tags BEGIN UPDATE recipes_fts SET tag_name = fts_aggregate_tags(OLD.recipe_id) WHERE rowid = OLD.recipe_id; END; -- AFTER UPDATE on tags (name changed) — need to update all recipes with that tag CREATE TRIGGER tags_after_update AFTER UPDATE OF name ON tags BEGIN UPDATE recipes_fts SET tag_name = fts_aggregate_tags(rt.recipe_id) FROM recipe_tags rt WHERE recipes_fts.rowid = rt.recipe_id AND rt.tag_id = NEW.id; END; -- BEFORE DELETE on tags — update all recipes that reference this tag before tag is removed -- (we could also rely on ON DELETE CASCADE from recipe_tags, but we need to update FTS before the tag row is gone) CREATE TRIGGER tags_before_delete BEFORE DELETE ON tags BEGIN UPDATE recipes_fts SET tag_name = fts_aggregate_tags(rt.recipe_id) FROM recipe_tags rt WHERE recipes_fts.rowid = rt.recipe_id AND rt.tag_id = OLD.id; END; ``` **Note:** SQLite's `UPDATE ... FROM` is available in modern versions (3.33.0+). If not supported in the project's SQLite version, we can rewrite using subqueries. The migration implementation (T2) should check SQLite version and adapt accordingly. --- ## 3. Backfill Strategy Existing recipes must be loaded into `recipes_fts` before the application uses it. ### Migration Steps 1. **Ensure FTS5 extension is available** (SQLite 3.9.0+). Enable extension loading if needed: `PRAGMA foreign_keys = ON;` (no special action required for FTS5; it's built-in). 2. **Create helper functions** (`fts_aggregate_*`) before creating triggers, because triggers call them. 3. **Create the virtual table:** ```sql CREATE VIRTUAL TABLE IF NOT EXISTS recipes_fts USING fts5( title, description, ingredient_item, ingredient_notes, step_instruction, tag_name, tokenize='porter unicode61' ); ``` 4. **Backfill all existing recipes** in a single transaction (for performance and consistency): ```sql INSERT INTO recipes_fts(rowid, title, description, ingredient_item, ingredient_notes, step_instruction, tag_name) SELECT r.id, r.title, r.description, GROUP_CONCAT(i.item, ' ') as ingredient_items, GROUP_CONCAT(i.notes, ' ') as ingredient_notes, (SELECT GROUP_CONCAT(s.instruction, ' ') FROM steps s WHERE s.recipe_id = r.id) as step_instruction, (SELECT GROUP_CONCAT(t.name, ' ') FROM tags t JOIN recipe_tags rt ON t.id = rt.tag_id WHERE rt.recipe_id = r.id) as tag_name FROM recipes r LEFT JOIN ingredients i ON r.id = i.recipe_id GROUP BY r.id; ``` **Why this works:** Uses `GROUP BY r.id` to aggregate ingredients per recipe. Subqueries aggregate steps and tags. Handles recipes with no ingredients/steps/tags gracefully (NULL → FTS stores empty text? Need to check SQLite FTS behavior: NULL becomes empty string, fine). 5. **Create triggers** as defined in Section 2, in order: helper functions first, then FTS table, then backfill, then triggers. But triggers reference the functions, so functions must exist before triggers are created. 6. **Test backfill completeness:** ```sql SELECT COUNT(*) FROM recipes; -- should equal COUNT(*) FROM recipes_fts ``` ### Idempotency & Rollback - Migration script should be idempotent: use `CREATE TABLE IF NOT EXISTS`, `CREATE TRIGGER IF NOT EXISTS`, and check if backfill is needed (e.g., by checking `recipes_fts` row count before inserting). - For rollback: drop all triggers and the FTS table. Keep data in main tables intact. --- ## 4. Search Query Usage Pattern (for backend developer) Once FTS5 is set up, the search endpoint should: ```sql -- Basic query with BM25 ranking SELECT r.*, bm25(recipes_fts) as rank_score FROM recipes r JOIN recipes_fts fts ON r.id = fts.rowid WHERE fts MATCH ? ORDER BY rank_score ASC -- lower bm25 is better LIMIT ? OFFSET ?; ``` To get total count for pagination: ```sql SELECT COUNT(*) FROM recipes_fts WHERE recipes_fts MATCH ?; ``` **Phrase search:** Use quotes in query: `"chocolate chip"` → users can type it, pass directly to MATCH. **Prefix search:** Use `*` at end: `choc*` → user may type, treat as prefix. **Security:** Parameterize the query string to avoid SQL injection. Use prepared statements. --- ## 5. Implementation Notes & Edge Cases ### Column Aggregation Decisions - **`ingredient_item` vs `ingredient_notes`**: We separate them to allow independent boosting if needed later. Currently both are concatenated with spaces; nulls are excluded. - **Ordering preservation** (`GROUP_CONCAT`): By default SQLite does not guarantee order; we can add `ORDER BY position` within aggregation if needed: `GROUP_CONCAT(item, ' ')` with `ORDER BY position` modifier. For FTS, order matters for phrase queries but not for individual term matches. Still, it's good to preserve natural order. Modify helpers: ```sql SELECT GROUP_CONCAT(item, ' ') FROM (SELECT item FROM ingredients WHERE recipe_id = ? ORDER BY position); ``` - Same for steps: should honor `position`. Update functions accordingly: ```sql CREATE OR REPLACE FUNCTION fts_aggregate_ingredient_items(recipe_id INTEGER) RETURNS TEXT DETERMINISTIC BEGIN SELECT GROUP_CONCAT(item, ' ') FROM (SELECT item FROM ingredients WHERE recipe_id = recipe_id ORDER BY position); END; CREATE OR REPLACE FUNCTION fts_aggregate_ingredient_notes(recipe_id INTEGER) RETURNS TEXT DETERMINISTIC BEGIN SELECT GROUP_CONCAT(notes, ' ') FROM (SELECT notes FROM ingredients WHERE recipe_id = recipe_id ORDER BY position) WHERE notes IS NOT NULL AND notes != ''; END; CREATE OR REPLACE FUNCTION fts_aggregate_steps(recipe_id INTEGER) RETURNS TEXT DETERMINISTIC BEGIN SELECT GROUP_CONCAT(instruction, ' ') FROM (SELECT instruction FROM steps WHERE recipe_id = recipe_id ORDER BY position); END; ``` ### Multi-Recipes Updated by Single Trigger Triggers on `tags` and `recipe_tags` can affect many recipes (e.g., deleting a tag popular in many recipes). The `UPDATE recipes_fts ... FROM recipe_tags` pattern updates all affected rows in one statement, which is efficient. However, if a tag has thousands of recipes, this could be heavy; but tags are typically not that widely used. Acceptable. ### Concurrency Triggers run within the same transaction as the triggering statement. So FTS remains consistent with the main tables. No additional locking needed beyond SQLite's transaction isolation. ### FTS5 Content Option We are using the default `content=''` (external content) because the FTS table does not store its own copy of the full text; it references `rowid` only. We manage content manually via triggers. That's what we designed above. Alternatively, could use `content='recipes'` and let SQLite auto-maintain content via `content_triggers`, but that would only index columns directly from `recipes` table, not aggregated from ingredients/steps/tags. So external content + explicit triggers is necessary. ### Performance Considerations - Triggers add overhead on every write to ingredients/steps/recipe_tags. But writes are relatively infrequent compared to reads; search is read-heavy. - Backfill may be slow for large datasets (>10k recipes). Can batch if needed, but likely okay for a personal recipe manager. - FTS5 indexes maintain inverted index; queries should be fast. ### Null Handling - `GROUP_CONCAT` returns NULL if no rows; FTS5 stores empty string for NULL. We can coalesce: `COALESCE(GROUP_CONCAT(...), '')` in functions to be explicit. --- ## 6. Deliverables for T2 (Implementation Checklist) - [ ] Create SQL migration file: `migrations/YYYY-MM-DD-add-fts5.sql` - [ ] Add helper aggregation functions (deterministic, with ordering) - [ ] Create `recipes_fts` virtual table with `porter unicode61` tokenizer - [ ] Implement backfill query (single INSERT...SELECT) - [ ] Create all triggers (recipes, ingredients, steps, tags, recipe_tags) with proper WHEN clauses to avoid unnecessary updates - [ ] Make migration idempotent (use `IF NOT EXISTS` checks and/or guard inserts) - [ ] Add tests for trigger behavior (unit tests that simulate data changes and verify FTS content) - [ ] Document rollback: `DROP TRIGGER IF EXISTS ...` and `DROP TABLE IF EXISTS recipes_fts` - [ ] Verify row counts match after backfill --- ## 7. Schema Version Bump Current schema version: `2026-03-28` (from migrations). This FTS5 addition constitutes a schema change. Recommendation: increment to `2026-03-30-fts5` or similar. Update any `schemaVersion` constant in application code. --- ## 8. Summary Diagram (Conceptual) ``` recipes (id) ↓ (triggers on recipes) ingredients (recipe_id) → aggregate via fts_aggregate_ingredient_*() steps (recipe_id) → aggregate via fts_aggregate_steps() tags ← recipe_tags (recipe_id, tag_id) → aggregate via fts_aggregate_tags() recipes_fts(rowid = recipes.id) stores searchable text from all above. Search query: SELECT recipes.* FROM recipes JOIN recipes_fts USING(rowid) WHERE MATCH. ``` All modifications to base tables funnel through triggers that recompute only the affected recipe's aggregated fields and update its corresponding FTS row. --- **Design completed for T2 implementation.** **Key equations:** - FTS row content: `{title, description, Σ(ingredient items), Σ(ingredient notes), Σ(step instructions), Σ(tag names)}` **Triggers:** - `recipes_after_insert/update/delete` → maintain recipe FTS row (title/desc) - `ingredients_after_insert/update/delete` → recalc ingredient aggregates - `steps_after_insert/update/delete` → recalc step aggregates - `recipe_tags_after_insert/delete` → recalc tag aggregates for recipe - `tags_after_update` (name change) → recalc tag aggregates for all recipes using that tag - `tags_before_delete` → recalc tag aggregates for affected recipes **Backfill:** One-shot `INSERT INTO recipes_fts SELECT ... GROUP BY recipes.id` with left joins/subqueries.