# Phase 1: CopyMeThat Format Analysis

**Completed:** 2026-03-28 23:52 EDT

**Analyzed by:** Cleo

---

## File Structure

### TXT Export

- **Location:** `data/exports/Copy_Me_That_TXT_20260328_58775_z1p5lpjsgz/`
- **Format:** One `.txt` file per recipe
- **Naming:** Snake-case recipe names (e.g., `apple_crowned_coffee_cake.txt`)
- **Estimated count:** ~50-100+ recipes

### HTML Export

- **Location:** `data/exports/Copy_Me_That_HTML_20260328_58775_z1p5lpjsgz/`
- **Format:** Single `recipes.html` file containing ALL recipes
- **Images:** `/images/` subfolder with recipe photos
- **Structure:** Semantic HTML with consistent IDs

---

## TXT Format Specification

### Structure

```
[Recipe Title]
Adapted from [URL]
tags: [Tag1], [Tag2], [Tag3]
[Optional: "I made this."]
Servings: [serving info]
INGREDIENTS
[ingredient 1]
[ingredient 2]
...
STEPS
1) [step 1]
2) [step 2]
...
NOTES
[optional notes]
```

### Example

```
Apple-Crowned Coffee Cake
Adapted from http://www.kraftcanada.com/recipes/apple-crowned-coffee-cake-191423
tags: Cake, Dessert
I made this.
Servings: 16 servings, 1 piece (76 g) each
INGREDIENTS
2 cups flour
2 Tbsp. granulated sugar
...
STEPS
1) Heat oven to 375°F.
2) Combine flour...
NOTES
If the glaze is too thick...
```

### Key Fields

- **Title:** First line
- **Source URL:** "Adapted from [URL]"
- **Tags:** Comma-separated after "tags:"
- **Made flag:** Presence of "I made this."
- **Servings:** After "Servings:"
- **Ingredients:** Plain list between "INGREDIENTS" and "STEPS"
- **Instructions:** Numbered list after "STEPS"
- **Notes:** Optional, after "NOTES"

---

## HTML Format Specification

### Structure

Single HTML file with repeated `.recipe` div blocks:

```html
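<!-- Element names are illustrative; ids and classes are those listed under Key Selectors -->
<div class="recipe">
  <div id="name">Recipe Title</div>
  <a id="original_link" href="[URL]"></a>
  <img class="recipeImage" src="images/[photo]">
  <span class="recipeCategory">Tag1</span> <span class="recipeCategory">Tag2</span>
  <div id="description">Description text</div>
  <span id="made_this">I made this.</span> Rated <span id="ratingValue">3</span>/5
  Servings: <span id="recipeYield">8 servings...</span>
  <ul>
    <li class="recipeIngredient">ingredient 1</li>
  </ul>
  <ol>
    <li class="instruction">step 1</li>
    <li class="instruction">...</li>
  </ol>
  <p class="recipeNote">note text</p>
</div>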
```

### Key Selectors

- `.recipe`: Recipe container
- `#name`: Title
- `#original_link`: Source URL
- `.recipeImage`: Image path
- `.recipeCategory`: Tags
- `#description`: Description
- `#made_this`: Made flag
- `#ratingValue`: Rating (1-5)
- `#recipeYield`: Servings
- `.recipeIngredient`: Ingredients (list items)
- `.instruction`: Steps (ordered list items)
- `.recipeNote`: Notes

---

## Implementation Strategy

### Recommended Approach: HTML Parser Primary

**Rationale:**

- The HTML export carries more data than TXT (images, ratings, descriptions)
- A single file makes batch import easier
- The markup is well structured and semantic
- Images are already linked

**Fallback:** TXT parser for edge cases

### Parser Architecture

```
ImportService
├── CopyMeThatHtmlParser
│   ├── parseRecipes(html: string): Recipe[]
│   ├── extractRecipeBlocks(html: string): HTMLElement[]
│   └── parseRecipeBlock(block: HTMLElement): Recipe
└── CopyMeThatTxtParser (optional fallback)
    └── parseTxtFile(content: string): Recipe
```

### API Endpoint Design

```
POST /api/recipes/import/copymethat
Content-Type: multipart/form-data

Request:
- file: recipes.html OR multiple .txt files
- options: { skipDuplicates: boolean, importImages: boolean }

Response:
{
  success: true,
  data: {
    imported: 45,
    skipped: 3,
    failed: 2,
    recipes: [...] // preview
  }
}
```

---

## Data Mapping

| CopyMeThat Field | Recipe Schema Field | Notes |
|------------------|---------------------|-------|
| `#name` | `title` | Direct mapping |
| `#original_link` | `source_url` | Direct mapping |
| `#description` | `description` | Direct mapping |
| `.recipeCategory` | `tags` | Parse into tag array |
| `#recipeYield` | `servings` | Extract number if possible |
| `.recipeIngredient` | `ingredients[].item` | Plain text list |
| `.instruction` | `steps[].instruction` | Numbered list |
| `.recipeNote` | `notes` (proposed) | Requires schema extension |
| `.recipeImage` | `image_url` | Copy to app storage |
| `#made_this` | `made` (proposed) | Boolean flag |
| `#ratingValue` | `rating` (proposed) | 1-5 rating |

### Schema Extensions Needed

- `made: boolean`: User has cooked this
- `rating: number`: 1-5 stars
- `notes: string`: General notes field

---

## Edge Cases to Handle

1. **Duplicate detection:** Match on title + source_url
2. **Missing fields:** Title, ingredients, and steps are required
3. **Image handling:** Copy images or store paths?
4. **Encoding:** UTF-8 special characters
5. **HTML entities:** `&amp;`, `&quot;`, etc.
6. **Large batches:** Memory limits for 100+ recipes
7. **Malformed HTML:** Graceful degradation

---

## Next Steps (Phase 2)

1. Extend Recipe schema with `made`, `rating`, `notes` fields
2. Implement `CopyMeThatHtmlParser` service
3. Create `POST /api/recipes/import/copymethat` endpoint
4. Add multipart file upload handler
5. Unit tests for parser
6. Integration tests for endpoint

---

**Status:** ✅ Analysis complete, ready for implementation
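
As a starting point for the optional TXT fallback, the layout under TXT Format Specification could be parsed roughly as follows. This is a minimal sketch, not the planned `CopyMeThatTxtParser`: it assumes each field sits on its own line and that the section headings appear exactly as `INGREDIENTS`, `STEPS`, and `NOTES`. The names `ParsedRecipe`, `parseTxtRecipe`, `duplicateKey`, and `servingsText` are hypothetical, and the case-insensitive duplicate key is an assumption layered on the "title + source_url" rule.

```typescript
// Hypothetical result shape: names follow the Data Mapping table and the
// proposed schema extensions, not an existing Recipe type.
interface ParsedRecipe {
  title: string;
  sourceUrl: string | null;
  tags: string[];
  made: boolean;
  servingsText: string | null;
  servings: number | null; // first integer found in the servings text, if any
  ingredients: string[];
  steps: string[];
  notes: string | null;
}

type Section = "header" | "ingredients" | "steps" | "notes";

// Maps the three section headings onto parser states.
const SECTION_HEADINGS = new Map<string, Section>([
  ["INGREDIENTS", "ingredients"],
  ["STEPS", "steps"],
  ["NOTES", "notes"],
]);

function parseTxtRecipe(content: string): ParsedRecipe {
  // Assumption: one field per line; blank lines carry no meaning and are dropped.
  const lines = content
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const title = lines[0];
  if (title === undefined) {
    throw new Error("Recipe file is empty");
  }

  const recipe: ParsedRecipe = {
    title,
    sourceUrl: null,
    tags: [],
    made: false,
    servingsText: null,
    servings: null,
    ingredients: [],
    steps: [],
    notes: null,
  };
  const noteLines: string[] = [];
  let section: Section = "header";

  for (const line of lines.slice(1)) {
    const heading = SECTION_HEADINGS.get(line);
    if (heading !== undefined) {
      section = heading;
      continue;
    }
    if (section === "ingredients") {
      recipe.ingredients.push(line);
    } else if (section === "steps") {
      recipe.steps.push(line.replace(/^\d+\)\s*/, "")); // drop the "1) " prefix
    } else if (section === "notes") {
      noteLines.push(line);
    } else if (line.startsWith("Adapted from ")) {
      recipe.sourceUrl = line.slice("Adapted from ".length).trim();
    } else if (line.startsWith("tags:")) {
      recipe.tags = line
        .slice("tags:".length)
        .split(",")
        .map((tag) => tag.trim())
        .filter((tag) => tag.length > 0);
    } else if (line === "I made this.") {
      recipe.made = true;
    } else if (line.startsWith("Servings:")) {
      const text = line.slice("Servings:".length).trim();
      const firstNumber = text.match(/\d+/);
      recipe.servingsText = text;
      recipe.servings = firstNumber ? Number(firstNumber[0]) : null;
    }
  }

  if (noteLines.length > 0) {
    recipe.notes = noteLines.join("\n");
  }
  return recipe;
}

// Duplicate-detection key built from title + source URL.
// Lowercasing and trimming are assumptions, not a rule stated in the analysis.
function duplicateKey(recipe: ParsedRecipe): string {
  const url = (recipe.sourceUrl ?? "").trim().toLowerCase();
  return `${recipe.title.trim().toLowerCase()}|${url}`;
}
```

Calling `parseTxtRecipe` on the contents of one exported `.txt` file yields a single record, and `duplicateKey` can then be checked against the keys of already imported recipes when `skipDuplicates` is set. Keeping the raw `servingsText` alongside the extracted number avoids losing detail such as "1 piece (76 g) each" when only the count fits the `servings` field.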