Phase 1: CopyMeThat Format Analysis
Completed: 2026-03-28 23:52 EDT
Analyzed by: Cleo
File Structure
TXT Export
- Location: data/exports/Copy_Me_That_TXT_20260328_58775_z1p5lpjsgz/
- Format: One .txt file per recipe
- Naming: Snake-case recipe names (e.g., apple_crowned_coffee_cake.txt)
- Estimated count: ~50-100+ recipes
HTML Export
- Location: data/exports/Copy_Me_That_HTML_20260328_58775_z1p5lpjsgz/
- Format: Single recipes.html file containing ALL recipes
- Images: images/ subfolder with recipe photos
- Structure: Semantic HTML with consistent IDs (the same IDs, e.g. name and recipeYield, repeat in every recipe block)
TXT Format Specification
Structure
[Recipe Title]
Adapted from [URL]
tags: [Tag1], [Tag2], [Tag3]
[Optional: "I made this."]
Servings: [serving info]
INGREDIENTS
[ingredient 1]
[ingredient 2]
...
STEPS
1) [step 1]
2) [step 2]
...
NOTES
[optional notes]
Example
Apple-Crowned Coffee Cake
Adapted from http://www.kraftcanada.com/recipes/apple-crowned-coffee-cake-191423
tags: Cake, Dessert
I made this.
Servings: 16 servings, 1 piece (76 g) each
INGREDIENTS
2 cups flour
2 Tbsp. granulated sugar
...
STEPS
1) Heat oven to 375°F.
2) Combine flour...
NOTES
If the glaze is too thick...
Key Fields
- Title: First line
- Source URL: "Adapted from [URL]"
- Tags: Comma-separated after "tags:"
- Made flag: Presence of "I made this."
- Servings: After "Servings:"
- Ingredients: Plain list between "INGREDIENTS" and "STEPS"
- Instructions: Numbered list between "STEPS" and "NOTES" (or the end of the file)
- Notes: Optional, after "NOTES"
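A minimal TypeScript sketch of the TXT fallback, following the Key Fields above. It makes several assumptions:

- Each file is plain text.
- The section keywords sit alone on their own lines.
- Blank lines carry no meaning.
- TxtRecipe and parseTxtFile are hypothetical names.
- Header lines the spec does not list are ignored.

```typescript
// Sketch: parse one CopyMeThat TXT export file (layout as documented above).
// TxtRecipe and parseTxtFile are illustrative names, not an existing API.
interface TxtRecipe {
  title: string;
  sourceUrl: string | null;
  tags: string[];
  made: boolean;
  servings: string | null;
  ingredients: string[];
  steps: string[];
  notes: string | null;
}

type TxtSection = "HEADER" | "INGREDIENTS" | "STEPS" | "NOTES";

function parseTxtFile(content: string): TxtRecipe {
  // Normalize line endings, trim every line, and drop blank lines.
  const lines = content
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const recipe: TxtRecipe = {
    title: lines[0] ?? "", // title is the first non-blank line
    sourceUrl: null,
    tags: [],
    made: false,
    servings: null,
    ingredients: [],
    steps: [],
    notes: null,
  };
  const noteLines: string[] = [];
  let section: TxtSection = "HEADER";

  for (const line of lines.slice(1)) {
    // A bare section keyword switches the parser into that section.
    if (line === "INGREDIENTS" || line === "STEPS" || line === "NOTES") {
      section = line as TxtSection;
      continue;
    }
    if (section === "HEADER") {
      const url = /^Adapted from\s+(\S+)/.exec(line);
      const tags = /^tags:\s*(.*)$/i.exec(line);
      const servings = /^Servings:\s*(.*)$/.exec(line);
      if (url) recipe.sourceUrl = url[1];
      else if (tags) recipe.tags = tags[1].split(",").map((t) => t.trim()).filter((t) => t.length > 0);
      else if (servings) recipe.servings = servings[1];
      else if (line === "I made this.") recipe.made = true;
      // Any other header line is ignored in this sketch.
    } else if (section === "INGREDIENTS") {
      recipe.ingredients.push(line);
    } else if (section === "STEPS") {
      recipe.steps.push(line.replace(/^\d+\)\s*/, "")); // drop the "1) " prefix
    } else {
      noteLines.push(line);
    }
  }
  recipe.notes = noteLines.length > 0 ? noteLines.join("\n") : null;
  return recipe;
}
```

The parser walks the file once, and a keyword line changes which field later lines are added to. This keeps it tolerant of missing optional lines such as "I made this." or an empty NOTES section.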
HTML Format Specification
Structure
Single HTML file with repeated .recipe div blocks:
<div class="recipe">
<div id="name">Recipe Title</div>
<div id="link">
Adapted from <a id="original_link" href="...">URL</a>
</div>
<img class="recipeImage" src="images/filename.jpg"/>
<div id="categories">
<span class="recipeCategory">Tag1</span>
<span class="recipeCategory">Tag2</span>
</div>
<div id="description">Description text</div>
<div id="extra_info">
<span id="made_this">I made this.</span>
<span id="rating">Rated <span id="ratingValue">3</span>/5</span>
</div>
<div id="servings">
Servings: <a id="recipeYield">8 servings...</a>
</div>
<ul id="recipeIngredients">
<li class="recipeIngredient">ingredient 1</li>
...
</ul>
<ol id="recipeInstructions">
<li class="instruction" value="1">step 1</li>
...
</ol>
<div id="recipeNotes">
<div class="recipeNote">note text</div>
</div>
</div>
Key Selectors
- .recipe: Recipe container
- #name: Title
- #original_link: Source URL (href attribute)
- .recipeImage: Image path (src attribute)
- .recipeCategory: Tags
- #description: Description
- #made_this: Made flag
- #ratingValue: Rating (1-5)
- #recipeYield: Servings
- .recipeIngredient: Ingredients (list items)
- .instruction: Steps (ordered list items)
- .recipeNote: Notes
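Every recipe block repeats the same IDs, so these selectors only make sense when scoped to one .recipe block. The sketch below reads them from a single block held as a string.

It uses regular expressions that rely on the flat, machine-generated markup shown above. textById, textsByClass, attrOf and their helpers are hypothetical. A production importer would more likely use a real HTML parser, such as cheerio or the browser's DOMParser.

```typescript
// Sketch: read the Key Selectors from ONE recipe block held as an HTML string.
// Ids and class names are assumed to be plain identifiers (no regex
// metacharacters); regex extraction only suits this flat, generated markup.

// Remove inner tags and collapse whitespace.
function stripTags(fragment: string): string {
  return fragment.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

// Pattern for <tag ... attr="value" ...>inner</tag>, capturing the inner HTML.
function elementPattern(attr: "id" | "class", value: string, flags: string): RegExp {
  return new RegExp(`<(\\w+)[^>]*\\s${attr}="${value}"[^>]*>([\\s\\S]*?)</\\1>`, flags);
}

// Text of the first element with the given id (#name, #ratingValue, ...).
function textById(block: string, id: string): string | null {
  const m = elementPattern("id", id, "").exec(block);
  return m ? stripTags(m[2]) : null;
}

// Text of every element with the given class (.recipeIngredient, ...).
function textsByClass(block: string, cls: string): string[] {
  const re = elementPattern("class", cls, "g");
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(block)) !== null) out.push(stripTags(m[2]));
  return out;
}

// An attribute (href, src) of the first element whose id or class matches.
function attrOf(block: string, attr: "id" | "class", value: string, wanted: string): string | null {
  const tag = new RegExp(`<[^>]*\\s${attr}="${value}"[^>]*>`).exec(block);
  if (!tag) return null;
  const m = new RegExp(`\\s${wanted}="([^"]*)"`).exec(tag[0]);
  return m ? m[1] : null;
}
```

Usage is per block. For example, textById(block, "made_this") !== null gives the made flag, and Number(textById(block, "ratingValue")) gives the rating.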
Implementation Strategy
Recommended Approach: HTML Parser Primary
Rationale:
- HTML has MORE data (images, ratings, descriptions)
- Single file = easier batch import
- Well-structured semantic markup
- Images already linked
Fallback: TXT parser for uploads that contain only the per-recipe .txt files
Parser Architecture
ImportService
├── CopyMeThatHtmlParser
│ ├── parseRecipes(html: string): Recipe[]
│ ├── extractRecipeBlocks(html: string): HTMLElement[]
│ └── parseRecipeBlock(block: HTMLElement): Recipe
└── CopyMeThatTxtParser (optional fallback)
└── parseTxtFile(content: string): Recipe
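One way the layout above could look in TypeScript. The sketch makes these simplifications:

- To run without a DOM, it passes each recipe block around as an HTML string rather than an HTMLElement.
- It splits the file on the literal opening tag <div class="recipe">. This assumes recipes are siblings and that the tag never carries extra attributes.
- It extracts only the title and ingredients.
- ParsedRecipe is a placeholder type, not the app's Recipe schema.

```typescript
// Sketch of the ImportService parser layout; block = HTML string, not HTMLElement.
interface ParsedRecipe {
  title: string;
  ingredients: string[];
}

class CopyMeThatHtmlParser {
  // Entry point: whole recipes.html in, one ParsedRecipe per .recipe block out.
  parseRecipes(html: string): ParsedRecipe[] {
    return this.extractRecipeBlocks(html).map((block) => this.parseRecipeBlock(block));
  }

  // Split at each opening recipe tag; a block runs until the next recipe
  // starts or the file ends, so trailing markup after the last recipe is kept
  // but ignored by the field extraction below.
  extractRecipeBlocks(html: string): string[] {
    const marker = '<div class="recipe">';
    return html.split(marker).slice(1).map((part) => marker + part);
  }

  // Minimal field extraction for illustration: title and ingredients only.
  parseRecipeBlock(block: string): ParsedRecipe {
    const titleMatch = /<div id="name">([\s\S]*?)<\/div>/.exec(block);
    const ingredients: string[] = [];
    const re = /<li class="recipeIngredient">([\s\S]*?)<\/li>/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(block)) !== null) ingredients.push(m[1].trim());
    return { title: titleMatch ? titleMatch[1].trim() : "", ingredients };
  }
}
```

Keeping block extraction and block parsing as separate methods means one malformed recipe can be caught and counted on its own, instead of failing the whole file.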
API Endpoint Design
POST /api/recipes/import/copymethat
Content-Type: multipart/form-data
Request:
- file: recipes.html OR multiple .txt files
- options: { skipDuplicates: boolean, importImages: boolean }
Response:
{
success: true,
data: {
imported: 45,
skipped: 3,
failed: 2,
recipes: [...] // preview
}
}
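The request and response shapes above can be written as TypeScript types. This sketch also adds two helpers: one picks a parser from the uploaded file names, and one tallies per-recipe outcomes into the response counts.

Everything here is an assumption:

- All names are hypothetical.
- Multipart handling is framework-specific and omitted.
- Reporting success: true whenever the request itself was processed is a guess at the intended semantics.

```typescript
// Mirrors the "options" field of the request.
interface ImportOptions {
  skipDuplicates: boolean;
  importImages: boolean;
}

type ImportOutcome = "imported" | "skipped" | "failed";

interface ImportResponse<R> {
  success: boolean;
  data: {
    imported: number;
    skipped: number;
    failed: number;
    recipes: R[]; // preview of imported recipes
  };
}

// A single recipes.html selects the HTML parser; one or more .txt files select
// the TXT parser; any other mix is rejected (returns null).
function detectExportFormat(fileNames: string[]): "html" | "txt" | null {
  const lower = fileNames.map((n) => n.toLowerCase());
  if (lower.length === 1 && /\.html?$/.test(lower[0])) return "html";
  if (lower.length > 0 && lower.every((n) => /\.txt$/.test(n))) return "txt";
  return null;
}

// Count outcomes; success only means the request was processed (assumption),
// individual recipe failures are reported through data.failed.
function summarizeImport<R>(results: { outcome: ImportOutcome; recipe?: R }[]): ImportResponse<R> {
  const data = { imported: 0, skipped: 0, failed: 0, recipes: [] as R[] };
  for (const r of results) {
    data[r.outcome] += 1;
    if (r.outcome === "imported" && r.recipe !== undefined) data.recipes.push(r.recipe);
  }
  return { success: true, data };
}
```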
Data Mapping
| CopyMeThat Field | Recipe Schema Field | Notes |
|---|---|---|
| #name | title | Direct mapping |
| #original_link | source_url | Direct mapping |
| #description | description | Direct mapping |
| .recipeCategory | tags | Parse into tag array |
| #recipeYield | servings | Extract number if possible |
| .recipeIngredient | ingredients[].item | Plain text list |
| .instruction | steps[].instruction | Numbered list |
| .recipeNote | Notes field? | May need schema extension |
| .recipeImage | image_url | Copy to app storage |
| #made_this | Custom field? | Boolean flag |
| #ratingValue | Custom field? | 1-5 rating |
Schema Extensions Needed
- made: boolean (user has cooked this)
- rating: number (1-5 stars)
- notes: string (general notes field)
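The three extensions as a TypeScript type, with a validator that an import path could run before saving. The following are all assumptions:

- rating is a nullable integer, with null for recipes never rated.
- made defaults to false and notes defaults to an empty string.
- The names RecipeExtensions and validateRecipeExtensions are hypothetical.

```typescript
interface RecipeExtensions {
  made: boolean; // user has cooked this
  rating: number | null; // 1-5 stars; null = never rated (assumption)
  notes: string; // general notes field
}

// Assumed defaults for recipes created outside the importer.
const EXTENSION_DEFAULTS: RecipeExtensions = { made: false, rating: null, notes: "" };

// Returns a list of problems; an empty list means the input is acceptable.
// Missing fields are allowed; a caller would fill them from EXTENSION_DEFAULTS.
function validateRecipeExtensions(input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const { made, rating, notes } = input;
  if (made !== undefined && typeof made !== "boolean") {
    errors.push("made must be a boolean");
  }
  const validRating =
    rating === undefined ||
    rating === null ||
    (typeof rating === "number" && Number.isInteger(rating) && rating >= 1 && rating <= 5);
  if (!validRating) errors.push("rating must be an integer from 1 to 5, or null");
  if (notes !== undefined && typeof notes !== "string") {
    errors.push("notes must be a string");
  }
  return errors;
}
```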
Edge Cases to Handle
- Duplicate detection: match on title + source_url
- Missing fields: title, ingredients, and steps are required
- Image handling: copy images or store paths?
- Encoding: preserve UTF-8 special characters (e.g., °F)
- HTML entities: decode &amp;, &quot;, etc.
- Large batches: memory limits for 100+ recipes (a single recipes.html holds the whole collection)
- Malformed HTML: degrade gracefully
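Two of these edge cases, sketched as helpers:

- duplicateKey builds the title + source_url match key with light normalization. The title is trimmed, whitespace-collapsed and lowercased, and trailing slashes are dropped from the URL. What should count as "the same recipe" is an assumption.
- decodeEntities covers only a handful of named entities plus numeric references. If the HTML is read with a real parser, that parser normally decodes entities already.

Both names are hypothetical.

```typescript
// Build the "title + source_url" key used to detect duplicates.
function duplicateKey(title: string, sourceUrl: string | null): string {
  const t = title.trim().replace(/\s+/g, " ").toLowerCase();
  const u = (sourceUrl ?? "").trim().replace(/\/+$/, ""); // URL paths stay case-sensitive
  return `${t}|${u}`;
}

const NAMED_ENTITIES: Record<string, string> = {
  amp: "&",
  quot: '"',
  apos: "'",
  lt: "<",
  gt: ">",
  nbsp: "\u00a0",
};

// Decode the named entities above plus numeric references (&#176; and &#xB0;);
// anything unrecognized is left untouched.
function decodeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match: string, body: string) => {
    if (body.charAt(0) === "#") {
      const isHex = body.charAt(1).toLowerCase() === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return code >= 0 && code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    const key = body.toLowerCase();
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, key) ? NAMED_ENTITIES[key] : match;
  });
}
```

During a batch, keys can be collected in a Set<string>. With skipDuplicates enabled, a recipe whose key was already seen (in the batch or in existing data) would be counted as skipped.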
Next Steps (Phase 2)
- Extend Recipe schema with made, rating, notes fields
- Implement CopyMeThatHtmlParser service
- Create POST /api/recipes/import/file endpoint
- Add multipart file upload handler
- Unit tests for parser
- Integration tests for endpoint
Status: ✅ Analysis complete, ready for implementation