Phase 1: CopyMeThat Format Analysis

Completed: 2026-03-28 23:52 EDT
Analyzed by: Cleo


File Structure

TXT Export

  • Location: data/exports/Copy_Me_That_TXT_20260328_58775_z1p5lpjsgz/
  • Format: One .txt file per recipe
  • Naming: Snake-case recipe names (e.g., apple_crowned_coffee_cake.txt)
  • Estimated count: ~50-100+ recipes

HTML Export

  • Location: data/exports/Copy_Me_That_HTML_20260328_58775_z1p5lpjsgz/
  • Format: Single recipes.html file containing ALL recipes
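  • Images: images/ subfolder with recipe photos, referenced by relative src paths (e.g., images/filename.jpg)
  • Structure: Semantic HTML with consistent IDs and classes; the same IDs (e.g., id="name") repeat in every recipe block, so lookups must be scoped to each .recipe container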

TXT Format Specification

Structure

[Recipe Title]

Adapted from [URL]

tags: [Tag1], [Tag2], [Tag3]

[Optional: "I made this."]

Servings: [serving info]

INGREDIENTS

[ingredient 1]
[ingredient 2]
...

STEPS

1) [step 1]

2) [step 2]

...

NOTES

[optional notes]

Example

Apple-Crowned Coffee Cake

Adapted from http://www.kraftcanada.com/recipes/apple-crowned-coffee-cake-191423

tags: Cake, Dessert

I made this.

Servings: 16 servings, 1 piece (76 g) each

INGREDIENTS

2 cups flour
2 Tbsp. granulated sugar
...

STEPS

1) Heat oven to 375°F.

2) Combine flour...

NOTES

If the glaze is too thick...

Key Fields

  • Title: First line
  • Source URL: "Adapted from [URL]"
  • Tags: Comma-separated after "tags:"
  • Made flag: Presence of "I made this."
  • Servings: After "Servings:"
  • Ingredients: Plain list between "INGREDIENTS" and "STEPS"
  • Instructions: Numbered list ("1)", "2)", ...) between "STEPS" and "NOTES" (or end of file)
  • Notes: Optional, after "NOTES"
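A minimal TypeScript sketch of the TXT fallback, assuming every file follows the layout above exactly: section markers (INGREDIENTS, STEPS, NOTES) on their own lines and steps prefixed "1)", "2)", and so on. The function name parseTxtFile comes from the parser architecture below, but the ParsedTxtRecipe shape is hypothetical and stands in for the app's real Recipe type.

```typescript
// Hypothetical interim shape; fields mirror the "Key Fields" list above.
interface ParsedTxtRecipe {
  title: string;
  sourceUrl: string | null;
  tags: string[];
  made: boolean;
  servings: string | null;
  ingredients: string[];
  steps: string[];
  notes: string | null;
}

type TxtSection = "header" | "ingredients" | "steps" | "notes";

function parseTxtFile(content: string): ParsedTxtRecipe {
  // Drop a UTF-8 BOM if present and accept both \n and \r\n line endings.
  const lines = content.replace(/^\uFEFF/, "").split(/\r?\n/).map((l) => l.trim());
  const recipe: ParsedTxtRecipe = {
    title: "", sourceUrl: null, tags: [], made: false,
    servings: null, ingredients: [], steps: [], notes: null,
  };
  const noteLines: string[] = [];
  let section: TxtSection = "header";

  for (const line of lines) {
    // Section markers switch the parser state and are not content themselves.
    if (line === "INGREDIENTS") { section = "ingredients"; continue; }
    if (line === "STEPS") { section = "steps"; continue; }
    if (line === "NOTES") { section = "notes"; continue; }

    if (section === "notes") {
      noteLines.push(line); // keep blank lines so paragraph breaks survive
      continue;
    }
    if (line === "") continue;

    if (section === "header") {
      if (recipe.title === "") {
        recipe.title = line; // first non-blank line
      } else if (line.startsWith("Adapted from ")) {
        recipe.sourceUrl = line.slice("Adapted from ".length).trim();
      } else if (line.startsWith("tags:")) {
        recipe.tags = line.slice("tags:".length).split(",").map((t) => t.trim()).filter((t) => t !== "");
      } else if (line === "I made this.") {
        recipe.made = true;
      } else if (line.startsWith("Servings:")) {
        recipe.servings = line.slice("Servings:".length).trim();
      }
    } else if (section === "ingredients") {
      recipe.ingredients.push(line);
    } else {
      // Steps look like "1) Heat oven to 375°F."; strip the "N) " prefix.
      const m = /^\d+\)\s*(.*)$/.exec(line);
      if (m) {
        recipe.steps.push(m[1]);
      } else if (recipe.steps.length > 0) {
        // Assumption: an unnumbered line continues the previous step.
        recipe.steps[recipe.steps.length - 1] += " " + line;
      }
    }
  }

  const notes = noteLines.join("\n").trim();
  recipe.notes = notes === "" ? null : notes;
  return recipe;
}
```

Treating an unnumbered line inside STEPS as a continuation of the previous step is a guess about how wrapped steps appear in the export; it is worth checking against a few real files.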

HTML Format Specification

Structure

Single HTML file with repeated .recipe div blocks:

<div class="recipe">
  <div id="name">Recipe Title</div>
  <div id="link">
    Adapted from <a id="original_link" href="...">URL</a>
  </div>
  <img class="recipeImage" src="images/filename.jpg"/>
  <div id="categories">
    <span class="recipeCategory">Tag1</span>
    <span class="recipeCategory">Tag2</span>
  </div>
  <div id="description">Description text</div>
  <div id="extra_info">
    <span id="made_this">I made this.</span>
    <span id="rating">Rated <span id="ratingValue">3</span>/5</span>
  </div>
  <div id="servings">
    Servings: <a id="recipeYield">8 servings...</a>
  </div>
  <ul id="recipeIngredients">
    <li class="recipeIngredient">ingredient 1</li>
    ...
  </ul>
  <ol id="recipeInstructions">
    <li class="instruction" value="1">step 1</li>
    ...
  </ol>
  <div id="recipeNotes">
    <div class="recipeNote">note text</div>
  </div>
</div>

Key Selectors

  • .recipe — Recipe container
  • #name — Title
  • #original_link — Source URL
  • .recipeImage — Image path
  • .recipeCategory — Tags
  • #description — Description
  • #made_this — Made flag
  • #ratingValue — Rating (1-5)
  • #recipeYield — Servings
  • .recipeIngredient — Ingredients (list items)
  • .instruction — Steps (ordered list items)
  • .recipeNote — Notes
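The selectors above can be exercised with a dependency-free TypeScript sketch. It swaps a real DOM parser for regular expressions run against each <div class="recipe"> chunk, which only holds up because the sample markup is flat and regular (double-quoted attributes, one class per element); a production CopyMeThatHtmlParser would more likely use an HTML parsing library such as cheerio or jsdom. Splitting into chunks first is what keeps the repeated IDs (#name, #ratingValue, and so on) from bleeding across recipes. ParsedHtmlRecipe and the helper names are hypothetical, and parseRecipeBlock here takes a string chunk rather than the HTMLElement shown in the architecture below.

```typescript
// Hypothetical interim shape produced from one .recipe block.
interface ParsedHtmlRecipe {
  title: string;
  sourceUrl: string | null;
  imagePath: string | null;
  tags: string[];
  description: string | null;
  made: boolean;
  rating: number | null;
  servings: string | null;
  ingredients: string[];
  steps: string[];
  notes: string[];
}

// Decode numeric entities plus common named ones; &amp; goes last so
// "&amp;lt;" becomes "&lt;" rather than "<".
function decodeEntities(s: string): string {
  const fromCode = (entity: string, code: number): string =>
    code <= 0x10ffff ? String.fromCodePoint(code) : entity;
  return s
    .replace(/&#(\d+);/g, (m: string, dec: string) => fromCode(m, parseInt(dec, 10)))
    .replace(/&#x([0-9a-f]+);/gi, (m: string, hex: string) => fromCode(m, parseInt(hex, 16)))
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&");
}

// Strip tags, decode entities, and collapse whitespace to get plain text.
function plainText(fragment: string): string {
  return decodeEntities(fragment.replace(/<[^>]*>/g, "")).replace(/\s+/g, " ").trim();
}

// Inner text of the first <tag id="..."> in one recipe chunk.
function textById(chunk: string, tag: string, id: string): string | null {
  const m = new RegExp(`<${tag}\\b[^>]*\\bid="${id}"[^>]*>([\\s\\S]*?)</${tag}>`).exec(chunk);
  return m ? plainText(m[1]) : null;
}

// Plain text of every <tag class="..."> in one recipe chunk, in document order.
function textsByClass(chunk: string, tag: string, cls: string): string[] {
  const re = new RegExp(`<${tag}\\b[^>]*\\bclass="${cls}"[^>]*>([\\s\\S]*?)</${tag}>`, "g");
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(chunk)) !== null) {
    const value = plainText(m[1]);
    if (value !== "") out.push(value);
  }
  return out;
}

// Entity-decoded attribute value from the first tag matching tagPattern.
function attrOf(chunk: string, tagPattern: RegExp, attr: string): string | null {
  const tag = tagPattern.exec(chunk);
  if (!tag) return null;
  const m = new RegExp(`\\s${attr}="([^"]*)"`).exec(tag[0]);
  return m ? decodeEntities(m[1]) : null;
}

function parseRecipeBlock(chunk: string): ParsedHtmlRecipe {
  const ratingText = textById(chunk, "span", "ratingValue");
  return {
    title: textById(chunk, "div", "name") ?? "",
    sourceUrl: attrOf(chunk, /<a\b[^>]*\bid="original_link"[^>]*>/, "href"),
    imagePath: attrOf(chunk, /<img\b[^>]*\bclass="recipeImage"[^>]*>/, "src"),
    tags: textsByClass(chunk, "span", "recipeCategory"),
    description: textById(chunk, "div", "description"),
    // Assumes the made_this span is only emitted for recipes marked as made.
    made: textById(chunk, "span", "made_this") !== null,
    rating: ratingText !== null && /^[1-5]$/.test(ratingText) ? Number(ratingText) : null,
    servings: textById(chunk, "a", "recipeYield"),
    ingredients: textsByClass(chunk, "li", "recipeIngredient"),
    steps: textsByClass(chunk, "li", "instruction"),
    notes: textsByClass(chunk, "div", "recipeNote"),
  };
}

function parseRecipes(html: string): ParsedHtmlRecipe[] {
  // Each recipe starts at <div class="recipe">; the exact class match skips
  // recipeNote, recipeImage, etc. Anything before the first recipe is discarded.
  const chunks = html.split(/<div\b[^>]*\bclass="recipe"[^>]*>/).slice(1);
  return chunks.map(parseRecipeBlock).filter((r) => r.title !== "");
}
```

Blocks without a #name are dropped, in line with title being a required field; other malformed blocks degrade to empty fields rather than throwing.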

Implementation Strategy

Primary: Parse the HTML export (recipes.html)

Rationale:

  • HTML carries more data than TXT (images, ratings, descriptions)
  • Single file = easier batch import
  • Well-structured semantic markup
  • Images already linked

Fallback: TXT parser for edge cases

Parser Architecture

ImportService
├── CopyMeThatHtmlParser
│   ├── parseRecipes(html: string): Recipe[]
│   ├── extractRecipeBlocks(html: string): HTMLElement[]
│   └── parseRecipeBlock(block: HTMLElement): Recipe
└── CopyMeThatTxtParser (optional fallback)
    └── parseTxtFile(content: string): Recipe
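One possible wiring for the tree above, sketched in TypeScript: ImportService chooses a parser by file extension, so a single upload can carry either recipes.html or a batch of .txt files. The RecipeFileParser interface, the Map-based registry, and the minimal Recipe shape are assumptions of this sketch. Having every parser return Recipe[] (where the tree shows parseTxtFile returning one Recipe) is a design choice made here so both formats share one code path.

```typescript
// Hypothetical minimal Recipe shape; the real schema carries more fields.
interface Recipe {
  title: string;
  ingredients: string[];
  steps: string[];
}

// Shared contract for both parsers. Returning Recipe[] lets the single-file
// HTML export and one-recipe-per-file TXT exports go through the same path.
interface RecipeFileParser {
  parse(content: string): Recipe[];
}

class ImportService {
  private readonly parsers: Map<string, RecipeFileParser>;

  // Parsers are keyed by lower-case file extension, e.g. "html" and "txt".
  constructor(parsers: Map<string, RecipeFileParser>) {
    this.parsers = parsers;
  }

  parseFile(fileName: string, content: string): Recipe[] {
    const dot = fileName.lastIndexOf(".");
    const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
    const parser = this.parsers.get(ext);
    if (parser === undefined) {
      throw new Error(`Unsupported import file: ${fileName}`);
    }
    return parser.parse(content);
  }

  // One upload may hold recipes.html or many .txt files; results keep upload order.
  parseFiles(files: { name: string; content: string }[]): Recipe[] {
    const all: Recipe[] = [];
    for (const file of files) {
      all.push(...this.parseFile(file.name, file.content));
    }
    return all;
  }
}
```

A Map is used instead of a plain object so an odd extension such as ".constructor" cannot resolve to an inherited property.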

API Endpoint Design

POST /api/recipes/import/copymethat
Content-Type: multipart/form-data

Request:
- file: recipes.html OR multiple .txt files
- options: { skipDuplicates: boolean, importImages: boolean }

Response:
{
  success: true,
  data: {
    imported: 45,
    skipped: 3,
    failed: 2,
    recipes: [...] // preview
  }
}
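A TypeScript sketch of the request options and response body for this endpoint. It assumes the multipart options field is sent as a JSON string and that omitted options default to true; both are assumptions, as are the ImportOutcome type and the preview limit. buildImportResponse mirrors the example above, where success stays true even when some recipes fail.

```typescript
// Options from the endpoint design above.
interface ImportOptions {
  skipDuplicates: boolean;
  importImages: boolean;
}

// Assumed defaults when a field is omitted; the design does not specify them.
const DEFAULT_IMPORT_OPTIONS: ImportOptions = { skipDuplicates: true, importImages: true };

// Multipart fields arrive as strings, so `options` is assumed to be JSON.
// Invalid JSON throws; the route handler would turn that into a 400 response.
function parseImportOptions(raw: string | undefined): ImportOptions {
  if (raw === undefined || raw.trim() === "") return { ...DEFAULT_IMPORT_OPTIONS };
  const parsed: unknown = JSON.parse(raw);
  const obj = (typeof parsed === "object" && parsed !== null ? parsed : {}) as Record<string, unknown>;
  const pick = (key: keyof ImportOptions): boolean =>
    typeof obj[key] === "boolean" ? (obj[key] as boolean) : DEFAULT_IMPORT_OPTIONS[key];
  return { skipDuplicates: pick("skipDuplicates"), importImages: pick("importImages") };
}

// Per-recipe result of the import loop (hypothetical shape).
type ImportOutcome<R> =
  | { status: "imported"; recipe: R }
  | { status: "skipped"; reason: string }
  | { status: "failed"; error: string };

interface ImportResponse<R> {
  success: boolean;
  data: { imported: number; skipped: number; failed: number; recipes: R[] };
}

// Tally outcomes into the response shape above; `recipes` is capped as a preview.
function buildImportResponse<R>(outcomes: ImportOutcome<R>[], previewLimit = 10): ImportResponse<R> {
  const data = { imported: 0, skipped: 0, failed: 0, recipes: [] as R[] };
  for (const outcome of outcomes) {
    if (outcome.status === "imported") {
      data.imported += 1;
      if (data.recipes.length < previewLimit) data.recipes.push(outcome.recipe);
    } else if (outcome.status === "skipped") {
      data.skipped += 1;
    } else {
      data.failed += 1;
    }
  }
  return { success: true, data };
}
```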

Data Mapping

CopyMeThat Field     Recipe Schema Field     Notes
----------------     -------------------     --------------------------
#name                title                   Direct mapping
#original_link       source_url              Direct mapping
#description         description             Direct mapping
.recipeCategory      tags                    Parse into tag array
#recipeYield         servings                Extract number if possible
.recipeIngredient    ingredients[].item      Plain text list
.instruction         steps[].instruction     Numbered list
.recipeNote          notes (new field)       Requires schema extension
.recipeImage         image_url               Copy to app storage
#made_this           made (new field)        Boolean flag
#ratingValue         rating (new field)      1-5 rating
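The less direct rows of the table (tags, servings, ingredients, steps) can be sketched in TypeScript as below. RawCopyMeThatRecipe and the function names are hypothetical; the servings rule (take the first whole number) is one simple reading of "Extract number if possible", with its caveats noted in the comments.

```typescript
// Hypothetical raw values pulled from one .recipe block (left column above).
interface RawCopyMeThatRecipe {
  name: string;
  originalLink: string | null;
  description: string | null;
  categories: string[];
  recipeYield: string | null;
  ingredients: string[];
  instructions: string[];
  imagePath: string | null;
}

// First whole number in the yield text, or null when there is none.
// Caveat: "Makes 2 dozen" yields 2 and "4-6 servings" yields 4.
function extractServingsCount(yieldText: string | null): number | null {
  if (yieldText === null) return null;
  const m = /\d+/.exec(yieldText);
  return m ? parseInt(m[0], 10) : null;
}

// Trim, drop empties, and de-duplicate case-insensitively, keeping the first spelling seen.
function normalizeTags(categories: string[]): string[] {
  const seen = new Set<string>();
  const tags: string[] = [];
  for (const raw of categories) {
    const tag = raw.trim();
    const key = tag.toLowerCase();
    if (tag === "" || seen.has(key)) continue;
    seen.add(key);
    tags.push(tag);
  }
  return tags;
}

// Map the existing-schema rows onto their field names; made, rating, and
// notes depend on the schema extensions listed below.
function mapToRecipeFields(raw: RawCopyMeThatRecipe) {
  return {
    title: raw.name.trim(),
    source_url: raw.originalLink,
    description: raw.description,
    tags: normalizeTags(raw.categories),
    servings: extractServingsCount(raw.recipeYield),
    ingredients: raw.ingredients.map((item) => ({ item })),
    steps: raw.instructions.map((instruction) => ({ instruction })),
    // Still a relative path inside the export; copying into app storage is a separate step.
    image_url: raw.imagePath,
  };
}
```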

Schema Extensions Needed

  • made: boolean — User has cooked this
  • rating: number — 1-5 stars
  • notes: string — General notes field
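A TypeScript sketch of the three proposed fields and how the HTML values could be converted into them. Making the fields optional, treating anything other than a whole number 1-5 as unrated, and joining multiple notes with blank lines are all assumptions of this sketch rather than settled decisions.

```typescript
// Proposed extension fields; optional so recipes created outside the importer
// stay valid (an assumption of this sketch).
interface RecipeExtensions {
  made?: boolean;
  rating?: number | null; // whole number 1-5, null when unrated
  notes?: string | null;
}

// Accept only whole-number ratings 1-5; anything else is stored as unrated.
function parseRating(value: string | null | undefined): number | null {
  if (value === null || value === undefined) return null;
  const trimmed = value.trim();
  return /^[1-5]$/.test(trimmed) ? Number(trimmed) : null;
}

// The HTML wraps notes in a #recipeNotes container that appears designed to
// hold several .recipeNote entries; the proposed field is one string, so join them.
function joinNotes(notes: string[]): string | null {
  const parts = notes.map((n) => n.trim()).filter((n) => n !== "");
  return parts.length > 0 ? parts.join("\n\n") : null;
}

function toRecipeExtensions(madeThis: boolean, ratingValue: string | null, notes: string[]): RecipeExtensions {
  return { made: madeThis, rating: parseRating(ratingValue), notes: joinNotes(notes) };
}
```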

Edge Cases to Handle

  1. Duplicate detection — Match on title + source_url
  2. Missing fields — Title/ingredients/steps are required
  3. Image handling — Copy images or store paths?
  4. Encoding — UTF-8 special characters
  5. HTML entities (&amp;, &quot;, etc.)
  6. Large batches — Memory limits for 100+ recipes
  7. Malformed HTML — Graceful degradation
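For edge case 1, a TypeScript sketch of duplicate detection on title + source_url. The normalization rules in duplicateKey are assumptions; they decide how aggressively near-identical entries are merged. partitionDuplicates also catches repeats inside the same upload, and its duplicates list is what an import with skipDuplicates enabled would report as skipped.

```typescript
interface DedupCandidate {
  title: string;
  source_url: string | null;
}

// Key for "Match on title + source_url". Assumed normalization: the title is
// trimmed, whitespace-collapsed, and lower-cased; the URL has its scheme and
// host lower-cased and trailing slashes removed (the path keeps its case).
function duplicateKey(title: string, sourceUrl: string | null): string {
  const normTitle = title.trim().replace(/\s+/g, " ").toLowerCase();
  const normUrl = (sourceUrl ?? "")
    .trim()
    .replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?#]*/i, (origin) => origin.toLowerCase())
    .replace(/\/+$/, "");
  // A newline cannot occur in the normalized title, so it is a safe separator.
  return `${normTitle}\n${normUrl}`;
}

// Split an upload into new recipes and duplicates, checking against the
// existing library and against earlier recipes in the same batch.
function partitionDuplicates<T extends DedupCandidate>(
  existing: DedupCandidate[],
  incoming: T[],
): { fresh: T[]; duplicates: T[] } {
  const seen = new Set(existing.map((r) => duplicateKey(r.title, r.source_url)));
  const fresh: T[] = [];
  const duplicates: T[] = [];
  for (const recipe of incoming) {
    const key = duplicateKey(recipe.title, recipe.source_url);
    if (seen.has(key)) {
      duplicates.push(recipe);
    } else {
      seen.add(key);
      fresh.push(recipe);
    }
  }
  return { fresh, duplicates };
}
```

Recipes without a source URL fall back to matching on title alone, so two unrelated recipes that share a name and have no source would be treated as duplicates; whether that is acceptable is a product decision.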

Next Steps (Phase 2)

  1. Extend Recipe schema with made, rating, notes fields
  2. Implement CopyMeThatHtmlParser service
  3. Create POST /api/recipes/import/file endpoint
  4. Add multipart file upload handler
  5. Unit tests for parser
  6. Integration tests for endpoint

Status: Analysis complete, ready for implementation