# Thingamablog v2 - Technical Design Document
## Overview
Thingamablog v2 is a migration and modernization project for Paul's legacy blog data from the early 2000s Thingamablog platform. The system extracts blog posts from an obsolete HSQLDB database, cleans and structures the data, and provides a modern web interface for browsing.
## Project Structure
There are two related projects:
1. **thingamablog-api** (`projects/thingamablog-api/`): Contains the Java export tool for data extraction
2. **thingamablog-v2** (`projects/thingamablog-v2/`): Contains the web app for browsing the extracted data
No Git repositories are set up for these folders.
## Problem Statement
- **Legacy Data Lock-In:** Blog posts stored in HSQLDB 1.8 (obsolete Java DB format from 2000s)
- **Data Extraction Challenges:** Binary format difficult to parse reliably with modern tools
- **User Experience:** No easy way to browse/search 20+ years of personal blog content
- **Future Integration:** Need structured data for AI/vector search (Open Brain project)
## Solution Architecture
### High-Level Architecture
```
[Legacy HSQLDB DB] → [Java Export Tool] → [blog-export.json] → [Node.js Backend] → [React Frontend]
   (Binary Data)                            (Clean JSON)         (API Server)          (Web UI)
```
### Components
#### 1. Data Extraction Layer (Java CLI Tool)
- **Location:** `projects/thingamablog-api/ExportTool.java`
- **Purpose:** Bridge from legacy DB to modern JSON
- **Technology:** Java with HSQLDB 1.8.0.10 driver
- **Input:** HSQLDB database files (`database.script`, `database.data`)
- **Output:** `blog-export.json` with structured blog entries
- **Design Decision:** Standalone CLI tool avoids Spring Boot complexity
#### 2. Data Storage Layer (JSON File)
- **Format:** Clean JSON array of blog post objects
- **Schema:**
```json
{
  "id": 1,
  "title": "Digital Imaging Notes",
  "date": "2003-11-03 16:41:22.053",
  "author": "Paul",
  "categories": "Hobbies",
  "content": "<p>Full HTML content...</p>"
}
```
- **Benefits:** Human-readable, easily parseable, version-controllable
#### 3. Backend API Layer (Node.js/Express)
- **Endpoints:**
  - `GET /api/posts` - List posts with metadata
  - `GET /api/posts/:id` - Full post content
- **Features:**
  - Priority loading of `blog-export.json`
  - Fallback to legacy HSQLDB parser
  - Static image serving
  - CORS support for frontend
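The list endpoint's shaping logic can be sketched as a small helper; `toSummaries` is a hypothetical name, and the field list mirrors the JSON schema above.

```javascript
// Hypothetical helper for GET /api/posts: drops the heavy `content`
// field and sorts posts newest-first by their timestamp string.
function toSummaries(posts) {
  return posts
    .map(({ id, title, date, author, categories }) =>
      ({ id, title, date, author, categories }))
    .sort((a, b) => new Date(b.date) - new Date(a.date));
}
```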
#### 4. Frontend UI Layer (React/Material-UI)
- **Components:**
  - PostList: Scrollable filtered list
  - PostDetail: Full content viewer
  - Filters: Category/date navigation
- **Responsive Design:** Desktop and mobile support
## Detailed Design
### Data Extraction Process
#### The "Bridge" Approach
1. **Java Tool Creation:**
   - `ExportTool.java`: Simple class using JDBC to connect to HSQLDB
   - Uses `hsqldb-1.8.0.10.jar` driver (downloaded via Maven)
   - Executes SQL query: `SELECT * FROM ENTRY_TABLE_1096292361887`
   - Maps the result set to JSON objects
2. **Compilation & Execution:**
   ```bash
   cd projects/thingamablog-api
   javac -cp target/dependency/hsqldb-1.8.0.10.jar ExportTool.java
   java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > ../thingamablog-v2/backend/blog-export.json
   ```
3. **Data Cleaning:**
   - Converts timestamps to a readable format
   - Preserves HTML content in the body
   - Extracts categories from metadata
   - Assigns sequential IDs
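The cleaning step can be sketched as below; JavaScript is used for illustration even though the real transform lives in the Java tool, and the raw column names (`TITLE`, `MODIFIED`, `BODY`, etc.) are assumptions, not the actual HSQLDB schema.

```javascript
// Sketch of the per-row cleaning step. Column names are illustrative,
// not the real HSQLDB entry-table schema.
function cleanEntry(row, index) {
  return {
    id: index + 1,                          // sequential ID
    title: row.TITLE.trim(),
    // normalize the timestamp to "YYYY-MM-DD HH:MM:SS.mmm"
    date: new Date(row.MODIFIED).toISOString().replace("T", " ").slice(0, 23),
    author: row.AUTHOR || "Paul",
    categories: row.CATEGORIES || "",
    content: row.BODY,                      // HTML preserved as-is
  };
}
```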
#### Migration Results
- **Input:** Binary HSQLDB files (~2MB total)
- **Output:** Clean JSON (1.3MB, 467 entries)
- **Quality:** Titles, dates, and full HTML content preserved intact
- **Performance:** One-time export, instant loading thereafter
### API Design
#### REST Endpoints
- **GET /api/posts**
  - Response: Array of post summaries
  - Filtering: None (client-side)
  - Sorting: By date descending (for the clean JSON)
- **GET /api/posts/:id**
  - Response: Full post object
  - Image URL rewriting: Converts Windows paths to HTTP URLs
  - Error handling: 404 for missing posts
#### Data Flow
1. Frontend requests post list
2. Backend loads JSON/falls back to DB parsing
3. Backend serves filtered/sorted data
4. Frontend renders with Material-UI components
### Frontend Design
#### Component Hierarchy
```
App
├── Sidebar (Filters)
│   ├── AllPostsFilter
│   ├── CategoryAccordion
│   └── ArchiveAccordion
├── PostList (Scrollable)
└── PostViewer (Rich content)
```
#### State Management
- React hooks for local state
- No external state library (simple app)
- URL-based state for selected post/filter
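Parsing the URL-based state might look like this; the `post` and `filter` query-parameter names are illustrative, not necessarily what `App.js` uses.

```javascript
// Sketch of URL-based view state. In the React app this would read
// window.location.search; parameter names are hypothetical.
function parseViewState(search) {
  const params = new URLSearchParams(search);
  return {
    postId: params.get("post") ? Number(params.get("post")) : null,
    filter: params.get("filter") || "all",
  };
}
```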
#### UI/UX Principles
- **Progressive Disclosure:** Filters collapsed by default
- **Responsive Grid:** 3-column desktop, stacked mobile
- **Accessibility:** Keyboard navigation, screen reader support
- **Performance:** Virtual scrolling for large post lists
## Implementation Details
### Technology Choices
| Component | Technology | Rationale |
|-----------|------------|----------|
| Export Tool | Java + HSQLDB Driver | Native compatibility with legacy DB |
| Data Format | JSON | Universal, human-readable |
| Backend | Node.js/Express | Simple, fast for file-based data |
| Frontend | React/Material-UI | Modern, component-based UI |
| Images | Static serving | Direct file access for performance |
### Static Assets & Image Handling
#### Image Storage Location
- **Path:** `/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/1096292361887/web`
- **Contents:** ~3.2MB of image files (JPG, GIF) from original blog posts
- **Examples:** Robotics photos, running/training images, technical diagrams, circuit schematics
- **Date Range:** Images from 2004-2009 blog posts
#### Image Serving
- **URL Path:** `/1096292361887/web/*` (preserves original blog structure)
- **Implementation:** Express.static middleware serves files directly
- **Fallback:** Root path `/` also serves images for compatibility
#### Image URL Rewriting
- **Problem:** Original blog posts contain Windows file paths (`file:///\\paulspc2\...\Weblogs/FILENAME.jpg`)
- **Solution:** Backend rewrites image src attributes to HTTP URLs (`http://localhost:3637/FILENAME.jpg`)
- **Regex Pattern:** Converts Windows UNC paths to web-accessible URLs
- **Benefit:** Images display correctly without manual editing of HTML content
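A sketch of the rewrite; the regex here is an assumption and the real pattern in `server.js` may differ.

```javascript
// Rewrites file:// image src attributes (including Windows UNC paths)
// to HTTP URLs served by the backend. Pattern is illustrative only.
function rewriteImageSrcs(html, baseUrl) {
  return html.replace(
    /src="file:\/\/[^"]*[\/\\]([^"\/\\]+)"/gi,
    (_, filename) => `src="${baseUrl}/${filename}"`
  );
}
```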
### File Structure
```
projects/
├── thingamablog-api/          # Data extraction tool
│   ├── ExportTool.java        # Java export tool
│   ├── pom.xml                # Maven config
│   ├── src/                   # Maven source structure
│   ├── target/                # Compiled classes + dependencies
│   └── api.log                # Execution log
└── thingamablog-v2/           # Web application
    ├── backend/
    │   ├── server.js          # API server
    │   ├── hsqldbParser.js    # Fallback parser
    │   └── blog-export.json   # Clean data
    └── frontend/src/
        ├── App.js             # Main UI
        ├── theme.js           # Styling
        └── components/        # Reusable UI pieces
```
### Deployment & Operations
#### Local Development
1. Extract data: `cd projects/thingamablog-api && java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > ../thingamablog-v2/backend/blog-export.json`
2. Start backend: `cd projects/thingamablog-v2/backend && node server.js`
3. Start frontend: `cd projects/thingamablog-v2/frontend && npm start`
4. Access: `http://localhost:3000`
#### Production Considerations
- **Scalability:** JSON file size (1.3MB) is fine for personal use
- **Backup:** Version control the JSON export
- **Updates:** Re-run export tool if DB changes
- **Security:** Local-only access, no authentication needed
## Risks & Mitigations
| Risk | Mitigation |
|------|------------|
| HSQLDB driver availability | Downloaded and cached locally |
| Java version compatibility | Tested with Java 8+ |
| Data corruption | JSON validation on load |
| Performance with large datasets | Client-side pagination/filtering |
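The "JSON validation on load" mitigation could be as simple as a shape check; `isValidExport` is a hypothetical helper.

```javascript
// Minimal validation sketch: the export must be an array and every
// entry must carry the fields the frontend relies on.
function isValidExport(data) {
  return Array.isArray(data) &&
    data.every(p =>
      typeof p.id === "number" &&
      typeof p.title === "string" &&
      typeof p.content === "string");
}
```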
## Future Enhancements
- **Search:** Full-text search within posts
- **Tagging:** Enhanced category management
- **Export:** Additional formats (Markdown, PDF)
- **Open Brain Integration:** Vector embedding for AI queries
- **Multi-user:** User accounts and permissions
## Success Metrics
- **Data Integrity:** 100% posts extracted with correct metadata
- **Performance:** <2s page load, <500ms API response
- **Usability:** Intuitive filtering/navigation
- **Maintainability:** Clear code structure, comprehensive docs
## Conclusion
Thingamablog v2 successfully modernizes legacy blog data through a "bridge" approach, providing a clean migration path from obsolete technology to contemporary web standards. The modular design allows for easy maintenance and future AI integrations.