# Thingamablog v2 Project Documentation
## Overview

This project extracts and displays blog entries from Paul's old Thingamablog (a Windows blog platform from the early 2000s). The process went through several iterations:

1. **Initial Attempt:** Python script to parse HTML files (`blog.html`).
2. **Database Approach:** Discovered the blogs were stored in an HSQLDB database (an old Java-based SQL database).
3. **Java/Spring Boot:** Used the old HSQLDB Java library in a Spring Boot app to extract data.
4. **Node.js Solution:** Final implementation using a Node.js app to read HSQLDB directly and export to JSON.
5. **Web App:** Built a full-stack web app (React frontend + Express backend) to browse and display the blog entries.

The result is a clean JSON export (`blog-export.json`) containing all blog posts, which can be browsed via a modern web interface.

## Latest Updates (March 2026)

- ✅ **Git Repository:** Created on Gitea at https://paje.ca/git/paulh/thingamablog-v2
- ✅ **Documentation:** Comprehensive DESIGN.md and updated README.md
- ✅ **Data Quality:** 467 blog posts successfully extracted with perfect metadata
- 🔄 **Git Push:** Local repo initialized and committed, awaiting network resolution for push
- 🔗 **Integration:** Ready for Open Brain vector ingestion

## Data Extraction Process

The data was extracted using the "Bridge" approach from the companion `thingamablog-api` project:

- **Source:** Old HSQLDB database files in `docs/pauls-blogs/Paul/database/`
- **Tool:** Java CLI application (`ExportTool.java`) with HSQLDB 1.8.0.10 driver
- **Command:** `java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > blog-export.json`
- **Output:** `blog-export.json` with structured blog entries (title, date, content, categories, etc.)
- **Quality:** Perfect extraction - 467 entries, full HTML content preserved, clean metadata
- **Size:** 1.3MB of structured JSON data

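
The quality claims above can be spot-checked with a short script. The following is a minimal sketch, assuming the export is a top-level JSON array whose objects use the field names listed in the **Output** bullet (`title`, `date`, `content`, `categories`); the real file may be shaped differently, and `summarizeExport` is a hypothetical helper, not part of this repository.

```javascript
// Sketch: sanity-check a parsed export before serving it.
// ASSUMPTION: the export is a JSON array of objects with these field names.
const REQUIRED_FIELDS = ["title", "date", "content", "categories"];

function summarizeExport(entries) {
  if (!Array.isArray(entries)) {
    throw new TypeError("expected the export to be a JSON array");
  }
  const incomplete = [];
  entries.forEach((entry, index) => {
    // Collect every required field that is absent or null on this entry.
    const missing = REQUIRED_FIELDS.filter(
      (field) => entry == null || entry[field] == null
    );
    if (missing.length > 0) incomplete.push({ index, missing });
  });
  return { count: entries.length, incomplete };
}

// Usage with inline sample data (hypothetical entries, not real posts).
const sample = [
  { title: "First post", date: "2004-09-27", content: "<p>Hello</p>", categories: ["Personal"] },
  { title: "No body", date: "2005-01-02", categories: [] },
];
console.log(summarizeExport(sample));
```

Run against the real file (for example with `JSON.parse(fs.readFileSync("blog-export.json", "utf8"))`), a `count` of 467 and an empty `incomplete` list would be consistent with the figures stated above.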
## Project Structure

```
projects/thingamablog-v2/
├── backend/                 # Express.js server
│   ├── server.js            # Main server file
│   ├── hsqldbParser.js      # Legacy HSQLDB parser (fallback)
│   ├── blog-export.json     # Exported blog data (1.7MB)
│   └── package.json
├── frontend/                # React app
│   ├── src/
│   │   ├── App.js           # Main component
│   │   ├── theme.js         # Material-UI theme
│   │   └── index.js
│   └── package.json
├── DESIGN.md                # Technical design document
├── README.md                # This file
└── .gitignore               # Excludes node_modules, build artifacts
```

## Running the Application

### Prerequisites
- Node.js installed
- Backend dependencies: `cd backend && npm install`
- Frontend dependencies: `cd frontend && npm install`

### Start Backend
```bash
cd projects/thingamablog-v2/backend
node server.js
```
- Runs on `http://localhost:3637`
- Loads `blog-export.json` if available, else falls back to HSQLDB parser
- Serves images from `/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/1096292361887/web`

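
The "JSON first, parser second" loading order could look roughly like the sketch below. It is illustrative only: `loadPosts` and `parseFallback` are hypothetical names, and the actual interface of `hsqldbParser.js` is not shown in this README.

```javascript
const fs = require("fs");

// Sketch of the loading order described above.
// ASSUMPTION: `parseFallback` stands in for whatever hsqldbParser.js exports.
function loadPosts(exportPath, parseFallback) {
  try {
    const raw = fs.readFileSync(exportPath, "utf8");
    return { source: "json", posts: JSON.parse(raw) };
  } catch (err) {
    // Missing, unreadable, or malformed export: use the slower parser instead.
    console.warn(`Could not load ${exportPath} (${err.code || err.name}); using fallback`);
    return { source: "fallback", posts: parseFallback() };
  }
}

// Usage: this path does not exist, so the stub fallback is called.
const result = loadPosts("./no-such-export.json", () => []);
console.log(result.source);
```

Returning the `source` alongside the posts makes it easy to log at startup which path was taken.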
### Start Frontend
```bash
cd projects/thingamablog-v2/frontend
npm start
```
- Runs on `http://localhost:3000` (opens automatically)
- Connects to backend at `http://localhost:3637`

### Access the App
Open `http://localhost:3000` in your browser.

## UI Description

The web app provides a clean, modern interface to browse Paul's old blog posts:

### Layout
- **Header:** "Thingamablog Archive" title
- **Sidebar (Left):** Browse filters
  - "All Posts" - show everything
  - "Categories" accordion - hierarchical categories (e.g., Hobbies > Car Maintenance)
  - "Archives" accordion - posts by year
- **Post List (Middle):** Scrollable list of posts with title, date, author
- **Post Content (Right):** Full post display with images, categories as chips

### Features
- **Filtering:** Click categories or years to filter posts
- **Selection:** Click a post to view full content
- **Images:** Embedded images load from served static files
- **Responsive:** Adapts to screen size (stacks vertically on mobile)

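
The category and year filters can be expressed as one small pure function. This is a sketch under assumptions, not the code in `App.js`: it assumes each post carries a `date` string that `Date` can parse and a `categories` array of strings, and `filterPosts` is a hypothetical name.

```javascript
// Sketch of the sidebar filtering logic.
// ASSUMPTION: posts have `date` (ISO-like string) and `categories` (string array).
function filterPosts(posts, { category = null, year = null } = {}) {
  return posts.filter((post) => {
    const inCategory =
      category === null || (post.categories || []).includes(category);
    const inYear =
      year === null || new Date(post.date).getUTCFullYear() === year;
    return inCategory && inYear;
  });
}

// Hypothetical sample data for illustration.
const posts = [
  { id: 1, title: "Oil change", date: "2005-03-14", categories: ["Hobbies - Car Maintenance"] },
  { id: 2, title: "Robot arm", date: "2005-07-01", categories: ["Robotics"] },
  { id: 3, title: "New year", date: "2006-01-01", categories: ["Personal"] },
];

console.log(filterPosts(posts, { year: 2005 }).map((p) => p.id)); // [ 1, 2 ]
console.log(filterPosts(posts, { category: "Robotics" }).map((p) => p.id)); // [ 2 ]
```

Passing no filter at all returns every post, which corresponds to the "All Posts" entry in the sidebar.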
### Sample View
- Categories include: Hobbies, Personal, Robotics, etc.
- Posts date back to the 2000s, covering topics like 3D printing, car maintenance, tech projects
- Content includes HTML formatting, links, and images

## API Endpoints

- `GET /api/posts` - List all posts (id, title, date, category)
- `GET /api/posts/:id` - Get full post data
- Images served at `/1096292361887/web/*` or `/`

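
The two post endpoints boil down to a list projection and a lookup by id. The sketch below shows that logic as plain functions so it runs without Express installed. The helper names, the 404 behaviour in the comment, and the exact field names are assumptions; `server.js` may differ.

```javascript
// Sketch of the handler logic behind the two post endpoints.
// ASSUMPTION: posts carry `id`, `title`, `date`, and `category` fields.
function listPosts(posts) {
  // The list view returns only the summary fields named above.
  return posts.map(({ id, title, date, category }) => ({ id, title, date, category }));
}

function getPost(posts, id) {
  // Route parameters arrive as strings, so compare ids as strings.
  return posts.find((post) => String(post.id) === String(id)) || null;
}

// In an Express app these could be wired up roughly as:
//   app.get("/api/posts", (req, res) => res.json(listPosts(posts)));
//   app.get("/api/posts/:id", (req, res) => {
//     const post = getPost(posts, req.params.id);
//     if (post) res.json(post);
//     else res.status(404).json({ error: "Post not found" });
//   });

// Hypothetical sample data for illustration.
const samplePosts = [
  { id: 1, title: "Oil change", date: "2005-03-14", category: "Hobbies", content: "<p>...</p>" },
  { id: 2, title: "Robot arm", date: "2005-07-01", category: "Robotics", content: "<p>...</p>" },
];

console.log(listPosts(samplePosts));
console.log(getPost(samplePosts, "2").title);
```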
## Future Integration

This data can be ingested into Open Brain for vector search:
- Parse `blog-export.json` into chunks
- Embed with OpenAI
- Store in Supabase PGVector
- Enable semantic queries across Paul's 20+ year blog history

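
The first step, parsing posts into chunks, might look like the sketch below. It covers only chunking; the embedding and storage steps are left out because their APIs are not described here. The helper names, the regex-based tag stripping, and the 1000-character default are illustrative assumptions, not Open Brain requirements.

```javascript
// Rough HTML-to-text conversion: drop tags and collapse whitespace.
// Adequate for a sketch, not a real HTML parser (entities like &amp; are kept as-is).
function htmlToText(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Greedily pack whole words into chunks of at most `maxChars` characters.
// A single word longer than `maxChars` becomes its own oversized chunk.
function chunkText(text, maxChars = 1000) {
  const chunks = [];
  let current = "";
  for (const word of text.split(" ").filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// ASSUMPTION: posts have `title`, `date`, and HTML `content` fields.
function chunkPost(post, maxChars = 1000) {
  return chunkText(htmlToText(post.content || ""), maxChars).map((text, index) => ({
    title: post.title,
    date: post.date,
    chunkIndex: index,
    text,
  }));
}

// Hypothetical post with a tiny limit to make the splitting visible.
const demo = { title: "Demo", date: "2005-03-14", content: "<p>one two three</p><p>four five</p>" };
console.log(chunkPost(demo, 9).map((chunk) => chunk.text)); // [ 'one two', 'three', 'four five' ]
```

Keeping `title` and `date` on every chunk means each embedded fragment can be traced back to its post when a semantic query returns it.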
## Development Status

- **Backend:** ✅ Complete, tested with 467 posts
- **Frontend:** ✅ Complete, Material-UI responsive design
- **Data Extraction:** ✅ Complete, high-quality JSON export
- **Documentation:** ✅ Complete (README.md, DESIGN.md)
- **Git Repository:** ✅ Created on Gitea, awaiting push due to network issues
- **Testing:** ✅ Manual testing successful

## Notes

- No automated tests or CI/CD set up
- Assumes local paths for images/database
- Backend prioritizes JSON export over HSQLDB parsing for speed
- Categories use `<Category>` and `<Parent - Child>` format
- Network issues are preventing the final Git push; the repos are ready to push once connectivity is restored
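
The category format noted above lends itself to a small tree builder for the sidebar accordion. This is a sketch that assumes top-level categories are plain names (such as `Personal`) and nested ones are written as `Parent - Child` with a space-hyphen-space separator; `buildCategoryTree` is a hypothetical helper, and the real data may contain edge cases it does not handle.

```javascript
// Sketch: turn flat category names into a parent/children structure.
// ASSUMPTION: nested categories are written "Parent - Child".
function buildCategoryTree(categoryNames) {
  const tree = new Map();
  for (const name of categoryNames) {
    const separatorIndex = name.indexOf(" - ");
    const parent =
      separatorIndex === -1 ? name.trim() : name.slice(0, separatorIndex).trim();
    const child =
      separatorIndex === -1 ? null : name.slice(separatorIndex + 3).trim();
    if (!tree.has(parent)) tree.set(parent, new Set());
    if (child) tree.get(parent).add(child);
  }
  // Convert to plain, sorted arrays so the result is easy to render.
  return [...tree.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([name, children]) => ({ name, children: [...children].sort() }));
}

// Hypothetical category names for illustration.
const names = ["Hobbies - Car Maintenance", "Personal", "Hobbies", "Hobbies - 3D Printing", "Robotics"];
console.log(JSON.stringify(buildCategoryTree(names), null, 2));
```

Only the first separator is treated as the split point, so a child name that itself contains " - " stays intact.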