Thingamablog v2 Project Documentation
Overview
This project extracts and displays blog entries from Paul's old Thingamablog (a cross-platform Java desktop blogging application from the early 2000s). The process involved multiple iterations:
- Initial Attempt: Python script to parse the HTML files (blog.html).
- Database Approach: Discovered that the blogs were stored in an HSQLDB database (an older Java-based SQL database).
- Java/Spring Boot: Used the old HSQLDB Java library in a Spring Boot app to extract the data.
- Node.js Solution: Final implementation using a Node.js app to read HSQLDB directly and export to JSON.
- Web App: Built a full-stack web app (React frontend + Express backend) to browse and display the blog entries.
The result is a clean JSON export (blog-export.json) containing all blog posts, which can be browsed via a modern web interface.
Latest Updates (March 2026)
- ✅ Git Repository: Created on Gitea at https://paje.ca/git/paulh/thingamablog-v2
- ✅ Documentation: Comprehensive DESIGN.md and updated README.md
- ✅ Data Quality: All 467 blog posts extracted with complete metadata
- 🔄 Git Push: Local repo initialized and committed; push pending until network connectivity is restored
- 🔗 Integration: Ready for Open Brain vector ingestion
Data Extraction Process
The data was extracted using the "Bridge" approach from the companion thingamablog-api project:
- Source: Old HSQLDB database files in docs/pauls-blogs/Paul/database/
- Tool: Java CLI application (ExportTool.java) with the HSQLDB 1.8.0.10 driver
- Command: java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > blog-export.json
- Output: blog-export.json with structured blog entries (title, date, content, categories, etc.)
- Quality: All 467 entries extracted, with full HTML content preserved and clean metadata
- Size: 1.3MB of structured JSON data
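Before building on the export, a quick sanity check can confirm the entry count, file size and that every entry carries the fields listed above. This is a minimal sketch, assuming the file is a JSON array of entry objects with `title`, `date`, `content` and `categories` keys; `checkExport` and `checkExportFile` are hypothetical helpers, not part of ExportTool. Dates the JavaScript `Date` constructor cannot parse are flagged, which may be too strict if the export uses a custom timestamp format.

```javascript
// Sanity check for blog-export.json before trusting it downstream.
// ASSUMPTION: the export is a JSON array of entry objects with title, date,
// content and categories fields; adjust REQUIRED_FIELDS to the real shape.
const fs = require("fs");

const REQUIRED_FIELDS = ["title", "date", "content", "categories"];

// Returns the entry count plus a list of human-readable problems.
function checkExport(entries) {
  if (!Array.isArray(entries)) throw new TypeError("expected a JSON array of entries");
  const problems = [];
  entries.forEach((entry, index) => {
    for (const field of REQUIRED_FIELDS) {
      if (entry[field] === undefined || entry[field] === null) {
        problems.push(`entry ${index}: missing ${field}`);
      }
    }
    // Flags dates that Date cannot parse (may be strict for custom formats).
    if (entry.date != null && Number.isNaN(new Date(entry.date).getTime())) {
      problems.push(`entry ${index}: unparseable date ${JSON.stringify(entry.date)}`);
    }
  });
  return { count: entries.length, problems };
}

// Reads the export from disk and reports its size in bytes alongside the check.
function checkExportFile(file) {
  const entries = JSON.parse(fs.readFileSync(file, "utf8"));
  return { ...checkExport(entries), bytes: fs.statSync(file).size };
}
```

Calling `checkExportFile("backend/blog-export.json")` returns the count, any problems and the byte size in one object, which makes it easy to compare against the figures quoted in this README.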
Project Structure
projects/thingamablog-v2/
├── backend/ # Express.js server
│ ├── server.js # Main server file
│ ├── hsqldbParser.js # Legacy HSQLDB parser (fallback)
│ ├── blog-export.json # Exported blog data (1.7MB)
│ └── package.json
├── frontend/ # React app
│ ├── src/
│ │ ├── App.js # Main component
│ │ ├── theme.js # Material-UI theme
│ │ └── index.js
│ └── package.json
├── DESIGN.md # Technical design document
├── README.md # This file
└── .gitignore # Excludes node_modules, build artifacts
Running the Application
Prerequisites
- Node.js installed
- Backend dependencies: cd backend && npm install
- Frontend dependencies: cd frontend && npm install
Start Backend
cd projects/thingamablog-v2/backend
node server.js
- Runs on http://localhost:3637
- Loads blog-export.json if available; otherwise falls back to the HSQLDB parser
- Serves images from /home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/1096292361887/web
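The backend's startup rule (use the JSON export when it exists, otherwise fall back to the HSQLDB parser) can be sketched as follows. `loadPosts` and the injected `parseHsqldb` callback are hypothetical names; the real `server.js` and `hsqldbParser.js` may be organized differently.

```javascript
// Sketch of the "JSON export first, HSQLDB parser as fallback" rule.
// ASSUMPTION: loadPosts and parseHsqldb are hypothetical names; the real
// server.js and hsqldbParser.js may be organized differently.
const fs = require("fs");
const path = require("path");

function loadPosts(dataDir, parseHsqldb) {
  const exportPath = path.join(dataDir, "blog-export.json");
  if (fs.existsSync(exportPath)) {
    // Fast path: a single JSON.parse over the pre-extracted export.
    return { source: "json", posts: JSON.parse(fs.readFileSync(exportPath, "utf8")) };
  }
  // Slow path: only now invoke the legacy parser that reads the HSQLDB files.
  return { source: "hsqldb", posts: parseHsqldb() };
}
```

Passing the parser in as a callback means the slow path runs only when the export is missing, and it keeps the fallback easy to test. Because the check happens once at startup, regenerating blog-export.json would require restarting the server.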
Start Frontend
cd projects/thingamablog-v2/frontend
npm start
- Runs on http://localhost:3000 (opens automatically)
- Connects to the backend at http://localhost:3637
Access the App
Open http://localhost:3000 in your browser.
UI Description
The web app provides a clean, modern interface to browse Paul's old blog posts:
Layout
- Header: "Thingamablog Archive" title
- Sidebar (Left): Browse filters
  - "All Posts" - show everything
  - "Categories" accordion - hierarchical categories (e.g., Hobbies > Car Maintenance)
  - "Archives" accordion - posts by year
- Post List (Middle): Scrollable list of posts with title, date, author
- Post Content (Right): Full post display with images, categories as chips
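The "Archives" accordion needs a year-by-year count of posts. A minimal sketch, assuming each post's `date` string begins with a four-digit year (ISO 8601 or an HSQLDB-style `YYYY-MM-DD hh:mm:ss` timestamp); `archiveIndex` is a hypothetical helper and not necessarily how `App.js` builds the list.

```javascript
// Sketch: build the sidebar's "Archives" list (year -> post count).
// ASSUMPTION: post.date starts with a four-digit year; archiveIndex is a
// hypothetical helper, not taken from App.js.
function archiveIndex(posts) {
  const counts = new Map();
  for (const post of posts) {
    const match = /^(\d{4})/.exec(String(post.date ?? ""));
    if (!match) continue; // skip posts without a usable date
    const year = Number(match[1]);
    counts.set(year, (counts.get(year) || 0) + 1);
  }
  // Newest year first, the usual order for a blog archive.
  return [...counts.entries()]
    .sort((a, b) => b[0] - a[0])
    .map(([year, count]) => ({ year, count }));
}
```

Reading the year straight from the string avoids the time zone shifts that `new Date(...)` can introduce for posts written near New Year.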
Features
- Filtering: Click categories or years to filter posts
- Selection: Click a post to view full content
- Images: Embedded images load from served static files
- Responsive: Adapts to screen size (stacks vertically on mobile)
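Filtering by category or year reduces to one predicate over the post list. This sketch assumes posts carry a `categories` array and a year-prefixed `date` string, and that selecting a parent category such as Hobbies should also match its children (stored as `Hobbies - Car Maintenance`); `filterPosts` and the `selection` object shape are hypothetical.

```javascript
// Sketch of the sidebar filter: "All Posts", a category, or a year.
// ASSUMPTIONS: posts carry `categories` (array of strings) and a `date`
// string beginning with the year; filterPosts and `selection` are hypothetical.
function filterPosts(posts, selection) {
  switch (selection.type) {
    case "all":
      return posts;
    case "category": {
      // A parent such as "Hobbies" also matches "Hobbies - Car Maintenance".
      const prefix = `${selection.name} - `;
      return posts.filter((post) =>
        (post.categories || []).some((c) => c === selection.name || c.startsWith(prefix))
      );
    }
    case "year":
      return posts.filter((post) => String(post.date ?? "").startsWith(String(selection.year)));
    default:
      throw new Error(`unknown selection type: ${selection.type}`);
  }
}
```

For example, `filterPosts(posts, { type: "year", year: 2004 })` keeps only posts whose date starts with 2004.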
Sample View
- Categories include: Hobbies, Personal, Robotics, etc.
- Posts date back to the 2000s, covering topics like 3D printing, car maintenance, and tech projects
- Content includes HTML formatting, links, and images
API Endpoints
- GET /api/posts - List all posts (id, title, date, category)
- GET /api/posts/:id - Get full post data
- Images served at /1096292361887/web/* or /
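The two JSON endpoints boil down to a projection for the list view and a lookup by id. The dependency-free sketch below captures that logic as a plain function so it can run without a server. `handleApi` is a hypothetical name, and filling the list's `category` field with a post's first category is a guess at what `server.js` returns.

```javascript
// Dependency-free sketch of the two JSON endpoints' logic.
// ASSUMPTIONS: posts is an array of { id, title, date, categories, content };
// `category` in the list payload is guessed to be the first category;
// handleApi is a hypothetical name, not taken from server.js.
function handleApi(method, url, posts) {
  if (method !== "GET") return { status: 405, body: { error: "method not allowed" } };
  const { pathname } = new URL(url, "http://localhost");

  if (pathname === "/api/posts") {
    // List view: only the fields the post list needs, not the full HTML.
    const body = posts.map(({ id, title, date, categories }) => ({
      id,
      title,
      date,
      category: (categories || [])[0] ?? null,
    }));
    return { status: 200, body };
  }

  const match = /^\/api\/posts\/([^/]+)$/.exec(pathname);
  if (match) {
    let id;
    try {
      id = decodeURIComponent(match[1]);
    } catch {
      return { status: 400, body: { error: "malformed id" } };
    }
    const post = posts.find((p) => String(p.id) === id);
    return post ? { status: 200, body: post } : { status: 404, body: { error: "post not found" } };
  }

  return { status: 404, body: { error: "not found" } };
}
```

In Express, the same logic would sit behind `app.get("/api/posts", ...)` and `app.get("/api/posts/:id", ...)`, reading the id from `req.params.id` and replying with `res.status(status).json(body)`. Keeping the full HTML out of the list response keeps the initial payload small.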
Future Integration
This data can be ingested into Open Brain for vector search:
- Parse blog-export.json into chunks
- Embed with OpenAI
- Store in Supabase pgvector
- Enable semantic queries across Paul's 20+ year blog history
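The first step, turning each entry into embeddable chunks, could look like the sketch below. It assumes entries have `title`, `date` and HTML `content` fields. The tag stripping is a crude regex pass, not a full HTML parser, and the 1000-character window with 200 characters of overlap is an arbitrary starting point rather than a tuned value. `htmlToText` and `chunkEntry` are hypothetical names.

```javascript
// Sketch of the "parse into chunks" step ahead of embedding.
// ASSUMPTIONS: entries have title/date/content (HTML) fields; the regex tag
// stripping is crude; size/overlap are arbitrary, untuned defaults;
// htmlToText and chunkEntry are hypothetical names.
function htmlToText(html) {
  return String(html)
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop scripts and styles
    .replace(/<br\s*\/?>|<\/p>/gi, "\n") // keep line and paragraph breaks
    .replace(/<[^>]+>/g, " ") // strip remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&") // decode &amp; last to avoid double-decoding
    .replace(/[ \t]+/g, " ")
    .replace(/\s*\n\s*/g, "\n")
    .trim();
}

// Splits one entry into overlapping character windows, keeping metadata on each.
function chunkEntry(entry, size = 1000, overlap = 200) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const text = htmlToText(entry.content);
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push({ title: entry.title, date: entry.date, index: chunks.length, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

Each chunk keeps the post's title and date, so search results can link back to the original entry. Sizes here are in characters; an embedding model's limit is in tokens, so the window may need adjusting.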
Development Status
- Backend: ✅ Complete, tested with 467 posts
- Frontend: ✅ Complete, Material-UI responsive design
- Data Extraction: ✅ Complete, high-quality JSON export
- Documentation: ✅ Complete (README.md, DESIGN.md)
- Git Repository: ✅ Created on Gitea, awaiting push due to network issues
- Testing: ✅ Manual testing successful
Notes
- No tests or CI/CD set up
- Assumes local paths for images/database
- Backend prioritizes JSON export over HSQLDB parsing for speed
- Categories use the <Category> and <Parent - Child> formats
- Network issues are preventing the final Git push; the repos are ready to push once connectivity is restored
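The flat <Parent - Child> category strings have to be regrouped into a hierarchy for the Categories accordion (shown in the UI as Hobbies > Car Maintenance). A minimal sketch, assuming " - " (space, hyphen, space) only ever separates levels; a category whose own name contains " - " would be split incorrectly. `buildCategoryTree` is a hypothetical helper.

```javascript
// Sketch: turn flat category strings into the sidebar's category tree.
// ASSUMPTION: " - " only ever separates levels; buildCategoryTree is a
// hypothetical helper, not taken from App.js or server.js.
function buildCategoryTree(categoryNames) {
  const root = new Map(); // name -> Map of children
  for (const name of categoryNames) {
    let level = root;
    for (const part of name.split(" - ").map((s) => s.trim()).filter(Boolean)) {
      if (!level.has(part)) level.set(part, new Map());
      level = level.get(part);
    }
  }
  // Convert the nested Maps into plain, alphabetically sorted nodes.
  const toNodes = (level) =>
    [...level.keys()].sort().map((name) => ({ name, children: toNodes(level.get(name)) }));
  return toNodes(root);
}
```

A plain category such as Personal becomes a top-level node with no children, while Hobbies - Car Maintenance and a bare Hobbies collapse into one Hobbies node.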