# Thingamablog v2 Project Documentation

## Overview

This project extracts and displays blog entries from Paul's old Thingamablog (a Windows blog platform from the early 2000s). The process involved multiple iterations:

1. **Initial Attempt:** Python script to parse HTML files (`blog.html`).
2. **Database Approach:** Discovered the blogs were stored in an HSQLDB database (old Java-based SQL DB).
3. **Java/Spring Boot:** Used old HSQLDB Java library in a Spring Boot app to extract data.
4. **Node.js Solution:** Final implementation using a Node.js app to read HSQLDB directly and export to JSON.
5. **Web App:** Built a full-stack web app (React frontend + Express backend) to browse and display the blog entries.

The result is a clean JSON export (`blog-export.json`) containing all blog posts, which can be browsed via a modern web interface.

## Latest Updates (March 2026)

- ✅ **Git Repository:** Created on Gitea at https://paje.ca/git/paulh/thingamablog-v2
- ✅ **Documentation:** Comprehensive DESIGN.md and updated README.md
- ✅ **Data Quality:** 467 blog posts successfully extracted with perfect metadata
- 🔄 **Git Push:** Local repo initialized and committed, awaiting network resolution for push
- 🔗 **Integration:** Ready for Open Brain vector ingestion

## Data Extraction Process

The data was extracted using the "Bridge" approach from the companion `thingamablog-api` project:

- **Source:** Old HSQLDB database files in `docs/pauls-blogs/Paul/database/`
- **Tool:** Java CLI application (`ExportTool.java`) with HSQLDB 1.8.0.10 driver
- **Command:** `java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > blog-export.json`
- **Output:** `blog-export.json` with structured blog entries (title, date, content, categories, etc.)
- **Quality:** Perfect extraction - 467 entries, full HTML content preserved, clean metadata
- **Size:** 1.3MB of structured JSON data

## Project Structure

```
projects/thingamablog-v2/
├── backend/              # Express.js server
│   ├── server.js         # Main server file
│   ├── hsqldbParser.js   # Legacy HSQLDB parser (fallback)
│   ├── blog-export.json  # Exported blog data (1.7MB)
│   └── package.json
├── frontend/             # React app
│   ├── src/
│   │   ├── App.js        # Main component
│   │   ├── theme.js      # Material-UI theme
│   │   └── index.js
│   └── package.json
├── DESIGN.md             # Technical design document
├── README.md             # This file
└── .gitignore            # Excludes node_modules, build artifacts
```

## Running the Application

### Prerequisites

- Node.js installed
- Backend dependencies: `cd backend && npm install`
- Frontend dependencies: `cd frontend && npm install`

### Start Backend

```bash
cd projects/thingamablog-v2/backend
node server.js
```

- Runs on `http://localhost:3637`
- Loads `blog-export.json` if available, else falls back to HSQLDB parser
- Serves images from `/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/1096292361887/web`

### Start Frontend

```bash
cd projects/thingamablog-v2/frontend
npm start
```

- Runs on `http://localhost:3000` (opens automatically)
- Connects to backend at `http://localhost:3637`

### Access the App

Open `http://localhost:3000` in your browser.

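
For reference, the backend's startup behavior noted above (load `blog-export.json` if available, otherwise fall back to the HSQLDB parser) might be sketched roughly as follows. This is a minimal illustration, not the contents of `server.js`: the function name `loadPosts`, its parameters, and the assumption that the export is a JSON array of entries are all hypothetical.

```javascript
// Sketch only: names and the JSON shape are assumptions for illustration,
// not taken from the project's server.js.
const fs = require("fs");

/**
 * Prefer the JSON export; use the fallback loader only when the file is missing.
 * @param {string} exportPath - path to the JSON export (e.g. blog-export.json)
 * @param {() => object[]} fallback - stand-in for the legacy HSQLDB parser
 * @returns {{source: string, posts: object[]}}
 */
function loadPosts(exportPath, fallback) {
  if (fs.existsSync(exportPath)) {
    // Assumes the export parses to an array of post objects.
    const posts = JSON.parse(fs.readFileSync(exportPath, "utf8"));
    return { source: "json", posts };
  }
  return { source: "hsqldb", posts: fallback() };
}

// Usage: the fallback here is an empty stub standing in for the real parser.
const result = loadPosts("blog-export.json", () => []);
console.log(`Loaded ${result.posts.length} posts from ${result.source}`);
```

Reading the export first keeps startup fast, since the fallback parser only runs when the export is missing.
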
## UI Description

The web app provides a clean, modern interface to browse Paul's old blog posts:

### Layout

- **Header:** "Thingamablog Archive" title
- **Sidebar (Left):** Browse filters
  - "All Posts" - show everything
  - "Categories" accordion - hierarchical categories (e.g., Hobbies > Car Maintenance)
  - "Archives" accordion - posts by year
- **Post List (Middle):** Scrollable list of posts with title, date, author
- **Post Content (Right):** Full post display with images, categories as chips

### Features

- **Filtering:** Click categories or years to filter posts
- **Selection:** Click a post to view full content
- **Images:** Embedded images load from served static files
- **Responsive:** Adapts to screen size (stacks vertically on mobile)

### Sample View

- Categories include: Hobbies, Personal, Robotics, etc.
- Posts date back to the 2000s, covering topics like 3D printing, car maintenance, tech projects
- Content includes HTML formatting, links, and images

## API Endpoints

- `GET /api/posts` - List all posts (id, title, date, category)
- `GET /api/posts/:id` - Get full post data
- Images served at `/1096292361887/web/*` or `/`

## Future Integration

This data can be ingested into Open Brain for vector search:

- Parse `blog-export.json` into chunks
- Embed with OpenAI
- Store in Supabase PGVector
- Enable semantic queries across Paul's 20+ year blog history

## Development Status

- **Backend:** ✅ Complete, tested with 467 posts
- **Frontend:** ✅ Complete, Material-UI responsive design
- **Data Extraction:** ✅ Complete, high-quality JSON export
- **Documentation:** ✅ Complete (README.md, DESIGN.md)
- **Git Repository:** ✅ Created on Gitea, awaiting push due to network issues
- **Testing:** ✅ Manual testing successful

## Notes

- No automated tests or CI/CD set up
- Assumes local paths for images/database
- Backend prioritizes JSON export over HSQLDB parsing for speed
- Categories use `` and `` format
- Network issues are preventing the final Git push - repos are ready to push when connectivity is restored
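
As a rough sketch of the first Future Integration step (parsing `blog-export.json` into chunks), the code below strips HTML from a post and splits the text on word boundaries. The field names `title` and `content`, the helper names, and the 1000-character default are assumptions for illustration, not the project's actual schema or settings. The embedding and storage steps are left out.

```javascript
// Sketch only: field names (title, content) and the chunk size are
// assumptions for illustration, not the project's actual schema.

/** Replace HTML tags with spaces and collapse whitespace to get plain text. */
function htmlToText(html) {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

/**
 * Split text into chunks of at most maxChars, breaking between words.
 * A single word longer than maxChars is kept whole in its own chunk.
 */
function chunkText(text, maxChars) {
  const chunks = [];
  let current = "";
  for (const word of text.split(" ")) {
    if (word === "") continue;
    const candidate = current ? current + " " + word : word;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

/** Turn one post into chunk records that carry the title and chunk index. */
function chunkPost(post, maxChars = 1000) {
  return chunkText(htmlToText(post.content), maxChars).map((text, index) => ({
    title: post.title,
    index,
    text,
  }));
}

// Usage with a made-up post object.
const sample = { title: "Example", content: "<p>First paragraph.</p><p>Second one.</p>" };
console.log(chunkPost(sample, 20));
```

Keeping the title and index on each chunk makes it possible to trace a search hit back to its source post.
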