# Thingamablog API - Data Extraction Tool

## Overview

This is the data extraction component of the Thingamablog migration project. It contains a Java CLI tool that converts legacy HSQLDB database files to JSON, so the web application can serve clean, structured blog data.

## Purpose

The Thingamablog platform (early 2000s) stored blog posts in an obsolete HSQLDB database format. This tool extracts that data into JSON that modern applications can consume.

## Architecture

- **Input:** HSQLDB database files (`database.script`, `database.data`)
- **Tool:** `ExportTool.java` - JDBC-based Java application
- **Driver:** HSQLDB 1.8.0.10 JAR (legacy compatible)
- **Output:** `blog-export.json` - Structured JSON array of blog posts

## Setup & Build

### Prerequisites

- Java 8 or higher
- Maven 3.x (for dependency management)

### Dependencies

- HSQLDB 1.8.0.10 JAR (automatically downloaded by Maven)
- Maven coordinates: `org.hsqldb:hsqldb:1.8.0.10`

### Build Process

```bash
# Download dependencies into target/dependency/
mvn dependency:copy-dependencies

# Compile the tool
javac -cp target/dependency/hsqldb-1.8.0.10.jar ExportTool.java

# The compiled class will be in the root directory
```

## Usage

### Command Line

```bash
java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > ../thingamablog-v2/backend/blog-export.json
```

### What It Does

1. Connects to the HSQLDB database at `/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database/`
2. Queries the `ENTRY_TABLE_1096292361887` table
3. Maps database columns to JSON fields:
   - `ID` → `id`
   - `TITLE` → `title`
   - `TIMESTAMP` → `date`
   - `ENTRY` → `content`
   - `CATEGORIES` → `categories`
   - `AUTHOR` → `author`
4. Writes a clean JSON array to stdout

### Sample Output

```json
[
  {
    "id": 1,
    "title": "Digital Imaging Notes",
    "date": "2003-11-03 16:41:22.053",
    "author": "Paul",
    "categories": "Hobbies",
    "content": "Full HTML content preserved..."
  }
]
```

## Data Quality

The export produces high-quality data:

- ✅ Perfect titles and dates
- ✅ Full HTML content preserved
- ✅ Categories properly extracted
- ✅ Sequential IDs assigned
- ✅ JSON validation passes
- ✅ 467 entries successfully extracted (1.3 MB)

## Integration

The exported JSON feeds directly into the thingamablog-v2 web application:

1. Place `blog-export.json` in `../thingamablog-v2/backend/`
2. The Node.js backend prefers this clean JSON over the fallback HSQLDB parser
3. The web app serves posts via its REST API

## Troubleshooting

### Common Issues

**JDBC Driver Not Found**

```
Error: org.hsqldb.jdbcDriver
```

- Ensure Maven has downloaded the dependency: `mvn dependency:copy-dependencies`
- Check that the classpath includes `target/dependency/hsqldb-1.8.0.10.jar`

**Database Path Issues**

```
SQL Exception: file not found
```

- Verify that the HSQLDB files exist at the hardcoded path
- Ensure the database files are readable by the current user

**Empty Output**

- Check database file integrity
- Verify that the table `ENTRY_TABLE_1096292361887` exists

### Legacy Considerations

- HSQLDB 1.8.0.10 is a legacy release from the 2000s and uses a very old file format
- Modern HSQLDB versions may not read these files
- The "Bridge" approach isolates legacy dependencies

## File Structure

```
thingamablog-api/
├── ExportTool.java        # Main extraction tool
├── pom.xml                # Maven configuration
├── src/main/java/...      # Additional Spring Boot components (unused)
├── target/dependency/     # Maven dependencies
└── .gitignore             # Excludes build artifacts
```

## Development Notes

- Originally attempted with Spring Boot and newer HSQLDB drivers
- Simplified to a standalone Java CLI for reliability
- Paths are hardcoded because the tool serves a single extraction job
- JSON escaping is implemented so HTML content is emitted safely

## Related Projects

- **thingamablog-v2**: Web application that consumes the exported JSON
- **docs/thingamablog-extract**: Alternative extraction results (Markdown format)

## Future Improvements

- Parameterize database path and output file
- Add command-line arguments for flexibility
- Support for other HSQLDB table schemas
- Integration with modern database migration tools
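
The first two improvements could look roughly like the sketch below. It is only an illustration, not code from `ExportTool.java`: the class name `ExportOptions` and the flags `--db` and `--out` are hypothetical, and the default database path assumes the HSQLDB database name is `database` (inferred from the file names `database.script` and `database.data`).

```java
/**
 * Hypothetical sketch of command-line handling for the export tool.
 * Not part of ExportTool.java; the flag names are assumptions.
 */
class ExportOptions {
    // Assumed default: the directory documented in this README plus the
    // database name "database" (from database.script / database.data).
    public static final String DEFAULT_DB =
        "/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database/database";
    public static final String DEFAULT_OUT = "blog-export.json";

    public final String dbPath;
    public final String outFile;

    ExportOptions(String dbPath, String outFile) {
        this.dbPath = dbPath;
        this.outFile = outFile;
    }

    /** Parses "--db <path>" and "--out <file>", falling back to the defaults. */
    public static ExportOptions parse(String[] args) {
        String db = DEFAULT_DB;
        String out = DEFAULT_OUT;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--db":
                    db = requireValue(args, ++i, "--db");
                    break;
                case "--out":
                    out = requireValue(args, ++i, "--out");
                    break;
                default:
                    throw new IllegalArgumentException("Unknown argument: " + args[i]);
            }
        }
        return new ExportOptions(db, out);
    }

    // Returns the value following a flag, or fails with a readable message.
    private static String requireValue(String[] args, int index, String flag) {
        if (index >= args.length) {
            throw new IllegalArgumentException("Missing value for " + flag);
        }
        return args[index];
    }

    /** Builds an HSQLDB file-mode JDBC URL from the database path. */
    public String jdbcUrl() {
        return "jdbc:hsqldb:file:" + dbPath;
    }

    public static void main(String[] args) {
        ExportOptions opts = parse(args);
        System.out.println("JDBC URL: " + opts.jdbcUrl());
        System.out.println("Output:   " + opts.outFile);
    }
}
```

With something like this in place, the tool could be pointed at another database without recompiling, for example `java ExportOptions --db /path/to/database --out blog-export.json`. Rejecting unknown flags is a deliberate choice here so that a typo fails loudly instead of silently falling back to the defaults.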