Thingamablog API - Data Extraction Tool
Overview
This is the data extraction component of the Thingamablog migration project. It contains a Java CLI tool that bridges legacy HSQLDB database files to modern JSON format, enabling the web application to serve clean, structured blog data.
Purpose
The Thingamablog platform (early 2000s) stored blog posts in an obsolete HSQLDB database format. This tool extracts that data into a clean JSON format that can be consumed by modern applications.
Architecture
- Input: HSQLDB database files (database.script, database.data)
- Tool: ExportTool.java - JDBC-based Java application
- Driver: HSQLDB 1.8.0.10 JAR (legacy compatible)
- Output: blog-export.json - Structured JSON array of blog posts
Setup & Build
Prerequisites
- Java 8 or higher
- Maven 3.x (for dependency management)
Dependencies
- HSQLDB 1.8.0.10 JAR (automatically downloaded by Maven)
- Maven coordinates: org.hsqldb:hsqldb:1.8.0.10
Build Process
# Download dependencies
mvn dependency:copy-dependencies
# Compile the tool
javac -cp target/dependency/hsqldb-1.8.0.10.jar ExportTool.java
# The compiled class will be in the root directory
Usage
Command Line
java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > ../thingamablog-v2/backend/blog-export.json
What It Does
- Connects to the HSQLDB database at /home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database/
- Queries the ENTRY_TABLE_1096292361887 table
- Maps database columns to JSON fields:
  - ID → id
  - TITLE → title
  - TIMESTAMP → date
  - ENTRY → content
  - CATEGORIES → categories
  - AUTHOR → author
- Outputs clean JSON array to stdout
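The steps above can be sketched roughly as follows. This is an illustrative sketch, not the actual ExportTool.java source: the class name EntryMappingSketch and its methods are hypothetical, the use of SELECT * is an assumption, and the caller is assumed to have opened the connection already (for example with DriverManager.getConnection and a jdbc:hsqldb:file: URL pointing at the database files).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the column-to-field mapping; not the real ExportTool.
class EntryMappingSketch {

    // Database column -> JSON field, in the order listed in this README.
    static final Map<String, String> COLUMN_TO_FIELD = new LinkedHashMap<>();
    static {
        COLUMN_TO_FIELD.put("ID", "id");
        COLUMN_TO_FIELD.put("TITLE", "title");
        COLUMN_TO_FIELD.put("TIMESTAMP", "date");
        COLUMN_TO_FIELD.put("ENTRY", "content");
        COLUMN_TO_FIELD.put("CATEGORIES", "categories");
        COLUMN_TO_FIELD.put("AUTHOR", "author");
    }

    // Renames the keys of one row from column names to JSON field names.
    static Map<String, Object> toJsonFields(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : COLUMN_TO_FIELD.entrySet()) {
            out.put(e.getValue(), row.get(e.getKey()));
        }
        return out;
    }

    // Copies the mapped columns of the current ResultSet row into a map.
    static Map<String, Object> readRow(ResultSet rs) throws SQLException {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String column : COLUMN_TO_FIELD.keySet()) {
            row.put(column, rs.getObject(column));
        }
        return row;
    }

    // Queries the entry table on an already-open connection and maps every row.
    // Assumption: SELECT * is enough; the real tool may select columns explicitly.
    static List<Map<String, Object>> exportAll(Connection conn) throws SQLException {
        List<Map<String, Object>> entries = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM ENTRY_TABLE_1096292361887")) {
            while (rs.next()) {
                entries.add(toJsonFields(readRow(rs)));
            }
        }
        return entries;
    }
}
```

Keeping the mapping in one table means a renamed JSON field only has to change in a single place; serializing the resulting maps to JSON text is a separate step.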
Sample Output
[
  {
    "id": 1,
    "title": "Digital Imaging Notes",
    "date": "2003-11-03 16:41:22.053",
    "author": "Paul",
    "categories": "Hobbies",
    "content": "<p>Full HTML content preserved...</p>"
  }
]
Data Quality
The export produces high-quality data:
- ✅ Perfect titles and dates
- ✅ Full HTML content preserved
- ✅ Categories properly extracted
- ✅ Sequential IDs assigned
- ✅ JSON validation passes
- ✅ 467 entries successfully extracted (1.3MB)
Integration
The exported JSON feeds directly into the thingamablog-v2 web application:
- Place blog-export.json in ../thingamablog-v2/backend/
- The Node.js backend prioritizes this clean JSON over the fallback HSQLDB parser
- Web app serves posts via REST API
Troubleshooting
Common Issues
JDBC Driver Not Found
Error: org.hsqldb.jdbcDriver
- Ensure Maven has downloaded the dependency: mvn dependency:copy-dependencies
- Check classpath includes target/dependency/hsqldb-1.8.0.10.jar
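To confirm whether the driver is actually visible to the JVM, a small standalone check along these lines may help. The class DriverCheckSketch is a hypothetical helper, not part of ExportTool.java; the driver class name is taken from the error message above.

```java
// Hypothetical diagnostic helper; not part of the actual tool.
class DriverCheckSketch {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "org.hsqldb.jdbcDriver"; // class named in the error above
        if (isOnClasspath(driver)) {
            System.out.println("HSQLDB driver found: " + driver);
        } else {
            System.out.println("HSQLDB driver missing. Classpath is: "
                    + System.getProperty("java.class.path"));
        }
    }
}
```

Run it with the same -cp value used for ExportTool; if it reports the driver as missing, the printed classpath shows what the JVM actually received.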
Database Path Issues
SQL Exception: file not found
- Verify HSQLDB files exist at the hardcoded path
- Ensure read permissions on database files
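Both checks can be automated before the export runs. The sketch below is a hypothetical pre-flight helper (DatabaseFilesCheckSketch is not part of the tool) and assumes the database directory is passed as the first argument and holds the two files named in the Architecture section.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; not part of the actual tool.
class DatabaseFilesCheckSketch {

    // Returns one message per problem; an empty list means every file
    // exists as a regular file and is readable by the current user.
    static List<String> findProblems(Path databaseDir, String... fileNames) {
        List<String> problems = new ArrayList<>();
        for (String name : fileNames) {
            Path file = databaseDir.resolve(name);
            if (!Files.isRegularFile(file)) {
                problems.add("missing: " + file);
            } else if (!Files.isReadable(file)) {
                problems.add("not readable: " + file);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Assumption: the directory is given on the command line,
        // falling back to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> problems = findProblems(dir, "database.script", "database.data");
        if (problems.isEmpty()) {
            System.out.println("Database files look usable in " + dir);
        } else {
            for (String problem : problems) {
                System.out.println(problem);
            }
        }
    }
}
```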
Empty Output
- Check database file integrity
- Verify table name ENTRY_TABLE_1096292361887 exists
Legacy Considerations
- HSQLDB 1.8.0.10 is a legacy 1.8-series release (the 1.8 line dates from the mid-2000s) - very old format
- Modern HSQLDB versions may not read these files
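- The "Bridge" approach isolates legacy dependencies: only this export tool loads the old JDBC driver, and downstream consumers work from the exported JSON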
File Structure
thingamablog-api/
├── ExportTool.java # Main extraction tool
├── pom.xml # Maven configuration
├── src/main/java/... # Additional Spring Boot components (unused)
├── target/dependency/ # Maven dependencies
└── .gitignore # Excludes build artifacts
Development Notes
- Originally attempted with Spring Boot and newer HSQLDB drivers
- Simplified to standalone Java CLI for reliability
- Hardcoded paths for single-purpose extraction
- JSON escaping implemented for HTML content safety
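As an illustration of the escaping mentioned in the last note, here is a minimal sketch of JSON string escaping following the rules in RFC 8259. It is not the tool's actual code; the class JsonEscapeSketch and its methods are hypothetical.

```java
// Hypothetical sketch of JSON string escaping; not the real ExportTool code.
class JsonEscapeSketch {

    // Escapes quotes, backslashes and control characters so the result
    // can be placed between double quotes in a JSON document.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the four-digit hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Wraps the escaped text in double quotes; a null value becomes JSON null.
    static String quote(String s) {
        return s == null ? "null" : "\"" + escape(s) + "\"";
    }
}
```

HTML markup such as angle brackets needs no escaping in JSON itself; only the double quotes inside attributes and any line breaks in the post body are affected.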
Related Projects
- thingamablog-v2: Web application that consumes the exported JSON
- docs/thingamablog-extract: Alternative extraction results (Markdown format)
Future Improvements
- Parameterize database path and output file
- Add command-line arguments for flexibility
- Support for other HSQLDB table schemas
- Integration with modern database migration tools
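The first two items could be approached with a small options parser like the sketch below. Everything here is hypothetical: the flag names --db and --out, the class ExportOptionsSketch, and the convention that a missing --out means writing to stdout are assumptions for illustration, not existing features of the tool.

```java
// Hypothetical sketch of command-line options; none of these flags exist yet.
class ExportOptionsSketch {
    final String databasePath;
    final String outputFile; // null means "write to stdout", as the tool does today

    ExportOptionsSketch(String databasePath, String outputFile) {
        this.databasePath = databasePath;
        this.outputFile = outputFile;
    }

    // Parses "--db <path>" and "--out <file>", falling back to the given
    // default database path when --db is absent.
    static ExportOptionsSketch parse(String[] args, String defaultDatabasePath) {
        String db = defaultDatabasePath;
        String out = null;
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            if (!flag.equals("--db") && !flag.equals("--out")) {
                throw new IllegalArgumentException("Unknown option: " + flag);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            String value = args[++i];
            if (flag.equals("--db")) {
                db = value;
            } else {
                out = value;
            }
        }
        return new ExportOptionsSketch(db, out);
    }
}
```

With no arguments the parser returns the default path and stdout output, so the current redirect-based usage would keep working unchanged.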