# Thingamablog API - Data Extraction Tool
## Overview
This is the data extraction component of the Thingamablog migration project. It contains a Java CLI tool that bridges legacy HSQLDB database files to modern JSON format, enabling the web application to serve clean, structured blog data.
## Purpose
The Thingamablog platform (early 2000s) stored blog posts in an obsolete HSQLDB database format. This tool extracts that data into a clean JSON format that can be consumed by modern applications.
## Architecture
- **Input:** HSQLDB database files (`database.script`, `database.data`)
- **Tool:** `ExportTool.java` - JDBC-based Java application
- **Driver:** HSQLDB 1.8.0.10 JAR (legacy compatible)
- **Output:** `blog-export.json` - Structured JSON array of blog posts
## Setup & Build
### Prerequisites
- Java 8 or higher
- Maven 3.x (for dependency management)
### Dependencies
- HSQLDB 1.8.0.10 JAR (automatically downloaded by Maven)
- Maven coordinates: `org.hsqldb:hsqldb:1.8.0.10`
### Build Process
```bash
# Download dependencies
mvn dependency:copy-dependencies
# Compile the tool
javac -cp target/dependency/hsqldb-1.8.0.10.jar ExportTool.java
# The compiled class will be in the root directory
```
## Usage
### Command Line
```bash
java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > ../thingamablog-v2/backend/blog-export.json
```
### What It Does
1. Connects to HSQLDB database at `/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database/`
2. Queries the `ENTRY_TABLE_1096292361887` table
3. Maps database columns to JSON fields:
   - `ID` → `id`
   - `TITLE` → `title`
   - `TIMESTAMP` → `date`
   - `ENTRY` → `content`
   - `CATEGORIES` → `categories`
   - `AUTHOR` → `author`
4. Outputs clean JSON array to stdout
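
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `ExportTool.java`: the class name `ExportSketch`, its helper methods, the `sa` user with an empty password (the usual HSQLDB default), and the `database` file prefix (inferred from `database.script`) are all assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

class ExportSketch {
    // Column-to-field mapping from the list above: {DB column, JSON field}.
    static final String[][] FIELD_MAP = {
        {"ID", "id"}, {"TITLE", "title"}, {"TIMESTAMP", "date"},
        {"ENTRY", "content"}, {"CATEGORIES", "categories"}, {"AUTHOR", "author"}
    };

    // An HSQLDB file URL names the directory plus the file prefix
    // ("database" for database.script), not the directory alone.
    static String jdbcUrl(String dir, String prefix) {
        String base = dir.endsWith("/") ? dir : dir + "/";
        return "jdbc:hsqldb:file:" + base + prefix;
    }

    // Reads every row and prints each mapped field as "jsonField = value".
    static void dump(String dir, String table) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver"); // legacy 1.8 driver class
        // Assumption: default credentials ("sa", empty password).
        try (Connection conn = DriverManager.getConnection(jdbcUrl(dir, "database"), "sa", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM " + table)) {
            while (rs.next()) {
                for (String[] pair : FIELD_MAP) {
                    System.out.println(pair[1] + " = " + rs.getString(pair[0]));
                }
            }
        }
    }
}
```

Keeping the mapping in one table means the JSON field names are defined in a single place, so adding or renaming a field does not touch the query loop.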
### Sample Output
```json
[
  {
    "id": 1,
    "title": "Digital Imaging Notes",
    "date": "2003-11-03 16:41:22.053",
    "author": "Paul",
    "categories": "Hobbies",
    "content": "<p>Full HTML content preserved...</p>"
  }
]
```
## Data Quality
The export produces high-quality data:
- ✅ Titles and dates exported intact
- ✅ Full HTML content preserved
- ✅ Categories properly extracted
- ✅ Sequential IDs assigned
- ✅ JSON validation passes
- ✅ 467 entries successfully extracted (1.3 MB)
## Integration
The exported JSON feeds directly into the thingamablog-v2 web application:
1. Place `blog-export.json` in `../thingamablog-v2/backend/`
2. The Node.js backend prioritizes this clean JSON over the fallback HSQLDB parser
3. Web app serves posts via REST API
## Troubleshooting
### Common Issues
**JDBC Driver Not Found**
```
Error: org.hsqldb.jdbcDriver
```
- Ensure Maven has downloaded the dependency: `mvn dependency:copy-dependencies`
- Check classpath includes `target/dependency/hsqldb-1.8.0.10.jar`
**Database Path Issues**
```
SQL Exception: file not found
```
- Verify HSQLDB files exist at the hardcoded path
- Ensure read permissions on database files
**Empty Output**
- Check database file integrity
- Verify table name `ENTRY_TABLE_1096292361887` exists
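
The checks above could be automated with a small preflight routine. The sketch below is hypothetical (the `Preflight` class and its methods are not part of the tool) and relies on the assumption that `database.script` is a plain-text file of SQL statements, so a missing table name in it suggests the wrong database directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class Preflight {
    // True if the legacy HSQLDB driver class can be loaded from the classpath.
    static boolean driverOnClasspath() {
        try {
            Class.forName("org.hsqldb.jdbcDriver");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns a list of human-readable problems; an empty list means the checks passed.
    static List<String> checkFiles(Path dbDir, String table) throws IOException {
        List<String> problems = new ArrayList<>();
        Path script = dbDir.resolve("database.script");
        if (!Files.isReadable(script)) {
            problems.add("cannot read " + script);
            return problems;
        }
        // ISO-8859-1 maps every byte to a char, so decoding never fails.
        String text = new String(Files.readAllBytes(script), StandardCharsets.ISO_8859_1);
        if (!text.contains(table)) {
            problems.add("table " + table + " not mentioned in " + script);
        }
        return problems;
    }
}
```

Running such a check before opening the JDBC connection turns an opaque SQL exception into a specific message about the missing file or table.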
### Legacy Considerations
- HSQLDB 1.8.0.10 belongs to the 1.8.x line from the mid-2000s, so the on-disk format is very old
- Modern HSQLDB versions may not read these files
- The "Bridge" approach isolates legacy dependencies
## File Structure
```
thingamablog-api/
├── ExportTool.java          # Main extraction tool
├── pom.xml                  # Maven configuration
├── src/main/java/...        # Additional Spring Boot components (unused)
├── target/dependency/       # Maven dependencies
└── .gitignore               # Excludes build artifacts
```
## Development Notes
- Originally attempted with Spring Boot and newer HSQLDB drivers
- Simplified to standalone Java CLI for reliability
- Hardcoded paths for single-purpose extraction
- JSON escaping implemented for HTML content safety
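
For illustration, JSON string escaping of HTML content can look like the sketch below. It is not the code from `ExportTool.java`; the `JsonEscape` class and its method names are hypothetical, and it follows the escaping rules of the JSON grammar (quotes, backslashes, and control characters below U+0020).

```java
class JsonEscape {
    // Escapes a string so it can be placed between double quotes in JSON.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                default:
                    if (ch < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) ch));
                    } else {
                        sb.append(ch);
                    }
            }
        }
        return sb.toString();
    }

    // Wraps the escaped value in quotes; a null column becomes JSON null.
    static String quote(String s) {
        return s == null ? "null" : "\"" + escape(s) + "\"";
    }
}
```

HTML tags themselves need no escaping in JSON; only the attribute quotes, backslashes, and line breaks inside the markup do.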
## Related Projects
- **thingamablog-v2**: Web application that consumes the exported JSON
- **docs/thingamablog-extract**: Alternative extraction results (Markdown format)
## Future Improvements
- Parameterize database path and output file
- Add command-line arguments for flexibility
- Support for other HSQLDB table schemas
- Integration with modern database migration tools
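
As a starting point for the first two items, argument handling could be sketched as below. The `ExportArgs` class and the `--db` and `--out` flags are hypothetical; nothing like them exists in the tool yet.

```java
class ExportArgs {
    final String dbDir;   // directory holding database.script / database.data
    final String outFile; // null means "write to stdout", as the tool does today

    ExportArgs(String dbDir, String outFile) {
        this.dbDir = dbDir;
        this.outFile = outFile;
    }

    // Parses "--db <dir>" and "--out <file>"; falls back to defaultDbDir
    // so the current hardcoded path keeps working when no flag is given.
    static ExportArgs parse(String[] args, String defaultDbDir) {
        String db = defaultDbDir;
        String out = null;
        for (int i = 0; i < args.length; i++) {
            String flag = args[i];
            if (!flag.equals("--db") && !flag.equals("--out")) {
                throw new IllegalArgumentException("unknown argument: " + flag);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("missing value for " + flag);
            }
            String value = args[++i]; // consume the flag's value
            if (flag.equals("--db")) {
                db = value;
            } else {
                out = value;
            }
        }
        return new ExportArgs(db, out);
    }
}
```

Defaulting to the existing path and to stdout would keep the documented command line working unchanged while allowing other databases to be exported.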