# Thingamablog API - Data Extraction Tool

## Overview

This is the data extraction component of the Thingamablog migration project. It contains a Java CLI tool that bridges legacy HSQLDB database files to modern JSON format, enabling the web application to serve clean, structured blog data.

## Purpose

The Thingamablog platform (early 2000s) stored blog posts in an obsolete HSQLDB database format. This tool extracts that data into a clean JSON format that can be consumed by modern applications.

## Architecture

- **Input:** HSQLDB database files (`database.script`, `database.data`)
- **Tool:** `ExportTool.java` - JDBC-based Java application
- **Driver:** HSQLDB 1.8.0.10 JAR (legacy compatible)
- **Output:** `blog-export.json` - Structured JSON array of blog posts

## Setup & Build

### Prerequisites
- Java 8 or higher
- Maven 3.x (for dependency management)

### Dependencies
- HSQLDB 1.8.0.10 JAR (automatically downloaded by Maven)
- Maven coordinates: `org.hsqldb:hsqldb:1.8.0.10`

### Build Process
```bash
# Download dependencies
mvn dependency:copy-dependencies

# Compile the tool
javac -cp target/dependency/hsqldb-1.8.0.10.jar ExportTool.java

# ExportTool.class is written to the project root, next to ExportTool.java
```

## Usage

### Command Line
```bash
java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > ../thingamablog-v2/backend/blog-export.json
```

### What It Does
1. Connects to HSQLDB database at `/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database/`
2. Queries the `ENTRY_TABLE_1096292361887` table
3. Maps database columns to JSON fields:
   - `ID` → `id`
   - `TITLE` → `title`
   - `TIMESTAMP` → `date`
   - `ENTRY` → `content`
   - `CATEGORIES` → `categories`
   - `AUTHOR` → `author`
4. Outputs clean JSON array to stdout
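Steps 1 to 3 can be sketched with plain JDBC. This is a minimal sketch, not ExportTool's actual source. The following are all assumptions:

- the class name `EntryReader`
- the `sa`/empty-password login
- the `ifexists=true` URL property
- the `ORDER BY` clause

It returns one ordered map per row, keyed by the JSON field names. Writing those maps out as the JSON array (step 4) is left to a separate serializer.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: EntryReader, the "sa"/"" login, ifexists=true and the
// ORDER BY are assumptions, not taken from ExportTool.java.
final class EntryReader {
    static final String TABLE = "ENTRY_TABLE_1096292361887";

    // Database column -> JSON field, kept in SELECT order.
    static final Map<String, String> FIELD_FOR_COLUMN = new LinkedHashMap<>();
    static {
        FIELD_FOR_COLUMN.put("ID", "id");
        FIELD_FOR_COLUMN.put("TITLE", "title");
        FIELD_FOR_COLUMN.put("TIMESTAMP", "date");
        FIELD_FOR_COLUMN.put("ENTRY", "content");
        FIELD_FOR_COLUMN.put("CATEGORIES", "categories");
        FIELD_FOR_COLUMN.put("AUTHOR", "author");
    }

    // database.script / database.data in dir -> URL path prefix ".../database".
    // ifexists=true makes HSQLDB fail instead of creating an empty database.
    static String jdbcUrl(String dir) {
        return "jdbc:hsqldb:file:" + dir + "/database;ifexists=true";
    }

    // Quoted identifiers avoid clashes with keywords such as TIMESTAMP.
    static String selectSql() {
        StringBuilder cols = new StringBuilder();
        for (String col : FIELD_FOR_COLUMN.keySet()) {
            if (cols.length() > 0) cols.append(", ");
            cols.append('"').append(col).append('"');
        }
        return "SELECT " + cols + " FROM " + TABLE + " ORDER BY \"ID\"";
    }

    // Connect, query, and map each row to an ordered field map.
    static List<Map<String, Object>> readEntries(String dir)
            throws ClassNotFoundException, SQLException {
        Class.forName("org.hsqldb.jdbcDriver"); // HSQLDB 1.8 driver class
        List<Map<String, Object>> entries = new ArrayList<>();
        try (Connection c = DriverManager.getConnection(jdbcUrl(dir), "sa", "");
             Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(selectSql())) {
            while (rs.next()) {
                Map<String, Object> entry = new LinkedHashMap<>();
                int index = 1; // JDBC columns are 1-based, same order as the map
                for (String field : FIELD_FOR_COLUMN.values()) {
                    entry.put(field, rs.getObject(index++));
                }
                entries.add(entry);
            }
        }
        return entries;
    }
}
```

The sketch loads `org.hsqldb.jdbcDriver` explicitly. A driver jar of that age may not register itself through the JDBC 4 service-loader mechanism, so relying on automatic loading could fail.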

### Sample Output
```json
[
  {
    "id": 1,
    "title": "Digital Imaging Notes",
    "date": "2003-11-03 16:41:22.053",
    "author": "Paul",
    "categories": "Hobbies",
    "content": "<p>Full HTML content preserved...</p>"
  }
]
```

## Data Quality

Results from the current export:
- ✅ Titles and dates extracted intact
- ✅ Full HTML content preserved
- ✅ Categories extracted
- ✅ Sequential IDs assigned
- ✅ Output passes JSON validation
- ✅ 467 entries extracted (1.3 MB)

## Integration

The exported JSON feeds directly into the thingamablog-v2 web application:

1. Place `blog-export.json` in `../thingamablog-v2/backend/`
2. The Node.js backend prioritizes this clean JSON over the fallback HSQLDB parser
3. Web app serves posts via REST API

## Troubleshooting

### Common Issues

**JDBC Driver Not Found**
```
Error: org.hsqldb.jdbcDriver
```
- Ensure Maven has downloaded the dependency: `mvn dependency:copy-dependencies`
- Check classpath includes `target/dependency/hsqldb-1.8.0.10.jar`
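A tiny check can confirm the classpath before running the export by trying to load the driver class named in the error. `DriverCheck` is a hypothetical helper, not part of this repository. It loads the class without initializing it, so nothing is registered with `DriverManager`.

```java
// Hypothetical diagnostic, not part of this repository.
final class DriverCheck {
    static final String DRIVER = "org.hsqldb.jdbcDriver";

    // Loads the class without running its static initializer,
    // so no driver is registered as a side effect.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isAvailable(DRIVER)) {
            System.out.println("Found " + DRIVER);
        } else {
            System.err.println("Missing " + DRIVER
                + "; add target/dependency/hsqldb-1.8.0.10.jar to -cp");
            System.exit(1);
        }
    }
}
```

Compile it the same way as `ExportTool.java` and run it with the same classpath, e.g. `java -cp .:target/dependency/hsqldb-1.8.0.10.jar DriverCheck`.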

**Database Path Issues**
```
SQL Exception: file not found
```
- Verify HSQLDB files exist at the hardcoded path
- Ensure read permissions on database files
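A pre-flight check on the two input files listed under Architecture can catch path problems early. This matters because, by default, HSQLDB creates a new empty database when a `file:` URL points at files that do not exist. `DatabaseFilesCheck` is a hypothetical helper, and it defaults to the hardcoded path used by the tool.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check, not part of ExportTool.java.
final class DatabaseFilesCheck {
    static final String DEFAULT_DIR =
        "/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database";

    // Returns one message per problem; an empty list means both files
    // exist and are readable by the current user.
    static List<String> problems(Path dir) {
        List<String> found = new ArrayList<>();
        for (String name : new String[] {"database.script", "database.data"}) {
            Path file = dir.resolve(name);
            if (!Files.exists(file)) {
                found.add("missing: " + file);
            } else if (!Files.isReadable(file)) {
                found.add("not readable: " + file);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : DEFAULT_DIR);
        List<String> found = problems(dir);
        if (found.isEmpty()) {
            System.out.println("OK: " + dir);
        } else {
            found.forEach(System.err::println);
            System.exit(1);
        }
    }
}
```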

**Empty Output**
- Check database file integrity
- Verify table name `ENTRY_TABLE_1096292361887` exists
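When the output is empty, two checks help narrow it down:

- Listing the tables through standard JDBC metadata shows whether `ENTRY_TABLE_1096292361887` exists under that exact name.
- A `COUNT(*)` shows whether the table has any rows.

This sketch assumes an already-open `java.sql.Connection` to the blog database. `TableCheck` and its methods are hypothetical.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostics, not part of ExportTool.java.
final class TableCheck {
    static final String TABLE = "ENTRY_TABLE_1096292361887";

    // Lists user tables (type "TABLE") via standard JDBC metadata.
    static List<String> tableNames(Connection c) throws SQLException {
        List<String> names = new ArrayList<>();
        DatabaseMetaData md = c.getMetaData();
        try (ResultSet rs = md.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    // Case-insensitive match: HSQLDB stores unquoted identifiers in
    // upper case, so a lower-case spelling names the same table.
    static boolean hasTable(List<String> names, String table) {
        for (String name : names) {
            if (name.equalsIgnoreCase(table)) {
                return true;
            }
        }
        return false;
    }

    // Zero here means the table exists but holds no entries.
    static long rowCount(Connection c) throws SQLException {
        try (Statement st = c.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM " + TABLE)) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```

If `hasTable` fails, printing `tableNames(c)` shows which entry tables the file actually contains.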

### Legacy Considerations

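- HSQLDB 1.8.0.10 is a legacy release, and the database files use its old on-disk format
- Modern HSQLDB versions may not read these files
- The "bridge" approach confines the legacy driver to this standalone extractor, so the web application never depends on it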

## File Structure

```
thingamablog-api/
├── ExportTool.java          # Main extraction tool
├── pom.xml                  # Maven configuration
├── src/main/java/...        # Additional Spring Boot components (unused)
├── target/dependency/       # Maven dependencies
└── .gitignore               # Excludes build artifacts
```

## Development Notes

- Originally attempted with Spring Boot and newer HSQLDB drivers
- Simplified to standalone Java CLI for reliability
- Hardcoded paths for single-purpose extraction
- JSON string escaping applied so that HTML content is emitted as valid JSON
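A small stdlib-only writer can illustrate the JSON escaping step. This is a sketch assuming each exported row is an ordered map of field name to value. `JsonWriter` and its methods are hypothetical, not ExportTool's code. RFC 8259 requires escaping only the quotation mark, the backslash, and control characters U+0000 to U+001F, so HTML markup such as `<p>` passes through unchanged.

```java
import java.util.List;
import java.util.Map;

// Hypothetical JSON writer sketch, not ExportTool's implementation.
final class JsonWriter {

    // Writes s as a JSON string literal: quote and backslash are escaped,
    // newline/CR/tab use short escapes, other control characters use a
    // 4-hex-digit unicode escape. HTML markup is left as-is.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2).append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Numbers (finite, e.g. integer IDs) and booleans are written bare,
    // null as null, anything else (strings, timestamps) via toString().
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    // Serializes ordered field maps as a JSON array of objects.
    static String array(List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) sb.append(',');
            sb.append('{');
            int f = 0;
            for (Map.Entry<String, Object> e : rows.get(r).entrySet()) {
                if (f++ > 0) sb.append(',');
                sb.append(quote(e.getKey())).append(':').append(value(e.getValue()));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }
}
```

The writer emits compact JSON with no whitespace. The Sample Output above is pretty-printed for readability, but both forms parse to the same data.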

## Related Projects

- **thingamablog-v2**: Web application that consumes the exported JSON
- **docs/thingamablog-extract**: Alternative extraction results (Markdown format)

## Future Improvements

- Parameterize database path and output file
- Add command-line arguments for flexibility
- Support for other HSQLDB table schemas
- Integration with modern database migration tools
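The first two improvements could take the form of optional flags whose defaults reproduce today's hardcoded values, so running with no arguments keeps the current behavior. The flag names `--db`, `--table` and `--out` and the class `ExportOptions` are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical option parsing; ExportTool currently takes no arguments.
final class ExportOptions {
    static final String DEFAULT_DB_DIR =
        "/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database";
    static final String DEFAULT_TABLE = "ENTRY_TABLE_1096292361887";

    final Path dbDir;
    final String table;
    final Path output; // null means write to stdout, as the tool does now

    private ExportOptions(Path dbDir, String table, Path output) {
        this.dbDir = dbDir;
        this.table = table;
        this.output = output;
    }

    // Each option takes exactly one value: --db DIR, --table NAME, --out FILE.
    static ExportOptions parse(String[] args) {
        Path dbDir = Paths.get(DEFAULT_DB_DIR);
        String table = DEFAULT_TABLE;
        Path output = null;
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("missing value for " + flag);
            }
            String value = args[i + 1];
            switch (flag) {
                case "--db":    dbDir = Paths.get(value); break;
                case "--table": table = value; break;
                case "--out":   output = Paths.get(value); break;
                default: throw new IllegalArgumentException("unknown option: " + flag);
            }
        }
        // The table name is spliced into SQL, so accept plain identifiers only.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("invalid table name: " + table);
        }
        return new ExportOptions(dbDir, table, output);
    }
}
```

The identifier check is there because JDBC cannot bind a table name as a `?` parameter. A user-supplied name therefore has to be validated before it is concatenated into the query.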