# Thingamablog API - Data Extraction Tool

## Overview

This is the data extraction component of the Thingamablog migration project. It contains a Java command-line tool that reads the legacy HSQLDB database files and exports their blog posts as JSON, so that the thingamablog-v2 web application can serve clean, structured blog data.

## Purpose

The Thingamablog platform (early 2000s) stored blog posts in an obsolete HSQLDB database format. This tool extracts that data into clean JSON that modern applications can consume.

## Architecture

- **Input:** HSQLDB database files (`database.script`, `database.data`)
- **Tool:** `ExportTool.java` - JDBC-based Java application
- **Driver:** HSQLDB 1.8.0.10 JAR (legacy compatible)
- **Output:** `blog-export.json` - Structured JSON array of blog posts

## Setup & Build

### Prerequisites

- Java 8 or higher
- Maven 3.x (for dependency management)

### Dependencies

- HSQLDB 1.8.0.10 JAR (downloaded automatically by Maven)
- Maven coordinates: `org.hsqldb:hsqldb:1.8.0.10`

### Build Process

```bash
# Download dependencies (copied to target/dependency/)
mvn dependency:copy-dependencies

# Compile the tool
javac -cp target/dependency/hsqldb-1.8.0.10.jar ExportTool.java

# ExportTool.class is written next to ExportTool.java, in the project root
```

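## Usage

### Command Line

```bash
java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportTool > ../thingamablog-v2/backend/blog-export.json
```

Run this from the directory containing `ExportTool.class`. The JSON is written to stdout, so the shell redirect decides where the file lands. On Windows, use `;` instead of `:` as the classpath separator.
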
### What It Does

1. Connects to the HSQLDB database at the hardcoded path `/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database/`
2. Queries the `ENTRY_TABLE_1096292361887` table
3. Maps database columns to JSON fields:
   - `ID` → `id`
   - `TITLE` → `title`
   - `TIMESTAMP` → `date`
   - `ENTRY` → `content`
   - `CATEGORIES` → `categories`
   - `AUTHOR` → `author`
4. Writes the entries to stdout as a JSON array
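The four steps above can be sketched with plain JDBC. This is a minimal illustration under stated assumptions, not the contents of `ExportTool.java`: the names `ExportSketch`, `jdbcUrl`, and `readEntries`, reading the database directory from `args[0]`, and the default `sa` user with an empty password are all assumptions. The driver class, table name, and column mapping come from this README, and the URL assumes the files are named `database.*` as listed under Architecture. JSON output is left out to keep the focus on the query and the column mapping.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the export steps; not the real ExportTool.
final class ExportSketch {
    static final String TABLE = "ENTRY_TABLE_1096292361887";

    // Database column -> JSON field, in the order listed in the README.
    static final Map<String, String> FIELD_MAP;
    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("ID", "id");
        m.put("TITLE", "title");
        m.put("TIMESTAMP", "date");
        m.put("ENTRY", "content");
        m.put("CATEGORIES", "categories");
        m.put("AUTHOR", "author");
        FIELD_MAP = Collections.unmodifiableMap(m);
    }

    // An HSQLDB file URL names the database by its path without extension,
    // so <dir>/database.script is opened as jdbc:hsqldb:file:<dir>/database
    static String jdbcUrl(Path databaseDir) {
        return "jdbc:hsqldb:file:" + databaseDir.resolve("database");
    }

    // Reads every row and renames its columns according to FIELD_MAP.
    static List<Map<String, String>> readEntries(Connection conn) throws SQLException {
        List<Map<String, String>> rows = new ArrayList<>();
        // SELECT * avoids naming the TIMESTAMP column, which is also an SQL keyword.
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM " + TABLE)) {
            while (rs.next()) {
                Map<String, String> row = new LinkedHashMap<>();
                for (Map.Entry<String, String> e : FIELD_MAP.entrySet()) {
                    row.put(e.getValue(), rs.getString(e.getKey()));
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Load the driver class explicitly; older drivers may not self-register.
        Class.forName("org.hsqldb.jdbcDriver");
        try (Connection conn = DriverManager.getConnection(jdbcUrl(dir), "sa", "")) {
            List<Map<String, String>> rows = readEntries(conn);
            System.err.println("read " + rows.size() + " entries from " + TABLE);
        }
    }
}
```

Compiled and run with the same classpath as the real tool (for example `java -cp .:target/dependency/hsqldb-1.8.0.10.jar ExportSketch /path/to/database`), it only reports how many rows it read.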

### Sample Output

```json
[
  {
    "id": 1,
    "title": "Digital Imaging Notes",
    "date": "2003-11-03 16:41:22.053",
    "author": "Paul",
    "categories": "Hobbies",
    "content": "<p>Full HTML content preserved...</p>"
  }
]
```
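Turning one row into an object shaped like the sample could look like the sketch below. The names `EntryJson`, `quote`, and `toJson` are hypothetical; the sketch keeps the sample's key order and applies only the escaping JSON requires. The sample's `date` string has the same shape as `java.sql.Timestamp.toString()` output, so the sketch assumes dates are rendered that way; the real tool may format them differently.

```java
import java.sql.Timestamp;

// Hypothetical sketch: one entry as a JSON object, keys in the sample's order.
final class EntryJson {
    // Minimal JSON string escaping: quote, backslash and control characters.
    static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(s.length() + 2).append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    static String toJson(int id, String title, Timestamp date, String author,
                         String categories, String content) {
        return "{"
            + "\"id\": " + id + ", "
            + "\"title\": " + quote(title) + ", "
            + "\"date\": " + quote(date == null ? null : date.toString()) + ", "
            + "\"author\": " + quote(author) + ", "
            + "\"categories\": " + quote(categories) + ", "
            + "\"content\": " + quote(content)
            + "}";
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2003-11-03 16:41:22.053");
        System.out.println("[" + toJson(1, "Digital Imaging Notes", ts, "Paul", "Hobbies",
                "<p>Full HTML content preserved...</p>") + "]");
    }
}
```

Running `main` prints the sample entry as a one-element array on a single line; the indented sample above differs only in whitespace.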

## Data Quality

Results from the current export:

- ✅ Titles and dates extracted intact
- ✅ Full HTML content preserved
- ✅ Categories extracted correctly
- ✅ Sequential IDs assigned
- ✅ Output passes JSON validation
- ✅ 467 entries extracted (about 1.3 MB of JSON)

## Integration

The exported JSON feeds directly into the thingamablog-v2 web application:

1. Place `blog-export.json` in `../thingamablog-v2/backend/` (the Command Line example writes it there directly).
2. The Node.js backend loads this JSON in preference to its fallback HSQLDB parser.
3. The web app serves the posts through its REST API.

## Troubleshooting

### Common Issues

**JDBC Driver Not Found**

```
Error: org.hsqldb.jdbcDriver
```

- Ensure Maven has downloaded the dependency: `mvn dependency:copy-dependencies`
- Check that the classpath includes `target/dependency/hsqldb-1.8.0.10.jar`
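To tell a classpath problem apart from a database problem, the driver class can be loaded on its own before any connection is attempted. A minimal sketch; `DriverCheck` and `driverOnClasspath` are hypothetical names, and `org.hsqldb.jdbcDriver` is the class name from the error above.

```java
// Hypothetical helper: checks that the HSQLDB driver is on the classpath.
final class DriverCheck {
    // Returns true if the named class can be loaded from the classpath.
    static boolean driverOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "org.hsqldb.jdbcDriver";
        if (!driverOnClasspath(driver)) {
            System.err.println("Driver " + driver + " not found; is "
                    + "target/dependency/hsqldb-1.8.0.10.jar on the classpath?");
            System.exit(1);
        }
        System.out.println("Driver OK: " + driver);
    }
}
```

Run it with the same `-cp` value as `ExportTool`; if it fails, the problem is the classpath, not the database.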
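
**Database Path Issues**

```
SQL Exception: file not found
```

- Verify that the HSQLDB files exist at the hardcoded path
- Ensure the user running the tool can read the database files; HSQLDB normally writes files (such as a lock file) next to the database when opening it, so write access to the directory may also be needed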
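A pre-flight check can confirm the input files before JDBC is involved. This sketch checks the two files listed under Architecture; `PathCheck` and `problems` are hypothetical names, and taking the directory from `args[0]` is an assumption (the real tool hardcodes it).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the HSQLDB input files.
final class PathCheck {
    // The input files listed under Architecture.
    static final String[] REQUIRED = {"database.script", "database.data"};

    // Returns one message per problem; an empty list means the files look usable.
    static List<String> problems(Path databaseDir) {
        List<String> out = new ArrayList<>();
        for (String name : REQUIRED) {
            Path f = databaseDir.resolve(name);
            if (!Files.isRegularFile(f)) {
                out.add("missing: " + f);
            } else if (!Files.isReadable(f)) {
                out.add("not readable: " + f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> found = problems(dir);
        found.forEach(System.err::println);
        System.exit(found.isEmpty() ? 0 : 1);
    }
}
```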

**Empty Output**

- Check database file integrity
- Verify that the table `ENTRY_TABLE_1096292361887` exists
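The table can be looked for without opening a connection by scanning `database.script`. This sketch assumes the default plain-text script format, in which HSQLDB records table definitions as `CREATE ... TABLE` statements; `TableCheck` and `scriptDefinesTable` are hypothetical names. It is also worth knowing that HSQLDB typically creates a new, empty database when a file URL points at a path with no existing files, so a wrong path can surface as a missing table rather than as a file error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check: does the text .script file define the given table?
final class TableCheck {
    static boolean scriptDefinesTable(Path script, String table) throws IOException {
        // Matches e.g. "CREATE CACHED TABLE <name>(" but not a longer name sharing the prefix.
        Pattern createTable = Pattern.compile(
                "^CREATE\\b.*\\bTABLE\\s+" + Pattern.quote(table) + "(?![A-Za-z0-9_])");
        // ISO-8859-1 maps every byte to a char, so decoding never fails on odd bytes.
        List<String> lines = Files.readAllLines(script, StandardCharsets.ISO_8859_1);
        for (String line : lines) {
            if (createTable.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path script = Paths.get(args.length > 0 ? args[0] : "database.script");
        String table = "ENTRY_TABLE_1096292361887";
        boolean found = scriptDefinesTable(script, table);
        System.out.println(table + (found ? " is defined in " : " is NOT defined in ") + script);
    }
}
```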

### Legacy Considerations

- HSQLDB 1.8.x is a mid-2000s release line, and its file format is very old
- Modern HSQLDB versions may not read these files
- The bridge approach (a one-time export to JSON) keeps the legacy driver isolated from the web application

## File Structure

```
thingamablog-api/
├── ExportTool.java      # Main extraction tool
├── pom.xml              # Maven configuration
├── src/main/java/...    # Additional Spring Boot components (unused)
├── target/dependency/   # Maven dependencies
└── .gitignore           # Excludes build artifacts
```

## Development Notes

- First attempted with Spring Boot and newer HSQLDB drivers
- Simplified to a standalone Java CLI for reliability
- Paths are hardcoded because the tool serves a single, one-off extraction
- String values are JSON-escaped so the HTML post content is carried safely inside JSON strings
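One possible reading of the escaping note is sketched here; whether `ExportTool.java` does exactly this is not stated in this README. Besides the characters JSON itself requires (quote, backslash, control characters), the hypothetical `HtmlSafeJson.quote` also escapes `<`, `>`, `&`, `'`, and U+2028/U+2029 as unicode escapes, a common precaution when JSON may later be embedded in an HTML page or a `<script>` tag. Treat those extra escapes as optional.

```java
// Hypothetical HTML-safe JSON string escaping.
final class HtmlSafeJson {
    static String quote(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16).append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                // Escapes JSON requires.
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                // Optional extras: keep markup characters out of the raw output.
                case '<': case '>': case '&': case '\'':
                    sb.append(String.format("\\u%04x", (int) c));
                    break;
                default:
                    // Other control characters, plus U+2028/U+2029 (line and
                    // paragraph separators, which older JavaScript rejects in strings).
                    if (c < 0x20 || c == 0x2028 || c == 0x2029) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("<p class=\"intro\">Fish & chips</p>"));
    }
}
```

The escaped output is still valid JSON: any JSON parser turns the unicode escapes back into the original characters, so the HTML content round-trips unchanged.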

## Related Projects

- **thingamablog-v2**: Web application that consumes the exported JSON
- **docs/thingamablog-extract**: Alternative extraction results (Markdown format)

## Future Improvements

- Parameterize the database path and output file
- Add command-line arguments for flexibility
- Support other HSQLDB table schemas
- Integrate with modern database migration tools
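The first two items could start as a small argument parser. This is a sketch only: the `--db` and `--out` flags, the `ExportArgs` class, and the convention that a missing `--out` means stdout are hypothetical, while the default directory is the path currently hardcoded in the tool.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical command-line options for a parameterized export.
final class ExportArgs {
    static final Path DEFAULT_DB =
            Paths.get("/home/paulh/.openclaw/workspace/docs/pauls-blogs/Paul/database");

    final Path databaseDir;
    final Path outputFile; // null means "write to stdout", as the tool does today

    ExportArgs(Path databaseDir, Path outputFile) {
        this.databaseDir = databaseDir;
        this.outputFile = outputFile;
    }

    // Accepts "--db <dir>" and "--out <file>"; both are optional.
    static ExportArgs parse(String[] args, Path defaultDb) {
        Path db = defaultDb;
        Path out = null;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "--db":
                    db = Paths.get(valueAfter(args, ++i, "--db"));
                    break;
                case "--out":
                    out = Paths.get(valueAfter(args, ++i, "--out"));
                    break;
                default:
                    throw new IllegalArgumentException("unknown argument: " + args[i]);
            }
        }
        return new ExportArgs(db, out);
    }

    private static String valueAfter(String[] args, int i, String flag) {
        if (i >= args.length) {
            throw new IllegalArgumentException(flag + " needs a value");
        }
        return args[i];
    }

    public static void main(String[] args) {
        ExportArgs parsed = parse(args, DEFAULT_DB);
        System.out.println("database: " + parsed.databaseDir);
        System.out.println("output: " + (parsed.outputFile == null ? "stdout" : parsed.outputFile));
    }
}
```

Keeping stdout as the default when `--out` is absent preserves the current redirect-based workflow.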