
TranscriptorIO Backend

This is the redesigned backend for TranscriptorIO, a complete fork of SubGen with a modern, asynchronous architecture.

🎯 Goal

Replace SubGen's synchronous, non-persistent system with a modern, Tdarr-inspired architecture:

  • Persistent queue (SQLite/PostgreSQL/MariaDB)
  • Asynchronous processing
  • Job prioritization
  • Complete state visibility
  • No Bazarr timeouts

📁 Structure

backend/
├── core/
│   ├── database.py       # Multi-backend database management
│   ├── models.py         # SQLAlchemy models (Job, etc.)
│   ├── queue_manager.py  # Asynchronous persistent queue
│   └── __init__.py
├── api/                  # (coming soon) FastAPI endpoints
├── config.py             # Centralized configuration with Pydantic
└── README.md             # This file

🚀 Setup

1. Install dependencies

pip install -r requirements.txt

2. Configure .env

Copy .env.example to .env and adjust as needed:

cp .env.example .env

Database Options

SQLite (default):

DATABASE_URL=sqlite:///./transcriptarr.db

PostgreSQL:

pip install psycopg2-binary
DATABASE_URL=postgresql://user:password@localhost:5432/transcriptarr

MariaDB/MySQL:

pip install pymysql
DATABASE_URL=mariadb+pymysql://user:password@localhost:3306/transcriptarr
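The driver-qualified URLs above encode the backend in their scheme. As an illustration of how the backend name could be derived from DATABASE_URL, here is a small stdlib-only sketch; `backend_from_url` is a hypothetical helper, not part of the shipped config.py:

```python
# Hypothetical helper: derive the backend name from a SQLAlchemy-style
# DATABASE_URL. Illustrative only; the real config.py may differ.
from urllib.parse import urlparse

def backend_from_url(database_url: str) -> str:
    """Return 'sqlite', 'postgresql', or 'mariadb' from the URL scheme."""
    scheme = urlparse(database_url).scheme  # e.g. 'mariadb+pymysql'
    return scheme.split("+", 1)[0]          # strip the driver suffix

print(backend_from_url("sqlite:///./transcriptarr.db"))      # → sqlite
print(backend_from_url("mariadb+pymysql://u:p@db:3306/app")) # → mariadb
```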

3. Choose operation mode

Standalone Mode (automatically scans your library):

TRANSCRIPTARR_MODE=standalone
LIBRARY_PATHS=/media/anime|/media/movies
AUTO_SCAN_ENABLED=True
SCAN_INTERVAL_MINUTES=30

Provider Mode (receives jobs from Bazarr):

TRANSCRIPTARR_MODE=provider
BAZARR_URL=http://bazarr:6767
BAZARR_API_KEY=your_api_key

Hybrid Mode (both simultaneously):

TRANSCRIPTARR_MODE=standalone,provider
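The hybrid value suggests TRANSCRIPTARR_MODE is a comma-separated list. A minimal sketch of how such a value could be parsed and validated (`parse_modes` and `VALID_MODES` are illustrative names, not the project's actual API):

```python
# Sketch of parsing a comma-separated TRANSCRIPTARR_MODE value.
# Names and behavior are assumptions; the real config.py may differ.
VALID_MODES = {"standalone", "provider"}

def parse_modes(raw: str) -> set:
    """Split, normalize, and validate the mode list."""
    modes = {m.strip().lower() for m in raw.split(",") if m.strip()}
    unknown = modes - VALID_MODES
    if unknown:
        raise ValueError(f"Unknown mode(s): {', '.join(sorted(unknown))}")
    return modes

print(sorted(parse_modes("standalone,provider")))  # → ['provider', 'standalone']
```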

🧪 Testing

Run the test script to verify everything works:

python test_backend.py

This will verify:

  • ✓ Configuration loading
  • ✓ Database connection
  • ✓ Table creation
  • ✓ Queue operations (add, get, deduplicate)

📊 Implemented Components

config.py

  • Centralized configuration with Pydantic
  • Automatic environment variable validation
  • Multi-backend database support
  • Operation mode configuration
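To illustrate the idea of environment-driven, validated settings without pulling in Pydantic, here is a stdlib stand-in. The field names mirror .env.example, but the class itself and its defaults are assumptions, not the shipped config.py:

```python
# Stdlib stand-in for the Pydantic-based config.py. Illustrative only:
# shows validated settings loaded from environment variables.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    database_url: str
    mode: str
    scan_interval_minutes: int

    @classmethod
    def from_env(cls) -> "Settings":
        interval = int(os.environ.get("SCAN_INTERVAL_MINUTES", "30"))
        if interval <= 0:
            raise ValueError("SCAN_INTERVAL_MINUTES must be positive")
        return cls(
            database_url=os.environ.get("DATABASE_URL", "sqlite:///./transcriptarr.db"),
            mode=os.environ.get("TRANSCRIPTARR_MODE", "standalone"),
            scan_interval_minutes=interval,
        )

settings = Settings.from_env()
print(settings.database_url)
```

Pydantic adds type coercion and clearer validation errors on top of this pattern, which is why the real implementation uses it.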

database.py

  • Connection management with SQLAlchemy
  • Support for SQLite, PostgreSQL, MariaDB
  • Backend-specific optimizations
    • SQLite: WAL mode, optimized cache
    • PostgreSQL: connection pooling, pre-ping
    • MariaDB: utf8mb4 charset, pooling
  • Health checks and statistics
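The SQLite tuning can be sketched with the stdlib sqlite3 module. The real database.py applies equivalent PRAGMAs through SQLAlchemy connection events, so treat this as an approximation:

```python
# Sketch of the SQLite-specific optimizations using stdlib sqlite3;
# the real database.py does this via SQLAlchemy engine events.
import sqlite3

def open_optimized(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # better read/write concurrency
    conn.execute("PRAGMA cache_size=-64000")   # ~64 MB page cache
    conn.execute("PRAGMA synchronous=NORMAL")  # safe with WAL, fewer fsyncs
    return conn

conn = open_optimized("transcriptarr.db")
print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # → wal
```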

models.py

  • Complete Job model with:
    • States: queued, processing, completed, failed, cancelled
    • Stages: pending, detecting_language, transcribing, translating, etc.
    • Quality presets: fast, balanced, best
    • Progress tracking (0-100%)
    • Complete timestamps
    • Retry logic
    • Worker assignment
  • Optimized indexes for common queries
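The states and tracking fields above can be illustrated with a stripped-down, stdlib-only sketch. The real class is a SQLAlchemy model with indexed columns; the field names here are assumptions:

```python
# Illustrative sketch of the Job model's shape; the real models.py
# defines this as a SQLAlchemy model with indexes on common queries.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class JobState(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

@dataclass
class Job:
    file_path: str
    priority: int = 0
    state: JobState = JobState.QUEUED
    progress: int = 0  # 0-100%
    retries: int = 0
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

job = Job("/media/anime/episode01.mkv", priority=5)
print(job.state.value)  # → queued
```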

queue_manager.py

  • Thread-safe persistent queue
  • Job prioritization
  • Duplicate detection
  • Automatic retry for failed jobs
  • Real-time statistics
  • Automatic cleanup of old jobs
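The queue's core behaviors (priority ordering, duplicate detection, thread safety) can be sketched in memory. The real queue_manager.py persists jobs to the database instead, so this is a simplification:

```python
# Minimal in-memory sketch of a thread-safe, deduplicating priority
# queue; the real queue_manager.py is backed by the database.
import heapq
import threading

class JobQueue:
    def __init__(self):
        self._lock = threading.Lock()
        self._heap = []     # (negated priority, insertion order, path)
        self._seen = set()  # dedup on file path
        self._counter = 0

    def add(self, path: str, priority: int = 0) -> bool:
        with self._lock:
            if path in self._seen:
                return False  # duplicate: already queued
            self._seen.add(path)
            heapq.heappush(self._heap, (-priority, self._counter, path))
            self._counter += 1
            return True

    def get(self):
        with self._lock:
            if not self._heap:
                return None
            _, _, path = heapq.heappop(self._heap)
            self._seen.discard(path)
            return path

q = JobQueue()
q.add("/media/a.mkv", priority=1)
q.add("/media/b.mkv", priority=5)
q.add("/media/a.mkv")  # duplicate, ignored
print(q.get())  # → /media/b.mkv (highest priority first)
```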

🔄 Comparison with SubGen

Feature          SubGen                        TranscriptorIO
Queue            In-memory (lost on restart)   Persistent in DB
Processing       Synchronous (blocks threads)  Asynchronous
Prioritization   No                            Yes (configurable)
Visibility       No progress/ETA               Progress + real-time ETA
Deduplication    Basic (memory only)           Persistent + intelligent
Retries          No                            Automatic with limit
Database         No                            SQLite/PostgreSQL/MariaDB
Bazarr Timeouts  Yes (>5min = 24h throttle)    No (async)

📝 Next Steps

  1. Worker Pool - Asynchronous worker system
  2. REST API - FastAPI endpoints for management
  3. WebSocket - Real-time updates
  4. Transcriber - Whisper wrapper with progress callbacks
  5. Bazarr Provider - Improved async provider
  6. Standalone Scanner - Automatic library scanning

🐛 Troubleshooting

Error: "No module named 'backend'"

Make sure to run scripts from the project root:

cd /home/dasemu/Hacking/Transcriptarr
python test_backend.py

Error: Database locked (SQLite)

SQLite is configured with WAL mode for better concurrency. If you still have issues, consider using PostgreSQL for production.

Error: pydantic.errors.ConfigError

Verify that all required variables are in your .env:

cp .env.example .env
# Edit .env with your values

📚 Documentation

See CLAUDE.md for complete architecture and project roadmap.