
TranscriptorIO Backend

This is the redesigned backend for TranscriptorIO, a complete fork of SubGen with a modern, asynchronous architecture.

🎯 Goal

Replace SubGen's synchronous, non-persistent system with a modern, Tdarr-inspired architecture:

  • Persistent queue (SQLite/PostgreSQL/MariaDB)
  • Asynchronous processing
  • Job prioritization
  • Complete state visibility
  • No Bazarr timeouts

📁 Structure

backend/
├── core/
│   ├── database.py       # Multi-backend database management
│   ├── models.py         # SQLAlchemy models (Job, etc.)
│   ├── queue_manager.py  # Asynchronous persistent queue
│   └── __init__.py
├── api/                  # (coming soon) FastAPI endpoints
├── config.py             # Centralized configuration with Pydantic
└── README.md             # This file

🚀 Setup

1. Install dependencies

pip install -r requirements.txt

2. Configure .env

Copy .env.example to .env and adjust as needed:

cp .env.example .env

Database Options

SQLite (default):

DATABASE_URL=sqlite:///./transcriptarr.db

PostgreSQL:

pip install psycopg2-binary
DATABASE_URL=postgresql://user:password@localhost:5432/transcriptarr

MariaDB/MySQL:

pip install pymysql
DATABASE_URL=mariadb+pymysql://user:password@localhost:3306/transcriptarr
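The driver-qualified URLs above encode the backend in their scheme. As an illustration of how the backend name could be derived from DATABASE_URL, here is a small stdlib-only sketch; `backend_from_url` is a hypothetical helper, not part of the shipped config.py:

```python
# Hypothetical helper: derive the backend name from a SQLAlchemy-style
# DATABASE_URL. Illustrative only; the real config.py may differ.
from urllib.parse import urlparse

def backend_from_url(database_url: str) -> str:
    """Return 'sqlite', 'postgresql', or 'mariadb' from the URL scheme."""
    scheme = urlparse(database_url).scheme  # e.g. 'mariadb+pymysql'
    return scheme.split("+", 1)[0]          # strip the driver suffix

print(backend_from_url("sqlite:///./transcriptarr.db"))      # → sqlite
print(backend_from_url("mariadb+pymysql://u:p@db:3306/app")) # → mariadb
```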

3. Choose operation mode

Standalone Mode (automatically scans your library):

TRANSCRIPTARR_MODE=standalone
LIBRARY_PATHS=/media/anime|/media/movies
AUTO_SCAN_ENABLED=True
SCAN_INTERVAL_MINUTES=30

Provider Mode (receives jobs from Bazarr):

TRANSCRIPTARR_MODE=provider
BAZARR_URL=http://bazarr:6767
BAZARR_API_KEY=your_api_key

Hybrid Mode (both simultaneously):

TRANSCRIPTARR_MODE=standalone,provider
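The hybrid value suggests TRANSCRIPTARR_MODE is a comma-separated list. A minimal sketch of how such a value could be parsed and validated (`parse_modes` and `VALID_MODES` are illustrative names, not the project's actual API):

```python
# Sketch of parsing a comma-separated TRANSCRIPTARR_MODE value.
# Names and behavior are assumptions; the real config.py may differ.
VALID_MODES = {"standalone", "provider"}

def parse_modes(raw: str) -> set:
    """Split, normalize, and validate the mode list."""
    modes = {m.strip().lower() for m in raw.split(",") if m.strip()}
    unknown = modes - VALID_MODES
    if unknown:
        raise ValueError(f"Unknown mode(s): {', '.join(sorted(unknown))}")
    return modes

print(sorted(parse_modes("standalone,provider")))  # → ['provider', 'standalone']
```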

🧪 Testing

Run the test script to verify everything works:

python test_backend.py

This will verify:

  • ✓ Configuration loading
  • ✓ Database connection
  • ✓ Table creation
  • ✓ Queue operations (add, get, deduplicate)

📊 Implemented Components

config.py

  • Centralized configuration with Pydantic
  • Automatic environment variable validation
  • Multi-backend database support
  • Operation mode configuration
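To illustrate the idea of environment-driven, validated settings without pulling in Pydantic, here is a stdlib stand-in. The field names mirror .env.example, but the class itself and its defaults are assumptions, not the shipped config.py:

```python
# Stdlib stand-in for the Pydantic-based config.py. Illustrative only:
# shows validated settings loaded from environment variables.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    database_url: str
    mode: str
    scan_interval_minutes: int

    @classmethod
    def from_env(cls) -> "Settings":
        interval = int(os.environ.get("SCAN_INTERVAL_MINUTES", "30"))
        if interval <= 0:
            raise ValueError("SCAN_INTERVAL_MINUTES must be positive")
        return cls(
            database_url=os.environ.get("DATABASE_URL", "sqlite:///./transcriptarr.db"),
            mode=os.environ.get("TRANSCRIPTARR_MODE", "standalone"),
            scan_interval_minutes=interval,
        )

settings = Settings.from_env()
print(settings.database_url)
```

Pydantic adds type coercion and clearer validation errors on top of this pattern, which is why the real implementation uses it.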

database.py

  • Connection management with SQLAlchemy
  • Support for SQLite, PostgreSQL, MariaDB
  • Backend-specific optimizations
    • SQLite: WAL mode, optimized cache
    • PostgreSQL: connection pooling, pre-ping
    • MariaDB: utf8mb4 charset, pooling
  • Health checks and statistics
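The SQLite tuning can be sketched with the stdlib sqlite3 module. The real database.py applies equivalent PRAGMAs through SQLAlchemy connection events, so treat this as an approximation:

```python
# Sketch of the SQLite-specific optimizations using stdlib sqlite3;
# the real database.py does this via SQLAlchemy engine events.
import sqlite3

def open_optimized(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # better read/write concurrency
    conn.execute("PRAGMA cache_size=-64000")   # ~64 MB page cache
    conn.execute("PRAGMA synchronous=NORMAL")  # safe with WAL, fewer fsyncs
    return conn

conn = open_optimized("transcriptarr.db")
print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # → wal
```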

models.py

  • Complete Job model with:
    • States: queued, processing, completed, failed, cancelled
    • Stages: pending, detecting_language, transcribing, translating, etc.
    • Quality presets: fast, balanced, best
    • Progress tracking (0-100%)
    • Complete timestamps
    • Retry logic
    • Worker assignment
  • Optimized indexes for common queries
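The states and tracking fields above can be illustrated with a stripped-down, stdlib-only sketch. The real class is a SQLAlchemy model with indexed columns; the field names here are assumptions:

```python
# Illustrative sketch of the Job model's shape; the real models.py
# defines this as a SQLAlchemy model with indexes on common queries.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class JobState(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

@dataclass
class Job:
    file_path: str
    priority: int = 0
    state: JobState = JobState.QUEUED
    progress: int = 0  # 0-100%
    retries: int = 0
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

job = Job("/media/anime/episode01.mkv", priority=5)
print(job.state.value)  # → queued
```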

queue_manager.py

  • Thread-safe persistent queue
  • Job prioritization
  • Duplicate detection
  • Automatic retry for failed jobs
  • Real-time statistics
  • Automatic cleanup of old jobs
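The queue's core behaviors (priority ordering, duplicate detection, thread safety) can be sketched in memory. The real queue_manager.py persists jobs to the database instead, so this is a simplification:

```python
# Minimal in-memory sketch of a thread-safe, deduplicating priority
# queue; the real queue_manager.py is backed by the database.
import heapq
import threading

class JobQueue:
    def __init__(self):
        self._lock = threading.Lock()
        self._heap = []     # (negated priority, insertion order, path)
        self._seen = set()  # dedup on file path
        self._counter = 0

    def add(self, path: str, priority: int = 0) -> bool:
        with self._lock:
            if path in self._seen:
                return False  # duplicate: already queued
            self._seen.add(path)
            heapq.heappush(self._heap, (-priority, self._counter, path))
            self._counter += 1
            return True

    def get(self):
        with self._lock:
            if not self._heap:
                return None
            _, _, path = heapq.heappop(self._heap)
            self._seen.discard(path)
            return path

q = JobQueue()
q.add("/media/a.mkv", priority=1)
q.add("/media/b.mkv", priority=5)
q.add("/media/a.mkv")  # duplicate, ignored
print(q.get())  # → /media/b.mkv (highest priority first)
```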

🔄 Comparison with SubGen

Feature          SubGen                        TranscriptorIO
Queue            In-memory (lost on restart)   Persistent in DB
Processing       Synchronous (blocks threads)  Asynchronous
Prioritization   No                            Yes (configurable)
Visibility       No progress/ETA               Progress + real-time ETA
Deduplication    Basic (memory only)           Persistent + intelligent
Retries          No                            Automatic with limit
Database         No                            SQLite/PostgreSQL/MariaDB
Bazarr Timeouts  Yes (>5min = 24h throttle)    No (async)

📝 Next Steps

  1. Worker Pool - Asynchronous worker system
  2. REST API - FastAPI endpoints for management
  3. WebSocket - Real-time updates
  4. Transcriber - Whisper wrapper with progress callbacks
  5. Bazarr Provider - Improved async provider
  6. Standalone Scanner - Automatic library scanning

🐛 Troubleshooting

Error: "No module named 'backend'"

Make sure to run scripts from the project root:

cd /home/dasemu/Hacking/Transcriptarr
python test_backend.py

Error: Database locked (SQLite)

SQLite is configured with WAL mode for better concurrency. If you still have issues, consider using PostgreSQL for production.

Error: pydantic.errors.ConfigError

Verify that all required variables are in your .env:

cp .env.example .env
# Edit .env with your values

📚 Documentation

See CLAUDE.md for complete architecture and project roadmap.