# TranscriptorIO Backend

This is the redesigned backend for TranscriptorIO, a complete fork of SubGen with a modern asynchronous architecture.
## 🎯 Goal

Replace SubGen's synchronous, non-persistent system with a modern Tdarr-inspired architecture:
- ✅ Persistent queue (SQLite/PostgreSQL/MariaDB)
- ✅ Asynchronous processing
- ✅ Job prioritization
- ✅ Complete state visibility
- ✅ No Bazarr timeouts
## 📁 Structure

```
backend/
├── core/
│   ├── database.py       # Multi-backend database management
│   ├── models.py         # SQLAlchemy models (Job, etc.)
│   ├── queue_manager.py  # Asynchronous persistent queue
│   └── __init__.py
├── api/                  # (coming soon) FastAPI endpoints
├── config.py             # Centralized configuration with Pydantic
└── README.md             # This file
```
## 🚀 Setup

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure .env

Copy .env.example to .env and adjust as needed:

```bash
cp .env.example .env
```
#### Database Options

**SQLite** (default):

```env
DATABASE_URL=sqlite:///./transcriptarr.db
```

**PostgreSQL**:

```bash
pip install psycopg2-binary
```

```env
DATABASE_URL=postgresql://user:password@localhost:5432/transcriptarr
```

**MariaDB/MySQL**:

```bash
pip install pymysql
```

```env
DATABASE_URL=mariadb+pymysql://user:password@localhost:3306/transcriptarr
```
### 3. Choose operation mode

**Standalone Mode** (automatically scans your library):

```env
TRANSCRIPTARR_MODE=standalone
LIBRARY_PATHS=/media/anime|/media/movies
AUTO_SCAN_ENABLED=True
SCAN_INTERVAL_MINUTES=30
```

**Provider Mode** (receives jobs from Bazarr):

```env
TRANSCRIPTARR_MODE=provider
BAZARR_URL=http://bazarr:6767
BAZARR_API_KEY=your_api_key
```

**Hybrid Mode** (both simultaneously):

```env
TRANSCRIPTARR_MODE=standalone,provider
```
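The mode value is parsed as a comma-separated list, which is all hybrid mode is. A minimal sketch of how such a value can be validated (the helper name is hypothetical; the real parsing lives in config.py):

```python
import os

VALID_MODES = {"standalone", "provider"}

def active_modes() -> set[str]:
    """Parse TRANSCRIPTARR_MODE into a validated set of mode names."""
    raw = os.getenv("TRANSCRIPTARR_MODE", "standalone")
    modes = {part.strip() for part in raw.split(",") if part.strip()}
    unknown = modes - VALID_MODES
    if unknown:
        raise ValueError(f"Unknown mode(s): {', '.join(sorted(unknown))}")
    return modes
```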
## 🧪 Testing

Run the test script to verify everything works:

```bash
python test_backend.py
```
This will verify:
- ✓ Configuration loading
- ✓ Database connection
- ✓ Table creation
- ✓ Queue operations (add, get, deduplicate)
## 📊 Implemented Components

### config.py
- Centralized configuration with Pydantic
- Automatic environment variable validation
- Multi-backend database support
- Operation mode configuration
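As a sketch only (field names mirror .env.example; the actual class lives in backend/config.py), a Pydantic settings class along these lines gives typed, validated access to the environment. With Pydantic v2, BaseSettings comes from the separate pydantic-settings package:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # Matched case-insensitively against env vars, e.g. DATABASE_URL
    database_url: str = "sqlite:///./transcriptarr.db"
    transcriptarr_mode: str = "standalone"
    auto_scan_enabled: bool = False
    scan_interval_minutes: int = 30

settings = Settings()  # loads .env plus the process environment, then validates types
```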
### database.py

- Connection management with SQLAlchemy
- Support for SQLite, PostgreSQL, and MariaDB
- Backend-specific optimizations:
  - SQLite: WAL mode, optimized cache
  - PostgreSQL: connection pooling, pre-ping
  - MariaDB: utf8mb4 charset, pooling
- Health checks and statistics
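As a minimal sketch of the multi-backend idea (the real logic lives in core/database.py), the same SQLAlchemy create_engine call accepts all three URL formats, with per-backend tuning layered on top:

```python
from sqlalchemy import create_engine, event

def build_engine(database_url: str):
    if database_url.startswith("sqlite"):
        engine = create_engine(database_url, connect_args={"check_same_thread": False})

        @event.listens_for(engine, "connect")
        def set_sqlite_pragmas(dbapi_conn, _connection_record):
            cursor = dbapi_conn.cursor()
            cursor.execute("PRAGMA journal_mode=WAL")   # readers don't block the writer
            cursor.execute("PRAGMA cache_size=-64000")  # ~64 MiB page cache
            cursor.close()

        return engine

    # PostgreSQL / MariaDB: keep a pool and ping before reuse to drop stale connections
    return create_engine(database_url, pool_size=5, pool_pre_ping=True)
```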
### models.py

- Complete `Job` model (sketched below) with:
  - States: queued, processing, completed, failed, cancelled
  - Stages: pending, detecting_language, transcribing, translating, etc.
  - Quality presets: fast, balanced, best
  - Progress tracking (0-100%)
  - Complete timestamps
  - Retry logic
  - Worker assignment
- Optimized indexes for common queries
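An abbreviated sketch of what such a model can look like (the real core/models.py has more columns, the full stage enum, and additional indexes):

```python
import enum
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Enum, Float, Index, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class JobState(str, enum.Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

class Job(Base):
    __tablename__ = "jobs"

    id = Column(Integer, primary_key=True)
    file_path = Column(String, nullable=False, unique=True)  # deduplication key
    state = Column(Enum(JobState), default=JobState.QUEUED, nullable=False)
    priority = Column(Integer, default=0, nullable=False)
    progress = Column(Float, default=0.0)  # 0-100%
    retries = Column(Integer, default=0)
    worker_id = Column(String, nullable=True)
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))

    # Composite index so "next queued job by priority" stays fast
    __table_args__ = (Index("ix_jobs_state_priority", "state", "priority"),)
```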
### queue_manager.py
- Thread-safe persistent queue
- Job prioritization
- Duplicate detection
- Automatic retry for failed jobs
- Real-time statistics
- Automatic cleanup of old jobs
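To make prioritization and claiming concrete, here is a hedged sketch of a "claim next job" query, reusing the Job sketch above; the actual core/queue_manager.py differs in detail:

```python
from sqlalchemy.orm import Session

def claim_next_job(session: Session, worker_id: str):
    """Atomically pick the highest-priority queued job and mark it processing."""
    job = (
        session.query(Job)
        .filter(Job.state == JobState.QUEUED)
        .order_by(Job.priority.desc(), Job.created_at)
        # Row lock so concurrent workers never grab the same job
        # (PostgreSQL/MariaDB; the SQLite dialect ignores FOR UPDATE)
        .with_for_update(skip_locked=True)
        .first()
    )
    if job is not None:
        job.state = JobState.PROCESSING
        job.worker_id = worker_id
        session.commit()
    return job
```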
## 🔄 Comparison with SubGen
| Feature | SubGen | TranscriptorIO |
|---|---|---|
| Queue | In-memory (lost on restart) | Persistent in DB |
| Processing | Synchronous (blocks threads) | Asynchronous |
| Prioritization | No | Yes (configurable) |
| Visibility | No progress/ETA | Progress + real-time ETA |
| Deduplication | Basic (memory only) | Persistent + intelligent |
| Retries | No | Automatic with limit |
| Database | No | SQLite/PostgreSQL/MariaDB |
| Bazarr Timeouts | Yes (>5min = 24h throttle) | No (async) |
## 📝 Next Steps
- Worker Pool - Asynchronous worker system
- REST API - FastAPI endpoints for management
- WebSocket - Real-time updates
- Transcriber - Whisper wrapper with progress callbacks
- Bazarr Provider - Improved async provider
- Standalone Scanner - Automatic library scanning
## 🐛 Troubleshooting

### Error: "No module named 'backend'"

Make sure to run scripts from the project root:

```bash
cd /home/dasemu/Hacking/Transcriptarr
python test_backend.py
```
### Error: Database locked (SQLite)
SQLite is configured with WAL mode for better concurrency. If you still have issues, consider using PostgreSQL for production.
### Error: pydantic.errors.ConfigError

Verify that all required variables are in your .env:

```bash
cp .env.example .env
# Edit .env with your values
```
## 📚 Documentation

See CLAUDE.md for the complete architecture and project roadmap.