Transcriptarr Backend Architecture
Technical documentation of the backend architecture, components, and data flow.
Table of Contents
- Overview
- Directory Structure
- Core Components
- Data Flow
- Database Schema
- Transcription vs Translation
- Worker Architecture
- Queue System
- Scanner System
- Settings System
- Graceful Degradation
- Thread Safety
- Important Patterns
Overview
Transcriptarr is built with a modular architecture consisting of:
- FastAPI Server: REST API with 45+ endpoints
- Worker Pool: Multiprocessing-based transcription workers (CPU/GPU)
- Queue Manager: Persistent job queue with priority support
- Library Scanner: Rule-based file scanning with scheduler and watcher
- Settings Service: Database-backed configuration system
┌─────────────────────────────────────────────────────────┐
│ FastAPI Server │
│ ┌─────────────────────────────────────────────────┐ │
│ │ REST API (45+ endpoints) │ │
│ │ /api/workers | /api/jobs | /api/settings │ │
│ │ /api/scanner | /api/system | /api/setup │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────────────────┘
│
┌──────────────┼──────────────┬──────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐
│ Worker │ │ Queue │ │ Scanner │ │ Database │
│ Pool │◄──┤ Manager │◄──┤ Engine │ │ SQLite/ │
│ CPU/GPU│ │ Priority │ │ Rules + │ │ Postgres │
└────────┘ │ Queue │ │ Watcher │ └──────────┘
└──────────┘ └─────────┘
Directory Structure
backend/
├── app.py # FastAPI application + lifespan
├── cli.py # CLI commands (server, db, worker, scan, setup)
├── config.py # Pydantic Settings (from .env)
├── setup_wizard.py # Interactive first-run setup
│
├── core/
│ ├── database.py # SQLAlchemy setup + session management
│ ├── models.py # Job model + enums
│ ├── language_code.py # ISO 639 language code utilities
│ ├── settings_model.py # SystemSettings model (database-backed)
│ ├── settings_service.py # Settings service with caching
│ ├── system_monitor.py # CPU/RAM/GPU/VRAM monitoring
│ ├── queue_manager.py # Persistent queue with priority
│ ├── worker.py # Individual worker (Process)
│ └── worker_pool.py # Worker pool orchestrator
│
├── transcription/
│ ├── __init__.py # Exports + WHISPER_AVAILABLE flag
│ ├── transcriber.py # WhisperTranscriber wrapper
│ ├── translator.py # Google Translate integration
│ └── audio_utils.py # ffmpeg/ffprobe utilities
│
├── scanning/
│ ├── __init__.py # Exports (NO library_scanner import!)
│ ├── models.py # ScanRule model
│ ├── file_analyzer.py # ffprobe file analysis
│ ├── language_detector.py # Audio language detection
│ ├── detected_languages.py # Language mappings
│ └── library_scanner.py # Scanner + scheduler + watcher
│
└── api/
├── __init__.py # Router exports
├── workers.py # Worker management endpoints
├── jobs.py # Job queue endpoints
├── scan_rules.py # Scan rules CRUD
├── scanner.py # Scanner control endpoints
├── settings.py # Settings CRUD endpoints
├── system.py # System resources endpoints
├── filesystem.py # Filesystem browser endpoints
└── setup_wizard.py # Setup wizard endpoints
Core Components
1. WorkerPool (core/worker_pool.py)
Orchestrates CPU/GPU workers as separate processes.
Key Features:
- Dynamic add/remove workers at runtime
- Health monitoring with auto-restart
- Thread-safe multiprocessing
- Each worker is an isolated Process
from backend.core.worker_pool import worker_pool
from backend.core.worker import WorkerType
# Add GPU worker on device 0
worker_id = worker_pool.add_worker(WorkerType.GPU, device_id=0)
# Add CPU worker
worker_id = worker_pool.add_worker(WorkerType.CPU)
# Get pool stats
stats = worker_pool.get_pool_stats()
2. QueueManager (core/queue_manager.py)
Persistent SQLite/PostgreSQL queue with priority support.
Key Features:
- Job deduplication (no duplicate file_path)
- Row-level locking with skip_locked=True
- Priority-based ordering (higher first)
- FIFO within the same priority (by created_at)
- Auto-retry for failed jobs
from backend.core.queue_manager import queue_manager
from backend.core.models import QualityPreset
job = queue_manager.add_job(
file_path="/media/anime.mkv",
file_name="anime.mkv",
source_lang="jpn",
target_lang="spa",
quality_preset=QualityPreset.FAST,
priority=5
)
3. LibraryScanner (scanning/library_scanner.py)
Rule-based file scanning system.
Three Scan Modes:
- Manual: One-time scan via API or CLI
- Scheduled: Periodic scanning with APScheduler
- Real-time: File watcher with watchdog library
from backend.scanning.library_scanner import library_scanner
# Manual scan
result = library_scanner.scan_paths(["/media/anime"], recursive=True)
# Start scheduler (every 6 hours)
library_scanner.start_scheduler(interval_minutes=360)
# Start file watcher
library_scanner.start_file_watcher(paths=["/media/anime"], recursive=True)
4. WhisperTranscriber (transcription/transcriber.py)
Wrapper for stable-whisper and faster-whisper.
Key Features:
- GPU/CPU support with auto-device detection
- VRAM management and cleanup
- Graceful degradation (works without Whisper installed)
from backend.transcription.transcriber import WhisperTranscriber
transcriber = WhisperTranscriber(
model_name="large-v3",
device="cuda",
compute_type="float16"
)
result = transcriber.transcribe_file(
file_path="/media/episode.mkv",
language="jpn",
task="translate" # translate to English
)
result.to_srt("episode.eng.srt")
5. SettingsService (core/settings_service.py)
Database-backed configuration with caching.
from backend.core.settings_service import settings_service
# Get setting
value = settings_service.get("worker_cpu_count", default=1)
# Set setting
settings_service.set("worker_cpu_count", "2")
# Bulk update
settings_service.bulk_update({
"worker_cpu_count": "2",
"scanner_enabled": "true"
})
Data Flow
1. LibraryScanner detects file (manual/scheduled/watcher)
↓
2. FileAnalyzer analyzes with ffprobe
- Audio tracks (codec, language, channels)
- Embedded subtitles
- External .srt files
- Duration, video info
↓
3. Rules Engine evaluates against ScanRules (priority order)
- Checks all conditions (audio language, missing subs, etc.)
- First matching rule wins
↓
4. If match → QueueManager.add_job()
- Deduplication check (no duplicate file_path)
- Assigns priority based on rule
↓
5. Worker pulls job from queue
- Uses with_for_update(skip_locked=True)
- FIFO within same priority
↓
6. WhisperTranscriber processes with model
- Stage 1: Audio → English (Whisper translate)
- Stage 2: English → Target (Google Translate, if needed)
↓
7. Generate output SRT file(s)
- .eng.srt (always)
- .{target}.srt (if translate mode)
↓
8. Job marked completed ✓
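The worker side of this flow (steps 5-8) reduces to a small loop. A minimal sketch, assuming the queue_manager and WhisperTranscriber interfaces shown above; get_next_job, mark_completed, and mark_failed are hypothetical names for the queue calls the worker makes:
from backend.core.queue_manager import queue_manager
from backend.transcription.transcriber import WhisperTranscriber

def process_next_job(transcriber: WhisperTranscriber) -> bool:
    """Pull one queued job, transcribe it, and record the outcome."""
    job = queue_manager.get_next_job()  # hypothetical: locked fetch (skip_locked)
    if job is None:
        return False                    # queue is empty
    try:
        # Stage 1: audio -> English subtitles
        result = transcriber.transcribe_file(
            file_path=job.file_path,
            language=job.source_lang,
            task="translate",
        )
        output_path = job.file_path.rsplit(".", 1)[0] + ".eng.srt"
        result.to_srt(output_path)
        queue_manager.mark_completed(job.id, output_path)   # hypothetical
    except Exception as exc:
        queue_manager.mark_failed(job.id, error=str(exc))   # hypothetical; feeds auto-retry
    return True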
Database Schema
Job Table (jobs)
id VARCHAR PRIMARY KEY
file_path VARCHAR UNIQUE -- Ensures no duplicates
file_name VARCHAR
status VARCHAR -- queued/processing/completed/failed/cancelled
priority INTEGER
source_lang VARCHAR
target_lang VARCHAR
quality_preset VARCHAR -- fast/balanced/best
transcribe_or_translate VARCHAR -- transcribe/translate
progress FLOAT
current_stage VARCHAR
eta_seconds INTEGER
created_at DATETIME
started_at DATETIME
completed_at DATETIME
output_path VARCHAR
srt_content TEXT
segments_count INTEGER
error TEXT
retry_count INTEGER
max_retries INTEGER
worker_id VARCHAR
vram_used_mb INTEGER
processing_time_seconds FLOAT
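For orientation, a minimal SQLAlchemy sketch covering a subset of the columns above; the real model in core/models.py uses the status/preset enums and carries the remaining metadata:
from sqlalchemy import Column, DateTime, Float, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = "jobs"

    id = Column(String, primary_key=True)
    file_path = Column(String, unique=True)  # UNIQUE constraint backs deduplication
    file_name = Column(String)
    status = Column(String)                  # queued/processing/completed/failed/cancelled
    priority = Column(Integer)
    source_lang = Column(String)
    target_lang = Column(String)
    progress = Column(Float)
    created_at = Column(DateTime)
    error = Column(Text)
    retry_count = Column(Integer)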
ScanRule Table (scan_rules)
id INTEGER PRIMARY KEY
name VARCHAR UNIQUE
enabled BOOLEAN
priority INTEGER -- Higher = evaluated first
-- Conditions (all must match):
audio_language_is VARCHAR -- ISO 639-2
audio_language_not VARCHAR -- Comma-separated
audio_track_count_min INTEGER
has_embedded_subtitle_lang VARCHAR
missing_embedded_subtitle_lang VARCHAR
missing_external_subtitle_lang VARCHAR
file_extension VARCHAR -- Comma-separated
-- Action:
action_type VARCHAR -- transcribe/translate
target_language VARCHAR
quality_preset VARCHAR
job_priority INTEGER
created_at DATETIME
updated_at DATETIME
SystemSettings Table (system_settings)
id INTEGER PRIMARY KEY
key VARCHAR UNIQUE
value TEXT
description TEXT
category VARCHAR -- general/workers/transcription/scanner/bazarr
value_type VARCHAR -- string/integer/boolean/list
created_at DATETIME
updated_at DATETIME
Transcription vs Translation
Understanding the Two Modes
Mode 1: transcribe (Audio → English subtitles)
Audio (any language) → Whisper (task='translate') → English SRT
Example: Japanese audio → anime.eng.srt
Mode 2: translate (Audio → English → Target language)
Audio (any language) → Whisper (task='translate') → English SRT
→ Google Translate → Target language SRT
Example: Japanese audio → anime.eng.srt + anime.spa.srt
Why Two Stages?
Whisper Limitation: Whisper can only translate TO English, not between other languages.
Solution: Two-stage process:
- Stage 1 (always): Whisper converts audio to English using task='translate'
- Stage 2 (translate mode only): Google Translate converts the English subtitles to the target language (sketched below)
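A condensed sketch of the two stages, assuming the WhisperTranscriber API shown earlier; translate_srt is a hypothetical stand-in for the Google Translate integration in transcription/translator.py, whose exact signature may differ:
from backend.transcription.transcriber import WhisperTranscriber

def subtitle_file(file_path: str, target_lang: str | None) -> None:
    transcriber = WhisperTranscriber(model_name="large-v3", device="cuda")
    base = file_path.rsplit(".", 1)[0]

    # Stage 1 (always): any-language audio -> English SRT
    result = transcriber.transcribe_file(file_path=file_path, task="translate")
    result.to_srt(f"{base}.eng.srt")

    # Stage 2 (translate mode only): English SRT -> target-language SRT
    if target_lang and target_lang != "eng":
        from backend.transcription.translator import translate_srt  # hypothetical helper
        translate_srt(f"{base}.eng.srt", f"{base}.{target_lang}.srt", target_lang)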
Output Files
| Mode | Target | Output Files |
|---|---|---|
| transcribe | spa | .eng.srt only |
| translate | spa | .eng.srt + .spa.srt |
| translate | fra | .eng.srt + .fra.srt |
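The output files sit next to the source media; deriving their paths is a one-liner per language. A minimal sketch:
from pathlib import Path

def srt_paths(media_file: str, target_lang: str | None = None) -> list[Path]:
    """Return the SRT paths produced for a media file."""
    stem = Path(media_file).with_suffix("")        # drop .mkv/.mp4/...
    paths = [Path(f"{stem}.eng.srt")]              # Stage 1 output, always written
    if target_lang and target_lang != "eng":
        paths.append(Path(f"{stem}.{target_lang}.srt"))  # Stage 2 output
    return paths

# srt_paths("/media/anime.mkv", "spa")
# -> [PosixPath('/media/anime.eng.srt'), PosixPath('/media/anime.spa.srt')]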
Worker Architecture
Worker Types
| Type | Description | Device |
|---|---|---|
| CPU | Uses CPU for inference | None |
| GPU | Uses NVIDIA GPU | cuda:N |
Worker Lifecycle
┌─────────────┐
│ CREATED │
└──────┬──────┘
│ start()
▼
┌─────────────┐
┌──────────│ IDLE │◄─────────┐
│ └──────┬──────┘ │
│ │ get_job() │ job_done()
│ ▼ │
│ ┌─────────────┐ │
│ │ BUSY │──────────┘
│ └──────┬──────┘
│ │ error
│ ▼
│ ┌─────────────┐
└──────────│ ERROR │
└─────────────┘
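One plausible reading of the diagram as a state machine; the real worker in core/worker.py may name states and transitions differently:
from enum import Enum

class WorkerState(Enum):
    CREATED = "created"
    IDLE = "idle"
    BUSY = "busy"
    ERROR = "error"

# Allowed transitions from the lifecycle diagram
TRANSITIONS = {
    WorkerState.CREATED: {WorkerState.IDLE},                   # start()
    WorkerState.IDLE: {WorkerState.BUSY},                      # get_job()
    WorkerState.BUSY: {WorkerState.IDLE, WorkerState.ERROR},   # job_done() / error
    WorkerState.ERROR: {WorkerState.IDLE},                     # pool auto-restart
}

def transition(current: WorkerState, new: WorkerState) -> WorkerState:
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new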
Process Isolation
Each worker runs in a separate Python process:
- Memory isolation (VRAM per GPU worker)
- Crash isolation (one worker crash doesn't affect others)
- Independent model loading
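A minimal sketch of that isolation using the standard multiprocessing module; worker_main is a hypothetical stand-in for the loop in core/worker.py. Because the model is loaded inside the child process, a crash or VRAM leak stays contained:
import multiprocessing as mp

def worker_main(worker_id: str, device: str) -> None:
    # Runs in its own process: model weights, VRAM, and crashes are isolated here.
    from backend.transcription.transcriber import WhisperTranscriber
    transcriber = WhisperTranscriber(model_name="large-v3", device=device)
    # ... pull jobs from the queue and transcribe until told to stop (omitted)

if __name__ == "__main__":
    proc = mp.Process(target=worker_main, args=("gpu-0", "cuda"), daemon=True)
    proc.start()
    proc.join(timeout=5)
    if not proc.is_alive():
        print(f"worker exited with code {proc.exitcode}")  # the pool would restart it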
Queue System
Priority System
# Priority values
BAZARR_REQUEST = base_priority + 10 # Highest (external request)
MANUAL_REQUEST = base_priority + 5 # High (user-initiated)
AUTO_SCAN = base_priority # Normal (scanner-generated)
Job Deduplication
Jobs are deduplicated by file_path:
- If a job with the same file_path already exists, the new job is rejected
- add_job() returns None in that case
- Prevents duplicate processing of the same file
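A sketch of how that check might look inside add_job(), with the UNIQUE file_path column as the backstop against races between the lookup and the insert:
from backend.core.database import database
from backend.core.models import Job

def add_job(file_path: str, **fields):
    with database.get_session() as session:
        # Reject if a job for this file already exists (any status)
        existing = session.query(Job).filter(Job.file_path == file_path).first()
        if existing is not None:
            return None  # caller sees None and skips the file
        job = Job(file_path=file_path, **fields)
        session.add(job)
        return job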
Concurrency Safety
# Row-level locking prevents race conditions
job = session.query(Job).filter(
Job.status == JobStatus.QUEUED
).with_for_update(skip_locked=True).first()
Scanner System
Scan Rule Evaluation
Rules are evaluated in priority order (highest first):
# Pseudo-code for rule matching
for rule in rules.order_by(priority.desc()):
if rule.enabled and matches_all_conditions(file, rule):
create_job(file, rule.action)
break # First match wins
Conditions
All conditions must match (AND logic):
| Condition | Match If |
|---|---|
| audio_language_is | Primary audio track language equals |
| audio_language_not | Primary audio track language NOT in list |
| audio_track_count_min | Number of audio tracks >= value |
| has_embedded_subtitle_lang | Has embedded subtitle in language |
| missing_embedded_subtitle_lang | Does NOT have embedded subtitle |
| missing_external_subtitle_lang | Does NOT have external .srt file |
| file_extension | File extension in comma-separated list |
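A sketch of the AND logic over those conditions, assuming a FileAnalysis object with the fields the table implies; attribute names here are illustrative, not the exact ones in file_analyzer.py:
def matches_all_conditions(analysis, rule) -> bool:
    """Every configured condition must pass; unset conditions are skipped."""
    if rule.audio_language_is and analysis.primary_audio_lang != rule.audio_language_is:
        return False
    if rule.audio_language_not:
        excluded = {code.strip() for code in rule.audio_language_not.split(",")}
        if analysis.primary_audio_lang in excluded:
            return False
    if rule.audio_track_count_min and analysis.audio_track_count < rule.audio_track_count_min:
        return False
    if rule.missing_embedded_subtitle_lang and \
            rule.missing_embedded_subtitle_lang in analysis.embedded_subtitle_langs:
        return False  # the subtitle exists, so the "missing" condition fails
    if rule.file_extension:
        allowed = {ext.strip().lstrip(".") for ext in rule.file_extension.split(",")}
        if analysis.extension.lstrip(".") not in allowed:
            return False
    return True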
Settings System
Categories
| Category | Settings |
|---|---|
| general | operation_mode, library_paths, log_level |
| workers | cpu_count, gpu_count, auto_start, healthcheck_interval |
| transcription | whisper_model, compute_type, vram_management |
| scanner | enabled, schedule_interval, watcher_enabled |
| bazarr | provider_enabled, api_key |
Caching
The settings service implements in-memory caching:
- Cache invalidated on write
- Thread-safe access
- Lazy loading from database
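A sketch of that strategy: a lock for thread safety, lazy loads on first read, and invalidation on write. The _load_from_db/_write_to_db helpers are hypothetical stand-ins for the SystemSettings queries:
import threading

class CachedSettings:
    def __init__(self):
        self._cache: dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, key: str, default=None):
        with self._lock:
            if key not in self._cache:          # lazy load on first access
                value = self._load_from_db(key)
                if value is None:
                    return default
                self._cache[key] = value
            return self._cache[key]

    def set(self, key: str, value: str) -> None:
        with self._lock:
            self._write_to_db(key, value)
            self._cache.pop(key, None)          # invalidate; next get() reloads from DB

    def _load_from_db(self, key):               # stand-in for the SystemSettings query
        return None

    def _write_to_db(self, key, value):         # stand-in for the UPSERT
        pass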
Graceful Degradation
The system can run WITHOUT Whisper/torch/PyAV installed:
# Pattern used everywhere
try:
import stable_whisper
WHISPER_AVAILABLE = True
except ImportError:
stable_whisper = None
WHISPER_AVAILABLE = False
# Later in code
if not WHISPER_AVAILABLE:
raise RuntimeError("Install with: pip install stable-ts faster-whisper")
What works without Whisper:
- Backend server starts normally
- All APIs work fully
- Frontend development
- Scanner and rules management
- Job queue (jobs just won't be processed)
What doesn't work:
- Actual transcription (throws RuntimeError)
Thread Safety
Database Sessions
Always use context managers:
with database.get_session() as session:
# Session is automatically committed on success
# Rolled back on exception
job = session.query(Job).filter(...).first()
Worker Pool
- Each worker is a separate Process (multiprocessing)
- Communication via shared memory (Manager)
- No GIL contention between workers
Queue Manager
- Uses SQLAlchemy row-level locking
- skip_locked=True prevents deadlocks between workers
- Transactions are short-lived
Important Patterns
Circular Import Resolution
Critical: backend/scanning/__init__.py MUST NOT import library_scanner:
# backend/scanning/__init__.py
from backend.scanning.models import ScanRule
from backend.scanning.file_analyzer import FileAnalyzer, FileAnalysis
# DO NOT import library_scanner here!
Why?
library_scanner → database → models → scanning.models → database (circular!)
Solution: Import library_scanner locally where needed:
def some_function():
from backend.scanning.library_scanner import library_scanner
library_scanner.scan_paths(...)
Optional Imports
try:
import pynvml
NVML_AVAILABLE = True
except ImportError:
pynvml = None
NVML_AVAILABLE = False
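Downstream code then guards on the flag instead of re-importing. A short usage sketch with the standard pynvml calls:
def get_gpu_stats() -> dict | None:
    if not NVML_AVAILABLE:
        return None  # degrade gracefully: no GPU metrics reported
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return {"vram_used_mb": mem.used // (1024 * 1024)}
    finally:
        pynvml.nvmlShutdown()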
Database Session Pattern
from backend.core.database import database
with database.get_session() as session:
# All operations within session context
job = session.query(Job).filter(...).first()
job.status = JobStatus.PROCESSING
# Commit happens automatically
API Response Pattern
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from backend.core.database import database
from backend.core.models import Job

router = APIRouter()
class JobResponse(BaseModel):
id: str
status: str
# ...
@router.get("/{job_id}", response_model=JobResponse)
async def get_job(job_id: str):
with database.get_session() as session:
job = session.query(Job).filter(Job.id == job_id).first()
if not job:
raise HTTPException(status_code=404, detail="Not found")
return JobResponse(**job.to_dict())