docs: add comprehensive project documentation

- Replace original Subgen README with TranscriptorIO documentation
- Add docs/API.md documenting 45+ REST endpoints
- Add docs/ARCHITECTURE.md with backend component details
- Add docs/FRONTEND.md with Vue 3 frontend structure
- Add docs/CONFIGURATION.md with settings system documentation
- Remove outdated backend/README.md
2026-01-16 15:10:41 +01:00
parent 9655686a50
commit 8373d8765f
6 changed files with 3109 additions and 435 deletions

docs/API.md (new file, 1195 lines; diff suppressed because it is too large)

docs/ARCHITECTURE.md (new file, 613 lines)

@@ -0,0 +1,613 @@
# TranscriptorIO Backend Architecture
Technical documentation of the backend architecture, components, and data flow.
## Table of Contents
- [Overview](#overview)
- [Directory Structure](#directory-structure)
- [Core Components](#core-components)
- [Data Flow](#data-flow)
- [Database Schema](#database-schema)
- [Transcription vs Translation](#transcription-vs-translation)
- [Worker Architecture](#worker-architecture)
- [Queue System](#queue-system)
- [Scanner System](#scanner-system)
- [Settings System](#settings-system)
- [Graceful Degradation](#graceful-degradation)
- [Thread Safety](#thread-safety)
- [Important Patterns](#important-patterns)
---
## Overview
TranscriptorIO is built with a modular architecture consisting of:
- **FastAPI Server**: REST API with 45+ endpoints
- **Worker Pool**: Multiprocessing-based transcription workers (CPU/GPU)
- **Queue Manager**: Persistent job queue with priority support
- **Library Scanner**: Rule-based file scanning with scheduler and watcher
- **Settings Service**: Database-backed configuration system
```
┌─────────────────────────────────────────────────────────┐
│ FastAPI Server │
│ ┌─────────────────────────────────────────────────┐ │
│ │ REST API (45+ endpoints) │ │
│ │ /api/workers | /api/jobs | /api/settings │ │
│ │ /api/scanner | /api/system | /api/setup │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────────────────┘
┌──────────────┼──────────────┬──────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐
│ Worker │ │ Queue │ │ Scanner │ │ Database │
│ Pool │◄──┤ Manager │◄──┤ Engine │ │ SQLite/ │
│ CPU/GPU│ │ Priority │ │ Rules + │ │ Postgres │
└────────┘ │ Queue │ │ Watcher │ └──────────┘
└──────────┘ └─────────┘
```
---
## Directory Structure
```
backend/
├── app.py # FastAPI application + lifespan
├── cli.py # CLI commands (server, db, worker, scan, setup)
├── config.py # Pydantic Settings (from .env)
├── setup_wizard.py # Interactive first-run setup
├── core/
│ ├── database.py # SQLAlchemy setup + session management
│ ├── models.py # Job model + enums
│ ├── language_code.py # ISO 639 language code utilities
│ ├── settings_model.py # SystemSettings model (database-backed)
│ ├── settings_service.py # Settings service with caching
│ ├── system_monitor.py # CPU/RAM/GPU/VRAM monitoring
│ ├── queue_manager.py # Persistent queue with priority
│ ├── worker.py # Individual worker (Process)
│ └── worker_pool.py # Worker pool orchestrator
├── transcription/
│ ├── __init__.py # Exports + WHISPER_AVAILABLE flag
│ ├── transcriber.py # WhisperTranscriber wrapper
│ ├── translator.py # Google Translate integration
│ └── audio_utils.py # ffmpeg/ffprobe utilities
├── scanning/
│ ├── __init__.py # Exports (NO library_scanner import!)
│ ├── models.py # ScanRule model
│ ├── file_analyzer.py # ffprobe file analysis
│ ├── language_detector.py # Audio language detection
│ ├── detected_languages.py # Language mappings
│ └── library_scanner.py # Scanner + scheduler + watcher
└── api/
├── __init__.py # Router exports
├── workers.py # Worker management endpoints
├── jobs.py # Job queue endpoints
├── scan_rules.py # Scan rules CRUD
├── scanner.py # Scanner control endpoints
├── settings.py # Settings CRUD endpoints
├── system.py # System resources endpoints
├── filesystem.py # Filesystem browser endpoints
└── setup_wizard.py # Setup wizard endpoints
```
---
## Core Components
### 1. WorkerPool (`core/worker_pool.py`)
Orchestrates CPU/GPU workers as separate processes.
**Key Features:**
- Dynamic add/remove workers at runtime
- Health monitoring with auto-restart
- Thread-safe multiprocessing
- Each worker is an isolated Process
```python
from backend.core.worker_pool import worker_pool
from backend.core.worker import WorkerType
# Add GPU worker on device 0
worker_id = worker_pool.add_worker(WorkerType.GPU, device_id=0)
# Add CPU worker
worker_id = worker_pool.add_worker(WorkerType.CPU)
# Get pool stats
stats = worker_pool.get_pool_stats()
```
### 2. QueueManager (`core/queue_manager.py`)
Persistent SQLite/PostgreSQL queue with priority support.
**Key Features:**
- Job deduplication (no duplicate `file_path`)
- Row-level locking with `skip_locked=True`
- Priority-based ordering (higher first)
- FIFO within same priority (by `created_at`)
- Auto-retry failed jobs
```python
from backend.core.queue_manager import queue_manager
from backend.core.models import QualityPreset
job = queue_manager.add_job(
file_path="/media/anime.mkv",
file_name="anime.mkv",
source_lang="jpn",
target_lang="spa",
quality_preset=QualityPreset.FAST,
priority=5
)
```
### 3. LibraryScanner (`scanning/library_scanner.py`)
Rule-based file scanning system.
**Three Scan Modes:**
- **Manual**: One-time scan via API or CLI
- **Scheduled**: Periodic scanning with APScheduler
- **Real-time**: File watcher with watchdog library
```python
from backend.scanning.library_scanner import library_scanner
# Manual scan
result = library_scanner.scan_paths(["/media/anime"], recursive=True)
# Start scheduler (every 6 hours)
library_scanner.start_scheduler(interval_minutes=360)
# Start file watcher
library_scanner.start_file_watcher(paths=["/media/anime"], recursive=True)
```
### 4. WhisperTranscriber (`transcription/transcriber.py`)
Wrapper for stable-whisper and faster-whisper.
**Key Features:**
- GPU/CPU support with auto-device detection
- VRAM management and cleanup
- Graceful degradation (works without Whisper installed)
```python
from backend.transcription.transcriber import WhisperTranscriber
transcriber = WhisperTranscriber(
model_name="large-v3",
device="cuda",
compute_type="float16"
)
result = transcriber.transcribe_file(
file_path="/media/episode.mkv",
language="jpn",
task="translate" # translate to English
)
result.to_srt("episode.eng.srt")
```
### 5. SettingsService (`core/settings_service.py`)
Database-backed configuration with caching.
```python
from backend.core.settings_service import settings_service
# Get setting
value = settings_service.get("worker_cpu_count", default=1)
# Set setting
settings_service.set("worker_cpu_count", "2")
# Bulk update
settings_service.bulk_update({
"worker_cpu_count": "2",
"scanner_enabled": "true"
})
```
---
## Data Flow
```
1. LibraryScanner detects file (manual/scheduled/watcher)
2. FileAnalyzer analyzes with ffprobe
- Audio tracks (codec, language, channels)
- Embedded subtitles
- External .srt files
- Duration, video info
3. Rules Engine evaluates against ScanRules (priority order)
- Checks all conditions (audio language, missing subs, etc.)
- First matching rule wins
4. If match → QueueManager.add_job()
- Deduplication check (no duplicate file_path)
- Assigns priority based on rule
5. Worker pulls job from queue
- Uses with_for_update(skip_locked=True)
- FIFO within same priority
6. WhisperTranscriber processes with model
- Stage 1: Audio → English (Whisper translate)
- Stage 2: English → Target (Google Translate, if needed)
7. Generate output SRT file(s)
- .eng.srt (always)
- .{target}.srt (if translate mode)
8. Job marked completed ✓
```
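Steps 6 and 7 determine the output file names. As a small illustration of that naming rule (a sketch using `pathlib`; the backend's actual helper may differ):
```python
from pathlib import Path

def output_paths(media_file: str, mode: str, target: str) -> list[Path]:
    """Sketch of the step-7 naming rule; not the backend's actual helper."""
    src = Path(media_file)
    outputs = [src.with_suffix(".eng.srt")]  # Stage 1 always writes English
    if mode == "translate" and target != "eng":
        outputs.append(src.with_suffix(f".{target}.srt"))  # Stage 2 adds the target
    return outputs

print(output_paths("/media/anime.mkv", "translate", "spa"))
# [PosixPath('/media/anime.eng.srt'), PosixPath('/media/anime.spa.srt')]
```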
---
## Database Schema
### Job Table (`jobs`)
```sql
id VARCHAR PRIMARY KEY
file_path VARCHAR UNIQUE -- Ensures no duplicates
file_name VARCHAR
status VARCHAR -- queued/processing/completed/failed/cancelled
priority INTEGER
source_lang VARCHAR
target_lang VARCHAR
quality_preset VARCHAR -- fast/balanced/best
transcribe_or_translate VARCHAR -- transcribe/translate
progress FLOAT
current_stage VARCHAR
eta_seconds INTEGER
created_at DATETIME
started_at DATETIME
completed_at DATETIME
output_path VARCHAR
srt_content TEXT
segments_count INTEGER
error TEXT
retry_count INTEGER
max_retries INTEGER
worker_id VARCHAR
vram_used_mb INTEGER
processing_time_seconds FLOAT
```
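The `jobs` table maps onto a SQLAlchemy model in `core/models.py`. A minimal sketch of how the key columns might be declared (column names come from the schema above; types, defaults, and indexes are assumptions):
```python
from sqlalchemy import Column, DateTime, Float, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = "jobs"

    id = Column(String, primary_key=True)
    file_path = Column(String, unique=True, nullable=False)  # enforces deduplication
    file_name = Column(String)
    status = Column(String, index=True)   # queued/processing/completed/failed/cancelled
    priority = Column(Integer, default=0)
    progress = Column(Float, default=0.0)
    created_at = Column(DateTime)
    error = Column(Text)
    retry_count = Column(Integer, default=0)
```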
### ScanRule Table (`scan_rules`)
```sql
id INTEGER PRIMARY KEY
name VARCHAR UNIQUE
enabled BOOLEAN
priority INTEGER -- Higher = evaluated first
-- Conditions (all must match):
audio_language_is VARCHAR -- ISO 639-2
audio_language_not VARCHAR -- Comma-separated
audio_track_count_min INTEGER
has_embedded_subtitle_lang VARCHAR
missing_embedded_subtitle_lang VARCHAR
missing_external_subtitle_lang VARCHAR
file_extension VARCHAR -- Comma-separated
-- Action:
action_type VARCHAR -- transcribe/translate
target_language VARCHAR
quality_preset VARCHAR
job_priority INTEGER
created_at DATETIME
updated_at DATETIME
```
### SystemSettings Table (`system_settings`)
```sql
id INTEGER PRIMARY KEY
key VARCHAR UNIQUE
value TEXT
description TEXT
category VARCHAR -- general/workers/transcription/scanner/bazarr
value_type VARCHAR -- string/integer/boolean/list
created_at DATETIME
updated_at DATETIME
```
---
## Transcription vs Translation
### Understanding the Two Modes
**Mode 1: `transcribe`** (Audio → English subtitles)
```
Audio (any language) → Whisper (task='translate') → English SRT
Example: Japanese audio → anime.eng.srt
```
**Mode 2: `translate`** (Audio → English → Target language)
```
Audio (any language) → Whisper (task='translate') → English SRT
→ Google Translate → Target language SRT
Example: Japanese audio → anime.eng.srt + anime.spa.srt
```
### Why Two Stages?
**Whisper Limitation**: Whisper can only translate TO English, not between other languages.
**Solution**: Two-stage process:
1. **Stage 1 (Always)**: Whisper converts audio to English using `task='translate'`
2. **Stage 2 (Only for translate mode)**: Google Translate converts English to target language
### Output Files
| Mode | Target | Output Files |
|------|--------|--------------|
| transcribe | spa | `.eng.srt` only |
| translate | spa | `.eng.srt` + `.spa.srt` |
| translate | fra | `.eng.srt` + `.fra.srt` |
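Putting the two stages together, a condensed sketch of the translate-mode pipeline (the `translate_srt` helper is a hypothetical name; the real integration lives in `transcription/translator.py`):
```python
from backend.transcription.transcriber import WhisperTranscriber

def process(file_path: str, mode: str, target_lang: str, source_lang: str | None = None):
    transcriber = WhisperTranscriber(model_name="medium", device="cpu", compute_type="auto")
    # Stage 1 (always): Whisper translates the audio into English
    result = transcriber.transcribe_file(file_path=file_path, language=source_lang, task="translate")
    eng_srt = file_path.rsplit(".", 1)[0] + ".eng.srt"
    result.to_srt(eng_srt)
    # Stage 2 (translate mode only): English SRT -> target language via Google Translate
    if mode == "translate" and target_lang != "eng":
        from backend.transcription.translator import translate_srt  # hypothetical name
        translate_srt(eng_srt, target_lang)
```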
---
## Worker Architecture
### Worker Types
| Type | Description | Device |
|------|-------------|--------|
| CPU | Uses CPU for inference | None |
| GPU | Uses NVIDIA GPU | cuda:N |
### Worker Lifecycle
```
┌─────────────┐
│ CREATED │
└──────┬──────┘
│ start()
┌─────────────┐
┌──────────│ IDLE │◄─────────┐
│ └──────┬──────┘ │
│ │ get_job() │ job_done()
│ ▼ │
│ ┌─────────────┐ │
│ │ BUSY │──────────┘
│ └──────┬──────┘
│ │ error
│ ▼
│ ┌─────────────┐
└──────────│ ERROR │
└─────────────┘
```
### Process Isolation
Each worker runs in a separate Python process:
- Memory isolation (VRAM per GPU worker)
- Crash isolation (one worker crash doesn't affect others)
- Independent model loading
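A minimal illustration of this model, stripped down to bare `multiprocessing` (the real `Worker` class in `core/worker.py` adds the job loop, health reporting, and model management):
```python
import multiprocessing as mp

def worker_main(worker_id: str) -> None:
    # Model loading and the job loop happen entirely inside this process;
    # if it crashes, the parent and sibling workers are unaffected.
    print(f"{worker_id}: running in pid {mp.current_process().pid}")

if __name__ == "__main__":
    procs = [mp.Process(target=worker_main, args=(f"worker-{i}",)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```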
---
## Queue System
### Priority System
```python
# Priority values
BAZARR_REQUEST = base_priority + 10 # Highest (external request)
MANUAL_REQUEST = base_priority + 5 # High (user-initiated)
AUTO_SCAN = base_priority # Normal (scanner-generated)
```
### Job Deduplication
Jobs are deduplicated by `file_path`:
- If a job with the same `file_path` already exists, the new one is rejected
- `add_job()` returns `None` in that case (see the sketch below)
- This prevents the same file from being processed twice
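A sketch of the caller-side check (`add_job()` may require more arguments than shown here):
```python
from backend.core.queue_manager import queue_manager

job = queue_manager.add_job(
    file_path="/media/anime.mkv",
    file_name="anime.mkv",
)
if job is None:
    print("Rejected: a job for this file_path already exists")
```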
### Concurrency Safety
```python
# Row-level locking prevents race conditions
job = session.query(Job).filter(
Job.status == JobStatus.QUEUED
).with_for_update(skip_locked=True).first()
```
---
## Scanner System
### Scan Rule Evaluation
Rules are evaluated in priority order (highest first):
```python
# Pseudo-code for rule matching
for rule in rules.order_by(priority.desc()):
if rule.enabled and matches_all_conditions(file, rule):
create_job(file, rule.action)
break # First match wins
```
### Conditions
All conditions must match (AND logic):
| Condition | Match If |
|-----------|----------|
| audio_language_is | Primary audio track language equals |
| audio_language_not | Primary audio track language NOT in list |
| audio_track_count_min | Number of audio tracks >= value |
| has_embedded_subtitle_lang | Has embedded subtitle in language |
| missing_embedded_subtitle_lang | Does NOT have embedded subtitle |
| missing_external_subtitle_lang | Does NOT have external .srt file |
| file_extension | File extension in comma-separated list |
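A hedged sketch of how that AND logic might look; the `FileAnalysis` field names (`primary_audio_language`, `audio_track_count`, `embedded_subtitle_langs`, `extension`) are assumptions, not the actual attributes:
```python
def matches_all_conditions(analysis, rule) -> bool:
    if rule.audio_language_is and analysis.primary_audio_language != rule.audio_language_is:
        return False
    if rule.audio_language_not:
        blocked = {c.strip() for c in rule.audio_language_not.split(",")}
        if analysis.primary_audio_language in blocked:
            return False
    if rule.audio_track_count_min and analysis.audio_track_count < rule.audio_track_count_min:
        return False
    if rule.missing_embedded_subtitle_lang and \
            rule.missing_embedded_subtitle_lang in analysis.embedded_subtitle_langs:
        return False  # subtitle already present, so the "missing" condition fails
    if rule.file_extension:
        allowed = {e.strip().lower() for e in rule.file_extension.split(",")}
        if analysis.extension.lower() not in allowed:
            return False
    return True  # unset conditions are skipped; every configured condition matched
```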
---
## Settings System
### Categories
| Category | Settings |
|----------|----------|
| general | operation_mode, library_paths, log_level |
| workers | cpu_count, gpu_count, auto_start, healthcheck_interval |
| transcription | whisper_model, compute_type, vram_management |
| scanner | enabled, schedule_interval, watcher_enabled |
| bazarr | provider_enabled, api_key |
### Caching
Settings service implements caching:
- Cache invalidated on write
- Thread-safe access
- Lazy loading from database
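One way such a cache might look (illustrative only; the real `settings_service` wires this to the `system_settings` table):
```python
import threading

class CachedSettings:
    """Illustrative cache shape; not the actual settings_service implementation."""

    def __init__(self, load_all, write_one):
        self._load_all = load_all    # callable reading all rows from system_settings
        self._write_one = write_one  # callable persisting one key/value pair
        self._lock = threading.Lock()
        self._cache = None           # lazy: filled on first read

    def get(self, key, default=None):
        with self._lock:
            if self._cache is None:
                self._cache = self._load_all()
            return self._cache.get(key, default)

    def set(self, key, value):
        with self._lock:
            self._write_one(key, value)
            self._cache = None       # invalidate on write
```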
---
## Graceful Degradation
The system can run WITHOUT Whisper/torch/PyAV installed:
```python
# Pattern used everywhere
try:
import stable_whisper
WHISPER_AVAILABLE = True
except ImportError:
stable_whisper = None
WHISPER_AVAILABLE = False
# Later in code
if not WHISPER_AVAILABLE:
raise RuntimeError("Install with: pip install stable-ts faster-whisper")
```
**What works without Whisper:**
- Backend server starts normally
- All APIs work fully
- Frontend development
- Scanner and rules management
- Job queue (jobs just won't be processed)
**What doesn't work:**
- Actual transcription (throws RuntimeError)
---
## Thread Safety
### Database Sessions
Always use context managers:
```python
with database.get_session() as session:
# Session is automatically committed on success
# Rolled back on exception
job = session.query(Job).filter(...).first()
```
### Worker Pool
- Each worker is a separate Process (multiprocessing)
- Communication via shared memory (Manager)
- No GIL contention between workers
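A minimal illustration of sharing worker status through a `multiprocessing.Manager` dict (a sketch; the actual pool shares richer state):
```python
import multiprocessing as mp

def worker_main(status, worker_id):
    status[worker_id] = "busy"   # visible to the parent immediately
    # ... pull a job, transcribe, write SRT ...
    status[worker_id] = "idle"

if __name__ == "__main__":
    with mp.Manager() as manager:
        status = manager.dict()
        p = mp.Process(target=worker_main, args=(status, "cpu-0"))
        p.start()
        p.join()
        print(dict(status))  # {'cpu-0': 'idle'}
```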
### Queue Manager
- Uses SQLAlchemy row locking
- `skip_locked=True` prevents deadlocks
- Transactions are short-lived
---
## Important Patterns
### Circular Import Resolution
**Critical**: `backend/scanning/__init__.py` MUST NOT import `library_scanner`:
```python
# backend/scanning/__init__.py
from backend.scanning.models import ScanRule
from backend.scanning.file_analyzer import FileAnalyzer, FileAnalysis
# DO NOT import library_scanner here!
```
**Why?**
```
library_scanner → database → models → scanning.models → database (circular!)
```
**Solution**: Import `library_scanner` locally where needed:
```python
def some_function():
from backend.scanning.library_scanner import library_scanner
library_scanner.scan_paths(...)
```
### Optional Imports
```python
try:
import pynvml
NVML_AVAILABLE = True
except ImportError:
pynvml = None
NVML_AVAILABLE = False
```
### Database Session Pattern
```python
from backend.core.database import database
with database.get_session() as session:
# All operations within session context
job = session.query(Job).filter(...).first()
job.status = JobStatus.PROCESSING
# Commit happens automatically
```
### API Response Pattern
```python
from pydantic import BaseModel
class JobResponse(BaseModel):
id: str
status: str
# ...
@router.get("/{job_id}", response_model=JobResponse)
async def get_job(job_id: str):
with database.get_session() as session:
job = session.query(Job).filter(Job.id == job_id).first()
if not job:
raise HTTPException(status_code=404, detail="Not found")
return JobResponse(**job.to_dict())
```

docs/CONFIGURATION.md (new file, 402 lines)

@@ -0,0 +1,402 @@
# TranscriptorIO Configuration
Complete documentation for the configuration system.
## Table of Contents
- [Overview](#overview)
- [Configuration Methods](#configuration-methods)
- [Settings Categories](#settings-categories)
- [All Settings Reference](#all-settings-reference)
- [Environment Variables](#environment-variables)
- [Setup Wizard](#setup-wizard)
- [API Configuration](#api-configuration)
- [Python Usage](#python-usage)
---
## Overview
TranscriptorIO uses a **database-backed configuration system**. All settings are stored in the `system_settings` table and can be managed through:
1. **Setup Wizard** (first run)
2. **Web UI** (Settings page)
3. **REST API** (`/api/settings`)
4. **CLI** (for advanced users)
This approach provides:
- Persistent configuration across restarts
- Runtime configuration changes without restart
- Category-based organization
- Type validation and parsing
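As a sketch of what that type parsing might look like for the `value_type` column (`string`/`integer`/`boolean`/`list`); the real service may differ:
```python
def parse_value(raw: str, value_type: str):
    if value_type == "integer":
        return int(raw)
    if value_type == "boolean":
        return raw.strip().lower() in ("true", "1", "yes")
    if value_type == "list":
        return [item.strip() for item in raw.split(",") if item.strip()]
    return raw  # plain string

assert parse_value("2", "integer") == 2
assert parse_value("true", "boolean") is True
assert parse_value("/media/anime,/media/movies", "list") == ["/media/anime", "/media/movies"]
```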
---
## Configuration Methods
### 1. Setup Wizard (Recommended for First Run)
```bash
# Runs automatically on first server start
python backend/cli.py server
# Or run manually anytime
python backend/cli.py setup
```
The wizard guides you through:
- **Operation mode selection** (Standalone or Bazarr provider)
- **Library paths configuration**
- **Initial scan rules**
- **Worker configuration** (CPU/GPU counts)
- **Scanner schedule**
### 2. Web UI (Recommended for Daily Use)
Navigate to **Settings** in the web interface (`http://localhost:8000/settings`).
Features:
- Settings grouped by category tabs
- Descriptions for each setting
- Change detection (warns about unsaved changes)
- Bulk save functionality
### 3. REST API (For Automation/Integration)
```bash
# Get all settings
curl http://localhost:8000/api/settings
# Get settings by category
curl http://localhost:8000/api/settings?category=workers
# Update a setting
curl -X PUT http://localhost:8000/api/settings/worker_cpu_count \
-H "Content-Type: application/json" \
-d '{"value": "2"}'
# Bulk update
curl -X POST http://localhost:8000/api/settings/bulk-update \
-H "Content-Type: application/json" \
-d '{
"settings": {
"worker_cpu_count": "2",
"worker_gpu_count": "1"
}
}'
```
---
## Settings Categories
| Category | Description |
|----------|-------------|
| `general` | Operation mode, library paths, API server |
| `workers` | CPU/GPU worker configuration |
| `transcription` | Whisper model and transcription options |
| `subtitles` | Subtitle naming and formatting |
| `skip` | Skip conditions for files |
| `scanner` | Library scanner configuration |
| `bazarr` | Bazarr provider integration |
| `advanced` | Advanced options (path mapping, etc.) |
---
## All Settings Reference
### General Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `operation_mode` | string | `standalone` | Operation mode: `standalone`, `provider`, or `standalone,provider` |
| `library_paths` | list | `""` | Comma-separated library paths to scan |
| `api_host` | string | `0.0.0.0` | API server host |
| `api_port` | integer | `8000` | API server port |
| `debug` | boolean | `false` | Enable debug mode |
### Worker Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `worker_cpu_count` | integer | `0` | Number of CPU workers to start on boot |
| `worker_gpu_count` | integer | `0` | Number of GPU workers to start on boot |
| `concurrent_transcriptions` | integer | `2` | Maximum concurrent transcriptions |
| `worker_healthcheck_interval` | integer | `60` | Worker health check interval (seconds) |
| `worker_auto_restart` | boolean | `true` | Auto-restart failed workers |
| `clear_vram_on_complete` | boolean | `true` | Clear VRAM after job completion |
### Transcription Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `whisper_model` | string | `medium` | Whisper model: `tiny`, `base`, `small`, `medium`, `large-v3`, `large-v3-turbo` |
| `model_path` | string | `./models` | Path to store Whisper models |
| `transcribe_device` | string | `cpu` | Device: `cpu`, `cuda`, `gpu` |
| `cpu_compute_type` | string | `auto` | CPU compute type: `auto`, `int8`, `float32` |
| `gpu_compute_type` | string | `auto` | GPU compute type: `auto`, `float16`, `float32`, `int8_float16`, `int8` |
| `whisper_threads` | integer | `4` | Number of CPU threads for Whisper |
| `transcribe_or_translate` | string | `transcribe` | Default mode: `transcribe` or `translate` |
| `word_level_highlight` | boolean | `false` | Enable word-level highlighting |
| `detect_language_length` | integer | `30` | Seconds of audio for language detection |
| `detect_language_offset` | integer | `0` | Offset for language detection sample |
### Whisper Models
| Model | Size | Speed | Quality | VRAM |
|-------|------|-------|---------|------|
| `tiny` | 39M | Fastest | Basic | ~1GB |
| `base` | 74M | Very Fast | Fair | ~1GB |
| `small` | 244M | Fast | Good | ~2GB |
| `medium` | 769M | Medium | Great | ~5GB |
| `large-v3` | 1.5G | Slow | Excellent | ~10GB |
| `large-v3-turbo` | 809M | Fast | Excellent | ~6GB |
### Subtitle Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `subtitle_language_name` | string | `""` | Custom subtitle language name |
| `subtitle_language_naming_type` | string | `ISO_639_2_B` | Naming type: `ISO_639_1`, `ISO_639_2_T`, `ISO_639_2_B`, `NAME`, `NATIVE` |
| `custom_regroup` | string | `cm_sl=84_sl=42++++++1` | Custom regrouping algorithm |
**Language Naming Types:**
| Type | Example (Spanish) |
|------|-------------------|
| ISO_639_1 | `es` |
| ISO_639_2_T | `spa` |
| ISO_639_2_B | `spa` |
| NAME | `Spanish` |
| NATIVE | `Español` |
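Spanish happens to have identical T and B codes, so the distinction is invisible in the table above. Using the third-party `pycountry` library for illustration (the backend's own `core/language_code.py` utilities may differ), German shows where they diverge:
```python
import pycountry  # pip install pycountry

de = pycountry.languages.get(alpha_2="de")
print(de.alpha_2)   # 'de'  -> ISO_639_1
print(de.alpha_3)   # 'deu' -> ISO_639_2_T (terminological)
print(getattr(de, "bibliographic", de.alpha_3))  # 'ger' -> ISO_639_2_B (bibliographic)
print(de.name)      # 'German' -> NAME
```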
### Skip Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `skip_if_external_subtitles_exist` | boolean | `false` | Skip if any external subtitle exists |
| `skip_if_target_subtitles_exist` | boolean | `true` | Skip if target language subtitle exists |
| `skip_if_internal_subtitles_language` | string | `""` | Skip if internal subtitle in this language |
| `skip_subtitle_languages` | list | `""` | Pipe-separated language codes to skip |
| `skip_if_audio_languages` | list | `""` | Skip if audio track is in these languages |
| `skip_unknown_language` | boolean | `false` | Skip files with unknown audio language |
| `skip_only_subgen_subtitles` | boolean | `false` | Only skip SubGen-generated subtitles |
### Scanner Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `scanner_enabled` | boolean | `true` | Enable library scanner |
| `scanner_cron` | string | `0 2 * * *` | Cron expression for scheduled scans |
| `scanner_schedule_interval_minutes` | integer | `360` | Scan interval in minutes (6 hours) |
| `watcher_enabled` | boolean | `false` | Enable real-time file watcher |
| `auto_scan_enabled` | boolean | `false` | Enable automatic scheduled scanning |
### Bazarr Provider Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `bazarr_provider_enabled` | boolean | `false` | Enable Bazarr provider mode |
| `bazarr_url` | string | `http://bazarr:6767` | Bazarr server URL |
| `bazarr_api_key` | string | `""` | Bazarr API key (auto-generated) |
| `provider_timeout_seconds` | integer | `600` | Provider request timeout |
| `provider_callback_enabled` | boolean | `true` | Enable callback on completion |
| `provider_polling_interval` | integer | `30` | Polling interval for jobs |
### Advanced Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `force_detected_language_to` | string | `""` | Force detected language to specific code |
| `preferred_audio_languages` | list | `eng` | Pipe-separated preferred audio languages |
| `use_path_mapping` | boolean | `false` | Enable path mapping for network shares |
| `path_mapping_from` | string | `/tv` | Path mapping source |
| `path_mapping_to` | string | `/Volumes/TV` | Path mapping destination |
| `lrc_for_audio_files` | boolean | `true` | Generate LRC files for audio-only files |
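Path mapping is a simple prefix substitution. A hedged sketch of how it might be applied (the actual implementation may differ):
```python
def map_path(path: str, enabled: bool, src: str, dst: str) -> str:
    # Rewrite the configured prefix; leave non-matching paths untouched.
    if enabled and path.startswith(src):
        return dst + path[len(src):]
    return path

assert map_path("/tv/Show/S01E01.mkv", True, "/tv", "/Volumes/TV") == "/Volumes/TV/Show/S01E01.mkv"
assert map_path("/movies/film.mkv", True, "/tv", "/Volumes/TV") == "/movies/film.mkv"
```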
---
## Environment Variables
The **only** environment variable required is `DATABASE_URL` in the `.env` file:
```bash
# SQLite (default, good for single-user)
DATABASE_URL=sqlite:///./transcriptarr.db
# PostgreSQL (recommended for production)
DATABASE_URL=postgresql://user:password@localhost:5432/transcriptarr
# MariaDB/MySQL
DATABASE_URL=mariadb+pymysql://user:password@localhost:3306/transcriptarr
```
**All other configuration** is stored in the database and managed through:
- Setup Wizard (first run)
- Web UI Settings page
- Settings API endpoints
This design ensures:
- No `.env` file bloat
- Runtime configuration changes without restart
- Centralized configuration management
- Easy backup (configuration is in the database)
---
## Setup Wizard
### Standalone Mode
For independent operation with local library scanning.
**Configuration Flow:**
1. Select library paths (e.g., `/media/anime`, `/media/movies`)
2. Create initial scan rules (e.g., "Japanese audio → Spanish subtitles")
3. Configure workers (CPU count, GPU count)
4. Set scanner interval (default: 6 hours)
**API Endpoint:** `POST /api/setup/standalone`
```json
{
"library_paths": ["/media/anime", "/media/movies"],
"scan_rules": [
{
"name": "Japanese to Spanish",
"audio_language_is": "jpn",
"missing_external_subtitle_lang": "spa",
"target_language": "spa",
"action_type": "transcribe"
}
],
"worker_config": {
"count": 1,
"type": "cpu"
},
"scanner_config": {
"interval_minutes": 360
}
}
```
### Bazarr Slave Mode
For integration with Bazarr as a subtitle provider.
**Configuration Flow:**
1. Select Bazarr mode
2. System auto-generates API key
3. Displays connection info for Bazarr configuration
**API Endpoint:** `POST /api/setup/bazarr-slave`
**Response:**
```json
{
"success": true,
"message": "Bazarr slave mode configured successfully",
"bazarr_info": {
"mode": "bazarr_slave",
"host": "127.0.0.1",
"port": 8000,
"api_key": "generated_api_key_here",
"provider_url": "http://127.0.0.1:8000"
}
}
```
---
## API Configuration
### Get All Settings
```bash
curl http://localhost:8000/api/settings
```
### Get by Category
```bash
curl "http://localhost:8000/api/settings?category=workers"
```
### Get Single Setting
```bash
curl http://localhost:8000/api/settings/worker_cpu_count
```
### Update Setting
```bash
curl -X PUT http://localhost:8000/api/settings/worker_cpu_count \
-H "Content-Type: application/json" \
-d '{"value": "2"}'
```
### Bulk Update
```bash
curl -X POST http://localhost:8000/api/settings/bulk-update \
-H "Content-Type: application/json" \
-d '{
"settings": {
"worker_cpu_count": "2",
"worker_gpu_count": "1",
"scanner_enabled": "true"
}
}'
```
### Create Custom Setting
```bash
curl -X POST http://localhost:8000/api/settings \
-H "Content-Type: application/json" \
-d '{
"key": "my_custom_setting",
"value": "custom_value",
"description": "My custom setting",
"category": "advanced",
"value_type": "string"
}'
```
### Delete Setting
```bash
curl -X DELETE http://localhost:8000/api/settings/my_custom_setting
```
### Initialize Defaults
```bash
curl -X POST http://localhost:8000/api/settings/init-defaults
```
---
## Python Usage
```python
from backend.core.settings_service import settings_service
# Get setting with default
cpu_count = settings_service.get("worker_cpu_count", default=1)
# Set setting
settings_service.set("worker_cpu_count", 2)
# Bulk update
settings_service.bulk_update({
"worker_cpu_count": "2",
"scanner_enabled": "true"
})
# Get all settings in category
worker_settings = settings_service.get_by_category("workers")
# Initialize defaults (safe to call multiple times)
settings_service.init_default_settings()
```

docs/FRONTEND.md (new file, 666 lines)

@@ -0,0 +1,666 @@
# TranscriptorIO Frontend
Technical documentation for the Vue 3 frontend application.
## Table of Contents
- [Overview](#overview)
- [Technology Stack](#technology-stack)
- [Directory Structure](#directory-structure)
- [Development Setup](#development-setup)
- [Views](#views)
- [Components](#components)
- [State Management](#state-management)
- [API Service](#api-service)
- [Routing](#routing)
- [Styling](#styling)
- [Build and Deployment](#build-and-deployment)
- [TypeScript Interfaces](#typescript-interfaces)
---
## Overview
The TranscriptorIO frontend is a Single Page Application (SPA) built with Vue 3, featuring:
- **6 Complete Views**: Dashboard, Queue, Scanner, Rules, Workers, Settings
- **Real-time Updates**: Polling-based status updates
- **Dark Theme**: Tdarr-inspired dark UI
- **Type Safety**: Full TypeScript support
- **State Management**: Pinia stores for shared state
---
## Technology Stack
| Technology | Version | Purpose |
|------------|---------|---------|
| Vue.js | 3.4+ | UI Framework |
| Vue Router | 4.2+ | Client-side routing |
| Pinia | 2.1+ | State management |
| Axios | 1.6+ | HTTP client |
| TypeScript | 5.3+ | Type safety |
| Vite | 5.0+ | Build tool / dev server |
---
## Directory Structure
```
frontend/
├── public/ # Static assets (favicon, etc.)
├── src/
│ ├── main.ts # Application entry point
│ ├── App.vue # Root component + navigation
│ │
│ ├── views/ # Page components (routed)
│ │ ├── DashboardView.vue # System overview + resources
│ │ ├── QueueView.vue # Job management
│ │ ├── ScannerView.vue # Scanner control
│ │ ├── RulesView.vue # Scan rules CRUD
│ │ ├── WorkersView.vue # Worker pool management
│ │ └── SettingsView.vue # Settings management
│ │
│ ├── components/ # Reusable components
│ │ ├── ConnectionWarning.vue # Backend connection status
│ │ ├── PathBrowser.vue # Filesystem browser modal
│ │ └── SetupWizard.vue # First-run setup wizard
│ │
│ ├── stores/ # Pinia state stores
│ │ ├── config.ts # Configuration store
│ │ ├── system.ts # System status store
│ │ ├── workers.ts # Workers store
│ │ └── jobs.ts # Jobs store
│ │
│ ├── services/
│ │ └── api.ts # Axios API client
│ │
│ ├── router/
│ │ └── index.ts # Vue Router configuration
│ │
│ ├── types/
│ │ └── api.ts # TypeScript interfaces
│ │
│ └── assets/
│ └── css/
│ └── main.css # Global styles (dark theme)
├── index.html # HTML template
├── vite.config.ts # Vite configuration
├── tsconfig.json # TypeScript configuration
└── package.json # Dependencies
```
---
## Development Setup
### Prerequisites
- Node.js 18+ and npm
- Backend server running on port 8000
### Installation
```bash
cd frontend
# Install dependencies
npm install
# Start development server (with proxy to backend)
npm run dev
```
### Development URLs
| URL | Description |
|-----|-------------|
| http://localhost:3000 | Frontend dev server |
| http://localhost:8000 | Backend API |
| http://localhost:8000/docs | Swagger API docs |
### Scripts
```bash
npm run dev # Start dev server with HMR
npm run build # Build for production
npm run preview # Preview production build
npm run lint # Run ESLint
```
---
## Views
### DashboardView
**Path**: `/`
System overview with real-time resource monitoring.
**Features**:
- System status (running/stopped)
- CPU usage gauge
- RAM usage gauge
- GPU usage gauges (per device)
- Recent jobs list
- Worker pool summary
- Scanner status
**Data Sources**:
- `GET /api/status`
- `GET /api/system/resources`
- `GET /api/jobs?page_size=10`
### QueueView
**Path**: `/queue`
Job queue management with filtering and pagination.
**Features**:
- Job list with status icons
- Status filter (All/Queued/Processing/Completed/Failed)
- Pagination controls
- Retry failed jobs
- Cancel queued/processing jobs
- Clear completed jobs
- Job progress display
- Processing time display
**Data Sources**:
- `GET /api/jobs`
- `GET /api/jobs/stats`
- `POST /api/jobs/{id}/retry`
- `DELETE /api/jobs/{id}`
- `POST /api/jobs/queue/clear`
### ScannerView
**Path**: `/scanner`
Library scanner control and configuration.
**Features**:
- Scanner status display
- Start/stop scheduler
- Start/stop file watcher
- Manual scan trigger
- Scan results display
- Next scan time
- Total files scanned counter
**Data Sources**:
- `GET /api/scanner/status`
- `POST /api/scanner/scan`
- `POST /api/scanner/scheduler/start`
- `POST /api/scanner/scheduler/stop`
- `POST /api/scanner/watcher/start`
- `POST /api/scanner/watcher/stop`
### RulesView
**Path**: `/rules`
Scan rules CRUD management.
**Features**:
- Rules list with priority ordering
- Create new rule (modal)
- Edit existing rule (modal)
- Delete rule (with confirmation)
- Toggle rule enabled/disabled
- Condition configuration
- Action configuration
**Data Sources**:
- `GET /api/scan-rules`
- `POST /api/scan-rules`
- `PUT /api/scan-rules/{id}`
- `DELETE /api/scan-rules/{id}`
- `POST /api/scan-rules/{id}/toggle`
### WorkersView
**Path**: `/workers`
Worker pool management.
**Features**:
- Worker list with status
- Add CPU worker
- Add GPU worker (with device selection)
- Remove worker
- Start/stop pool
- Worker statistics
- Current job display per worker
- Progress and ETA display
**Data Sources**:
- `GET /api/workers`
- `GET /api/workers/stats`
- `POST /api/workers`
- `DELETE /api/workers/{id}`
- `POST /api/workers/pool/start`
- `POST /api/workers/pool/stop`
### SettingsView
**Path**: `/settings`
Database-backed settings management.
**Features**:
- Settings grouped by category
- Category tabs (General, Workers, Transcription, Scanner, Bazarr)
- Edit settings in-place
- Save changes button
- Change detection (unsaved changes warning)
- Setting descriptions
**Data Sources**:
- `GET /api/settings`
- `PUT /api/settings/{key}`
- `POST /api/settings/bulk-update`
---
## Components
### ConnectionWarning
Displays warning banner when backend is unreachable.
**Props**: None
**State**: Uses `systemStore.isConnected`
### PathBrowser
Modal component for browsing filesystem paths.
**Props**:
- `show: boolean` - Show/hide modal
- `initialPath: string` - Starting path
**Emits**:
- `select(path: string)` - Path selected
- `close()` - Modal closed
**API Calls**:
- `GET /api/filesystem/browse?path={path}`
- `GET /api/filesystem/common-paths`
### SetupWizard
First-run setup wizard component.
**Props**: None
**Features**:
- Mode selection (Standalone/Bazarr)
- Library path configuration
- Scan rule creation
- Worker configuration
- Scanner interval setting
**API Calls**:
- `GET /api/setup/status`
- `POST /api/setup/standalone`
- `POST /api/setup/bazarr-slave`
- `POST /api/setup/skip`
---
## State Management
### Pinia Stores
#### systemStore (`stores/system.ts`)
Global system state.
```typescript
interface SystemState {
isConnected: boolean
status: SystemStatus | null
resources: SystemResources | null
loading: boolean
error: string | null
}
// Actions
fetchStatus() // Fetch /api/status
fetchResources() // Fetch /api/system/resources
startPolling() // Start auto-refresh
stopPolling() // Stop auto-refresh
```
#### workersStore (`stores/workers.ts`)
Worker pool state.
```typescript
interface WorkersState {
workers: Worker[]
stats: WorkerStats | null
loading: boolean
error: string | null
}
// Actions
fetchWorkers() // Fetch all workers
fetchStats() // Fetch pool stats
addWorker(type, deviceId?) // Add worker
removeWorker(id) // Remove worker
startPool(cpuCount, gpuCount) // Start pool
stopPool() // Stop pool
```
#### jobsStore (`stores/jobs.ts`)
Job queue state.
```typescript
interface JobsState {
jobs: Job[]
stats: QueueStats | null
total: number
page: number
pageSize: number
statusFilter: string | null
loading: boolean
error: string | null
}
// Actions
fetchJobs() // Fetch with current filters
fetchStats() // Fetch queue stats
retryJob(id) // Retry failed job
cancelJob(id) // Cancel job
clearCompleted() // Clear completed jobs
setStatusFilter(status) // Update filter
setPage(page) // Change page
```
#### configStore (`stores/config.ts`)
Settings configuration state.
```typescript
interface ConfigState {
settings: Setting[]
loading: boolean
error: string | null
pendingChanges: Record<string, string>
}
// Actions
fetchSettings(category?) // Fetch settings
updateSetting(key, value) // Queue update
saveChanges() // Save all pending
discardChanges() // Discard pending
```
---
## API Service
### Configuration (`services/api.ts`)
```typescript
import axios from 'axios'
const api = axios.create({
baseURL: '/api',
timeout: 30000,
headers: {
'Content-Type': 'application/json'
}
})
// Response interceptor for error handling
api.interceptors.response.use(
response => response,
error => {
console.error('API Error:', error)
return Promise.reject(error)
}
)
export default api
```
### Usage Example
```typescript
import api from '@/services/api'
// GET request
const response = await api.get('/jobs', {
params: { status_filter: 'queued', page: 1 }
})
// POST request
const job = await api.post('/jobs', {
file_path: '/media/video.mkv',
target_lang: 'spa'
})
// PUT request
await api.put('/settings/worker_cpu_count', {
value: '2'
})
// DELETE request
await api.delete(`/jobs/${jobId}`)
```
---
## Routing
### Route Configuration
```typescript
const routes = [
{ path: '/', name: 'Dashboard', component: DashboardView },
{ path: '/workers', name: 'Workers', component: WorkersView },
{ path: '/queue', name: 'Queue', component: QueueView },
{ path: '/scanner', name: 'Scanner', component: ScannerView },
{ path: '/rules', name: 'Rules', component: RulesView },
{ path: '/settings', name: 'Settings', component: SettingsView }
]
```
### Navigation
Navigation is handled in `App.vue` with a sidebar menu.
```vue
<nav class="sidebar">
<router-link to="/">Dashboard</router-link>
<router-link to="/workers">Workers</router-link>
<router-link to="/queue">Queue</router-link>
<router-link to="/scanner">Scanner</router-link>
<router-link to="/rules">Rules</router-link>
<router-link to="/settings">Settings</router-link>
</nav>
<main class="content">
<router-view />
</main>
```
---
## Styling
### Dark Theme
The application uses a Tdarr-inspired dark theme defined in `assets/css/main.css`.
**Color Palette**:
| Variable | Value | Usage |
|----------|-------|-------|
| --bg-primary | #1a1a2e | Main background |
| --bg-secondary | #16213e | Card background |
| --bg-tertiary | #0f3460 | Hover states |
| --text-primary | #eaeaea | Primary text |
| --text-secondary | #a0a0a0 | Secondary text |
| --accent-primary | #e94560 | Buttons, links |
| --accent-success | #4ade80 | Success states |
| --accent-warning | #fbbf24 | Warning states |
| --accent-error | #ef4444 | Error states |
### Component Styling
Components use scoped CSS with CSS variables:
```vue
<style scoped>
.card {
background: var(--bg-secondary);
border-radius: 8px;
padding: 1.5rem;
}
.btn-primary {
background: var(--accent-primary);
color: white;
border: none;
padding: 0.5rem 1rem;
border-radius: 4px;
cursor: pointer;
}
.btn-primary:hover {
opacity: 0.9;
}
</style>
```
---
## Build and Deployment
### Production Build
```bash
cd frontend
npm run build
```
This creates a `dist/` folder with:
- `index.html` - Entry HTML
- `assets/` - JS, CSS bundles (hashed filenames)
### Deployment Options
#### Option 1: Served by Backend (Recommended)
The FastAPI backend automatically serves the frontend from `frontend/dist/`:
```python
# backend/app.py
frontend_path = Path(__file__).parent.parent / "frontend" / "dist"
if frontend_path.exists():
app.mount("/assets", StaticFiles(directory=str(frontend_path / "assets")))
@app.get("/{full_path:path}")
async def serve_frontend(full_path: str = ""):
return FileResponse(str(frontend_path / "index.html"))
```
**Access**: http://localhost:8000
#### Option 2: Nginx Reverse Proxy
```nginx
server {
listen 80;
server_name transcriptorio.local;
# Frontend
location / {
root /var/www/transcriptorio/frontend/dist;
try_files $uri $uri/ /index.html;
}
# Backend API
location /api {
proxy_pass http://localhost:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
#### Option 3: Docker
```dockerfile
# Build frontend
FROM node:18-alpine AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package*.json ./
RUN npm ci
COPY frontend/ ./
RUN npm run build
# Final image
FROM python:3.12-slim
COPY --from=frontend-builder /app/frontend/dist /app/frontend/dist
# ... rest of backend setup
```
---
## TypeScript Interfaces
### Key Types (`types/api.ts`)
```typescript
// Job
interface Job {
id: string
file_path: string
file_name: string
status: 'queued' | 'processing' | 'completed' | 'failed' | 'cancelled'
priority: number
progress: number
// ... more fields
}
// Worker
interface Worker {
worker_id: string
worker_type: 'cpu' | 'gpu'
device_id: number | null
status: 'idle' | 'busy' | 'stopped' | 'error'
current_job_id: string | null
jobs_completed: number
jobs_failed: number
}
// Setting
interface Setting {
id: number
key: string
value: string | null
description: string | null
category: string | null
value_type: string | null
}
// ScanRule
interface ScanRule {
id: number
name: string
enabled: boolean
priority: number
conditions: ScanRuleConditions
action: ScanRuleAction
}
```