docs: add comprehensive project documentation

- Replace original Subgen README with TranscriptorIO documentation
- Add docs/API.md with 45+ REST endpoint documentation
- Add docs/ARCHITECTURE.md with backend component details
- Add docs/FRONTEND.md with Vue 3 frontend structure
- Add docs/CONFIGURATION.md with settings system documentation
- Remove outdated backend/README.md
2026-01-16 15:10:41 +01:00
parent 9655686a50
commit 8373d8765f
6 changed files with 3109 additions and 435 deletions

README.md (481 lines changed)

@@ -1,282 +1,265 @@
[![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/donate/?hosted_button_id=SU4QQP6LH5PF6)
<img src="https://raw.githubusercontent.com/McCloudS/subgen/main/icon.png" width="200">
# 🎬 TranscriptorIO
<details>
<summary>Updates:</summary>
**AI-powered subtitle transcription service with REST API and Web UI**
26 Aug 2025: Renamed environment variables to make them slightly easier to understand. Currently maintains backwards compatibility. See https://github.com/McCloudS/subgen/pull/229
[![Python](https://img.shields.io/badge/Python-3.12+-blue.svg)](https://www.python.org/)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-green.svg)](https://fastapi.tiangolo.com/)
[![Vue.js](https://img.shields.io/badge/Vue.js-3.x-brightgreen.svg)](https://vuejs.org/)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
12 Aug 2025: Added distil-large-v3.5
TranscriptorIO is an AI-powered subtitle transcription service based on [Subgen](https://github.com/McCloudS/subgen), featuring a modern FastAPI backend with 45+ REST endpoints, a Vue 3 web interface, and a distributed worker pool architecture.
7 Feb: Fixed (V)RAM clearing, added PLEX_QUEUE_SEASON, other extraneous fixes or refactoring.
---
23 Dec: Added PLEX_QUEUE_NEXT_EPISODE and PLEX_QUEUE_SERIES. Will automatically start generating subtitles for the next episode in your series, or queue the whole series.
## ✨ Features
4 Dec: Added more ENV settings: DETECT_LANGUAGE_OFFSET, PREFERRED_AUDIO_LANGUAGES, SKIP_IF_AUDIO_TRACK_IS, ONLY_SKIP_IF_SUBGEN_SUBTITLE, SKIP_UNKNOWN_LANGUAGE, SKIP_IF_LANGUAGE_IS_NOT_SET_BUT_SUBTITLES_EXIST, SHOULD_WHISPER_DETECT_AUDIO_LANGUAGE
### 🎯 Core Features
- **Whisper Transcription** - Support for faster-whisper and stable-ts
- **Translation** - Two-stage translation: Whisper to English, then Google Translate to target language
- **CPU/GPU Workers** - Scalable worker pool with CUDA support
- **Persistent Queue** - Priority-based queue manager with SQLite/PostgreSQL
- **Library Scanner** - Automatic scanning with configurable rules
- **REST API** - 45+ endpoints with FastAPI
- **Web UI** - Complete Vue 3 dashboard with 6 views
- **Setup Wizard** - Interactive first-run configuration
- **Real-time Monitoring** - File watcher, scheduled scans, and system resources
30 Nov 2024: Significant refactoring and handling by Muisje. Added language code class for more robustness and flexibility and ability to separate audio tracks to make sure you get the one you want. New ENV Variables: SUBTITLE_LANGUAGE_NAMING_TYPE, SKIP_IF_AUDIO_TRACK_IS, PREFERRED_AUDIO_LANGUAGE, SKIP_IF_TO_TRANSCRIBE_SUB_ALREADY_EXIST
### 🔧 Technical Features
- **Multiprocessing**: Workers isolated in separate processes
- **Priority Queuing**: Queue with priorities and deduplication
- **Graceful Degradation**: Works without optional dependencies (Whisper, GPU)
- **Thread-Safe**: Row locking and context managers
- **Auto-retry**: Automatic retry of failed jobs
- **Health Monitoring**: Detailed statistics and health checks
- **Database-backed Settings**: All configuration stored in database, editable via Web UI
There will be some minor hiccups, so please identify them as we work through this major overhaul.
---
22 Nov 2024: Updated to support large-v3-turbo
## 🚀 Quick Start
30 Sept 2024: Removed webui
### 1. Install dependencies
5 Sept 2024: Fixed Emby response to a test message/notification. Clarified Emby/Plex/Jellyfin instructions for paths.
```bash
# Basic dependencies
pip install -r requirements.txt
```
14 Aug 2024: Cleaned up usage of kwargs across the board a bit. Added ability for /asr to encode or not, so you don't need to worry about what files/formats you upload.
3 Aug 2024: Added SUBGEN_KWARGS environment variable which allows you to override the model.transcribe with most options you'd like from whisper, faster-whisper, or stable-ts. This won't be exposed via the webui, it's best to set directly.
21 Apr 2024: Fixed queuing with thanks to https://github.com/xhzhu0628 @ https://github.com/McCloudS/subgen/pull/85. Bazarr intentionally doesn't follow `CONCURRENT_TRANSCRIPTIONS` because it needs a time sensitive response.
31 Mar 2024: Removed `/subsync` endpoint and general refactoring. Open an issue if you were using it!
24 Mar 2024: ~~Added a 'webui' to configure environment variables. You can use this instead of manually editing the script or using Environment Variables in your OS or Docker (if you want). The config will prioritize OS Env Variables, then the .env file, then the defaults. You can access it at `http://subgen:9000/`~~
23 Mar 2024: Added `CUSTOM_REGROUP` to try to 'clean up' subtitles a bit.
22 Mar 2024: Added LRC capability. See: `'LRC_FOR_AUDIO_FILES' | True | Will generate LRC (instead of SRT) files for filetypes: '.mp3', '.flac', '.wav', '.alac', '.ape', '.ogg', '.wma', '.m4a', '.m4b', '.aac', '.aiff' |`
21 Mar 2024: Added a 'wizard' into the launcher that will help standalone users get common Bazarr variables configured. See below in Launcher section. Removed 'Transformers' as an option. While I usually don't like to remove features, I don't think anyone is using this and the results are wildly unpredictable and often cause out of memory errors. Added two new environment variables called `USE_MODEL_PROMPT` and `CUSTOM_MODEL_PROMPT`. If `USE_MODEL_PROMPT` is `True` it will use `CUSTOM_MODEL_PROMPT` if set, otherwise it will default to using the pre-configured language pairings, such as: `"en": "Hello, welcome to my lecture.",
"zh": "你好,欢迎来到我的讲座。"` These pre-configured translations are geared towards fixing some audio that may not have punctuation. We can prompt it to try to force the use of punctuation during transcription.
19 Mar 2024: Added a `MONITOR` environment variable. Will 'watch' or 'monitor' your `TRANSCRIBE_FOLDERS` for changes and run on them. Useful if you just want to paste files into a folder and get subtitles.
6 Mar 2024: Added a `/subsync` endpoint that can attempt to align/synchronize subtitles to a file. Takes audio_file, subtitle_file, language (2 letter code), and outputs an srt.
5 Mar 2024: Cleaned up logging. Added timestamps option (if Debug = True, timestamps will print in logs).
4 Mar 2024: Updated Dockerfile CUDA to 12.2.2 (From CTranslate2). Added endpoint `/status` to return Subgen version. Can also use distil models now! See variables below!
29 Feb 2024: Changed default port to align with whisper-asr and deconflict other consumers of the previous port.
11 Feb 2024: Added a 'launcher.py' file for Docker to prevent huge image downloads. Now set UPDATE to True if you want to pull the latest version, otherwise it will default to what was in the image on build. Docker builds will still be auto-built on any commit. If you don't want to use the auto-update function, no action is needed on your part; continue to update docker images as before. Fixed bug where detect-language could return an empty result. Reduced useless debug output that was spamming logs and defaulted DEBUG to True. Added APPEND, which will add f"Transcribed by whisperAI with faster-whisper ({whisper_model}) on {datetime.now()}" at the end of a subtitle.
10 Feb 2024: Added some features from JaiZed's branch such as skipping if SDH subtitles are detected, functions updated to also be able to transcribe audio files, allow individual files to be manually transcribed, and a better implementation of forceLanguage. Added `/batch` endpoint (Thanks JaiZed). Allows you to navigate in a browser to http://subgen_ip:9000/docs and call the batch endpoint which can take a file or a folder to manually transcribe files. Added CLEAR_VRAM_ON_COMPLETE, HF_TRANSFORMERS, HF_BATCH_SIZE. Hugging Face Transformers boast '9x increase', but my limited testing shows it's comparable to faster-whisper or slightly slower. I also have an older 8gb GPU. The simplest way to persist HF Transformer models is to set "HF_HUB_CACHE" to "/subgen/models" for Docker (assuming you have the matching volume).
8 Feb 2024: Added FORCE_DETECTED_LANGUAGE_TO to force a wrongly detected language. Fixed asr to actually use the language passed to it.
5 Feb 2024: General housekeeping, minor tweaks on the TRANSCRIBE_FOLDERS function.
28 Jan 2024: Fixed issue with ffmpeg python module not importing correctly. Removed separate GPU/CPU containers. Also removed the script from installing packages, which should help with odd updates I can't control (from other packages/modules). The image is a couple gigabytes larger, but allows easier maintenance.
19 Dec 2023: Added the ability for Plex and Jellyfin to automatically update metadata so the subtitles shows up properly on playback. (See https://github.com/McCloudS/subgen/pull/33 from Rikiar73574)
31 Oct 2023: Added Bazarr support via the Whisper provider.
25 Oct 2023: Added Emby (IE http://192.168.1.111:9000/emby) support and TRANSCRIBE_FOLDERS, which will recurse through the provided folders and generate subtitles. It's geared towards attempting to transcribe existing media without using a webhook.
23 Oct 2023: There are now two docker images, ones for CPU (it's smaller): mccloud/subgen:latest, mccloud/subgen:cpu, the other is for cuda/GPU: mccloud/subgen:cuda. I also added Jellyfin support and considerable cleanup in the script. I also renamed the webhooks, so they will require new configuration/updates on your end. Instead of /webhook they are now /plex, /tautulli, and /jellyfin.
22 Oct 2023: The script should have backwards compatibility with previous environment settings, but just to be sure, look at the new options below. If you don't want to manually edit your environment variables, just edit the script manually. While I have added GPU support, I haven't tested it yet.
19 Oct 2023: And we're back! Uses faster-whisper and stable-ts. Shouldn't break anything from previous settings, but adds a couple new options that aren't documented at this point in time. As of now, this is not a docker image on dockerhub. The potential intent is to move this eventually to a pure python script, primarily to simplify my efforts. Quick and dirty to meet dependencies: pip or `pip3 install flask requests stable-ts faster-whisper`
This potentially has the ability to use CUDA/Nvidia GPU's, but I don't have one set up yet. Tesla T4 is in the mail!
2 Feb 2023: Added Tautulli webhooks back in. Didn't realize Plex webhooks were PlexPass only. See below for instructions to add it back in.
31 Jan 2023: Rewrote the script substantially to remove Tautulli and fix some variable handling. For some reason my implementation requires the container to be in host mode. My Plex was giving "401 Unauthorized" when attempting to query from docker subnets during API calls. (**Fixed now, it can be in bridge**)
</details>
# What is this?
This will transcribe your personal media on a Plex, Emby, or Jellyfin server to create subtitles (.srt) from audio/video files with the following languages: https://github.com/McCloudS/subgen#audio-languages-supported-via-openai and transcribe or translate them into English. It can also be used as a Whisper provider in Bazarr (see below instructions). It technically has support to transcribe from a foreign language to itself (IE Japanese > Japanese, see [TRANSCRIBE_OR_TRANSLATE](https://github.com/McCloudS/subgen#variables)). It is currently reliant on webhooks from Jellyfin, Emby, Plex, or Tautulli. This uses stable-ts and faster-whisper which can use both Nvidia GPUs and CPUs.
# Why?
Honestly, I built this for me, but saw the utility in other people maybe using it. This works well for my use case. Since having children, I'm either deaf or wanting to have everything quiet. We watch EVERYTHING with subtitles now, and I feel like I can't even understand the show without them. I use Bazarr to auto-download, and gap fill with Plex's built-in capability. This is for everything else. Some shows just won't have subtitles available for some reason or another, or in some cases on my H265 media, they are wildly out of sync.
# What can it do?
* Create .srt subtitles when a media file is added or played which triggers off of Jellyfin, Plex, or Tautulli webhooks. It can also be called via the Whisper provider inside Bazarr.
# How do I set it up?
## Install/Setup
### Standalone/Without Docker
Install python3 (Whisper supports Python 3.9-3.11), ffmpeg, and download launcher.py from this repository. Then run it: `python3 launcher.py -u -i -s`. You need to have matching paths relative to your Plex server/folders, or use USE_PATH_MAPPING. Paths are not needed if you are only using Bazarr. For GPU use, you will need the appropriate NVIDIA drivers and a minimum of CUDA Toolkit 12.3 (12.3.2 is known working): https://developer.nvidia.com/cuda-toolkit-archive
Note: If you have previously had Subgen running in standalone, you may need to run `pip install --upgrade --force-reinstall faster-whisper git+https://github.com/jianfch/stable-ts.git` to force the install of the newer stable-ts package.
#### Using Launcher
launcher.py can launch Subgen for you, automate the setup, and takes the following options:
![image](https://github.com/McCloudS/subgen/assets/64094529/081f95b2-7a09-498f-a39e-5ea66e0bc7e1)
Using `-s` for Bazarr setup:
![image](https://github.com/McCloudS/subgen/assets/64094529/ade1b886-3b99-4f80-95ac-bb28608259bb)
### Docker
The dockerfile is in the repo along with an example docker-compose file, and is also posted on dockerhub (mccloud/subgen).
If using Subgen without Bazarr, you MUST mount your media volumes in subgen the same way Plex (or your media server) sees them. For example, if Plex uses "/Share/media/TV:/tv" you must have that identical volume in subgen.
`"${APPDATA}/subgen/models:/subgen/models"` is just for storage of the language models. This isn't necessary, but you will have to redownload the models on any new image pulls if you don't use it.
`"${APPDATA}/subgen/subgen.py:/subgen/subgen.py"` If you want to control the version of subgen.py by yourself. Launcher.py can still be used to download a newer version.
If you want to use a GPU, you need to map it accordingly.
#### Unraid
While Unraid doesn't have an app or template for quick install, with minor manual work, you can install it. See [https://github.com/McCloudS/subgen/discussions/137](https://github.com/McCloudS/subgen/discussions/137) for pictures and steps.
## Bazarr
You only need to configure the Whisper Provider as shown below: <br>
![bazarr_configuration](https://wiki.bazarr.media/Additional-Configuration/images/whisper_config.png) <br>
The Docker Endpoint is the IP address and port of your subgen container (IE http://192.168.1.111:9000). See https://wiki.bazarr.media/Additional-Configuration/Whisper-Provider/ for more info. **127.0.0.1 WILL NOT WORK IF YOU ARE RUNNING BAZARR IN A DOCKER CONTAINER!** I recommend not using the Bazarr provider together with other webhooks in Subgen, or you will likely generate duplicate subtitles. If you are using Bazarr, path mapping isn't necessary, as Bazarr sends the file over HTTP.
**The defaults of Subgen will allow it to run in Bazarr with zero configuration. However, you will probably want to change, at a minimum, `TRANSCRIBE_DEVICE` and `WHISPER_MODEL`.**
## Plex
Create a webhook in Plex that will call back to your subgen address, IE: http://192.168.1.111:9000/plex (see https://support.plex.tv/articles/115002267687-webhooks/). You will also need to generate the token to use it. Remember, Plex and Subgen need to be able to see the exact same files at the exact same paths, otherwise you need `USE_PATH_MAPPING`.
## Emby
All you need to do is create a webhook in Emby pointing to your Subgen instance, IE: `http://192.168.1.154:9000/emby`, set `Request content type` to `multipart/form-data`, and configure your desired events (usually `New Media Added`, `Start`, and `Unpause`). See https://github.com/McCloudS/subgen/discussions/115#discussioncomment-10569277 for screenshot examples.
Emby was really nice and provides good information in their responses, so we don't need to add an API token or server url to query for more information.
Remember, Emby and Subgen need to be able to see the exact same files at the exact same paths, otherwise you need `USE_PATH_MAPPING`.
## Tautulli
Create the webhooks in Tautulli with the following settings:
Webhook URL: http://yourdockerip:9000/tautulli
Webhook Method: Post
Triggers: Whatever you want, but you'll likely want "Playback Start" and "Recently Added"
Data: Under Playback Start, JSON Header will be:
```json
{ "source":"Tautulli" }
# Transcription dependencies (optional - required for actual transcription)
pip install stable-ts faster-whisper
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install av>=10.0.0
```
Data:
```json
{
"event":"played",
"file":"{file}",
"filename":"{filename}",
"mediatype":"{media_type}"
}
```
### 2. First run (Setup Wizard)
```bash
# The setup wizard runs automatically on first start
python backend/cli.py server
# Or run setup wizard manually
python backend/cli.py setup
```
Similarly, under Recently Added, Header is:
```json
{ "source":"Tautulli" }
The setup wizard will guide you through:
- **Standalone mode**: Configure library paths, scan rules, and workers
- **Bazarr mode**: Configure as Bazarr subtitle provider (in development)
### 3. Start the server
```bash
# Development (with auto-reload)
python backend/cli.py server --reload
# Production
python backend/cli.py server --host 0.0.0.0 --port 8000 --workers 4
```
Data:
```json
{
"event":"added",
"file":"{file}",
"filename":"{filename}",
"mediatype":"{media_type}"
}
```
### 4. Access the application
| URL | Description |
|-----|-------------|
| http://localhost:8000 | Web UI (Dashboard) |
| http://localhost:8000/docs | Swagger API Documentation |
| http://localhost:8000/redoc | ReDoc API Documentation |
| http://localhost:8000/health | Health Check Endpoint |
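For quick scripting against these endpoints, a minimal health check in Python (the response schema is server-defined, so this sketch just prints whatever the endpoint returns):

```python
# Minimal health check against the documented /health endpoint.
# The response schema is not specified here, so we print it raw.
import requests

resp = requests.get("http://localhost:8000/health", timeout=5)
print(resp.status_code)  # 200 when the server is healthy
print(resp.json())       # server-defined health payload
```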
---
## 📋 CLI Commands
```bash
# Server
python backend/cli.py server [options]
--host HOST Host (default: 0.0.0.0)
--port PORT Port (default: 8000)
--reload Auto-reload for development
--workers N Number of uvicorn workers (default: 1)
--log-level LEVEL Log level (default: info)
# Setup wizard
python backend/cli.py setup # Run setup wizard
# Database
python backend/cli.py db init # Initialize database
python backend/cli.py db reset # Reset (WARNING: deletes all data!)
# Standalone worker
python backend/cli.py worker --type cpu
python backend/cli.py worker --type gpu --device-id 0
# Manual scan
python backend/cli.py scan /path/to/media [--no-recursive]
```
## Jellyfin
First, you need to install the Jellyfin webhooks plugin. Then you need to click "Add Generic Destination", name it anything you want, webhook url is your subgen info (IE http://192.168.1.154:9000/jellyfin). Next, check Item Added, Playback Start, and Send All Properties. Last, "Add Request Header" and add the Key: `Content-Type` Value: `application/json`<br><br>Click Save and you should be all set!
---
Remember, Jellyfin and Subgen need to be able to see the exact same files at the exact same paths, otherwise you need `USE_PATH_MAPPING`.
## 🏗️ Architecture
## Variables
```
┌─────────────────────────────────────────────────────────┐
│ FastAPI Server │
│ ┌─────────────────────────────────────────────────┐ │
│ │ REST API (45+ endpoints) │ │
│ │ /api/workers | /api/jobs | /api/settings │ │
│ │ /api/scanner | /api/system | /api/setup │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────────────────┘
┌──────────────┼──────────────┬──────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐
│ Worker │ │ Queue │ │ Scanner │ │ Database │
│ Pool │◄──┤ Manager │◄──┤ Engine │ │ SQLite/ │
│ CPU/GPU│ │ Priority │ │ Rules + │ │ Postgres │
└────────┘ │ Queue │ │ Watcher │ └──────────┘
└──────────┘ └─────────┘
```
You can define the port via environment variables, but the endpoints are static.
### Data Flow
The following environment variables are available in Docker. They will default to the values listed below.
| Variable | Default Value | Description |
|---------------------------|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| TRANSCRIBE_DEVICE | 'cpu' | Can transcribe via GPU (CUDA only) or CPU. Takes 'cpu', 'gpu', or 'cuda'. |
| WHISPER_MODEL | 'medium' | Can be: 'tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en', 'medium', 'medium.en', 'large-v1', 'large-v2', 'large-v3', 'large', 'distil-large-v2', 'distil-large-v3', 'distil-large-v3.5', 'distil-medium.en', 'distil-small.en', 'large-v3-turbo' |
| CONCURRENT_TRANSCRIPTIONS | 2 | Number of files it will transcribe in parallel |
| WHISPER_THREADS | 4 | number of threads to use during computation |
| MODEL_PATH | './models' | This is where the WHISPER_MODEL will be stored. This defaults to placing it where you execute the script in the folder 'models' |
| PROCESS_ADDED_MEDIA | True | will gen subtitles for all media added regardless of existing external/embedded subtitles (based off of SKIP_IF_INTERNAL_SUBTITLES_LANGUAGE) |
| PROCESS_MEDIA_ON_PLAY | True | will gen subtitles for all played media regardless of existing external/embedded subtitles (based off of SKIP_IF_INTERNAL_SUBTITLES_LANGUAGE) |
| SUBTITLE_LANGUAGE_NAME | 'aa' | allows you to pick what it will name the subtitle. Instead of using EN, I'm using AA, so it doesn't mix with existing external EN subs, and AA will populate higher on the list in Plex. This will override the Whisper detected language for a file name. |
| SKIP_IF_INTERNAL_SUBTITLES_LANGUAGE | 'eng' | Will not generate a subtitle if the file has an internal sub matching the 3 letter code of this variable (See https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) |
| WORD_LEVEL_HIGHLIGHT | False | Highlights each word as it's spoken in the subtitle. See example video @ https://github.com/jianfch/stable-ts |
| PLEX_SERVER | 'http://plex:32400' | This needs to be set to your local plex server address/port |
| PLEX_TOKEN | 'token here' | This needs to be set to your plex token found by https://support.plex.tv/articles/204059436-finding-an-authentication-token-x-plex-token/ |
| JELLYFIN_SERVER | 'http://jellyfin:8096' | Set to your Jellyfin server address/port |
| JELLYFIN_TOKEN | 'token here' | Generate a token inside the Jellyfin interface |
| WEBHOOK_PORT | 9000 | Change this if you need a different port for your webhook |
| USE_PATH_MAPPING | False | Similar to Sonarr and Radarr path mapping, this will attempt to replace paths on file systems that don't have identical paths. Currently only supports one path replacement. Examples below. |
| PATH_MAPPING_FROM | '/tv' | This is the path of my media relative to my Plex server |
| PATH_MAPPING_TO | '/Volumes/TV' | This is the path of that same folder relative to my Mac Mini that will run the script |
| TRANSCRIBE_FOLDERS | '' | Takes a pipe '\|' separated list (For example: /tv\|/movies\|/familyvideos) and iterates through and adds those files to be queued for subtitle generation if they don't have internal subtitles |
| TRANSCRIBE_OR_TRANSLATE | 'transcribe' | Takes either 'transcribe' or 'translate'. Transcribe will transcribe the audio in the same language as the input. Translate will transcribe and translate into English. |
| COMPUTE_TYPE | 'auto' | Set compute-type using the following information: https://github.com/OpenNMT/CTranslate2/blob/master/docs/quantization.md |
| DEBUG | True | Provides some debug data that can be helpful to troubleshoot path mapping and other issues. Fun fact, if this is set to true, any modifications to the script will auto-reload it (if it isn't actively transcoding). Useful to make small tweaks without re-downloading the whole file. |
| FORCE_DETECTED_LANGUAGE_TO | '' | This is to force the model to a language instead of the detected one, takes a 2 letter language code. For example, your audio is French but keeps detecting as English, you would set it to 'fr' |
| CLEAR_VRAM_ON_COMPLETE | True | This will delete the model and do garbage collection when queue is empty. Good if you need to use the VRAM for something else. |
| UPDATE | False | Will pull latest subgen.py from the repository if True. False will use the original subgen.py built into the Docker image. Standalone users can use this with launcher.py to get updates. |
| APPEND | False | Will add the following at the end of a subtitle: "Transcribed by whisperAI with faster-whisper ({whisper_model}) on {datetime.now()}" |
| MONITOR | False | Will monitor `TRANSCRIBE_FOLDERS` for real-time changes to see if we need to generate subtitles |
| USE_MODEL_PROMPT | False | When set to `True`, will use the default prompt stored in greetings_translations "Hello, welcome to my lecture." to try and force the use of punctuation in transcriptions that don't. Automatic `CUSTOM_MODEL_PROMPT` will only work with ASR, but can still be set manually like so: `USE_MODEL_PROMPT=True and CUSTOM_MODEL_PROMPT=Hello, welcome to my lecture.` |
| CUSTOM_MODEL_PROMPT | '' | If `USE_MODEL_PROMPT` is `True`, you can override the default prompt (See: https://medium.com/axinc-ai/prompt-engineering-in-whisper-6bb18003562d for great examples). |
| LRC_FOR_AUDIO_FILES | True | Will generate LRC (instead of SRT) files for filetypes: '.mp3', '.flac', '.wav', '.alac', '.ape', '.ogg', '.wma', '.m4a', '.m4b', '.aac', '.aiff' |
| CUSTOM_REGROUP | 'cm_sl=84_sl=42++++++1' | Attempts to regroup some of the segments to make a cleaner looking subtitle. See https://github.com/McCloudS/subgen/issues/68 for discussion. Set to blank if you want to use Stable-TS default regroups algorithm of `cm_sp=,* /_sg=.5_mg=.3+3_sp=.* /。/?/` |
| DETECT_LANGUAGE_LENGTH | 30 | Detect language on the first x seconds of the audio. |
| SKIP_IF_EXTERNAL_SUBTITLES_EXIST | False | Skip subtitle generation if an external subtitle with the same language code as NAMESUBLANG is present. Used for the case of not regenerating subtitles if I already have `Movie (2002).NAMESUBLANG.srt` from a non-subgen source. |
| SUBGEN_KWARGS | '{}' | Takes a kwargs python dictionary of options you would like to add/override. For advanced users. An example would be `{'vad': True, 'prompt_reset_on_temperature': 0.35}` |
| SUBTITLE_LANGUAGE_NAMING_TYPE | 'ISO_639_2_B' | The type of naming format desired, such as 'ISO_639_1', 'ISO_639_2_T', 'ISO_639_2_B', 'NAME', or 'NATIVE', for example: ("es", "spa", "spa", "Spanish", "Español") |
| SKIP_SUBTITLE_LANGUAGES | '' | Takes a pipe separated `\|` list of 3 letter language codes to skip if the file has audio in that language. This could be used to skip generating subtitles for a language you don't want, like, I speak English, don't generate English subtitles (for example: 'eng\|deu')|
| PREFERRED_AUDIO_LANGUAGE | 'eng' | If there are multiple audio tracks in a file, it will prefer this setting |
| SKIP_IF_TARGET_SUBTITLES_EXIST | True | Skips generation of subtitle if a file matches our desired language already. |
| DETECT_LANGUAGE_OFFSET | 0 | Allows you to shift when to run detect_language, geared towards avoiding introductions or songs. |
| PREFERRED_AUDIO_LANGUAGES | 'eng' | Pipe separated list |
| SKIP_IF_AUDIO_TRACK_IS | '' | Takes a pipe separated list of ISO 639-2 languages. Skips generation of subtitle if the file has the audio file listed. |
| SKIP_ONLY_SUBGEN_SUBTITLES | False | Skips generation of subtitles if the file has "subgen" somewhere in the name |
| SKIP_UNKNOWN_LANGUAGE | False | Skips generation if the file has an unknown language |
| SKIP_IF_NO_LANGUAGE_BUT_SUBTITLES_EXIST | False | Skips generation if file doesn't have an audio stream marked with a language |
| SHOULD_WHISPER_DETECT_AUDIO_LANGUAGE | False | Should Whisper try to detect the language if there is no audio language specified via forced language |
| PLEX_QUEUE_NEXT_EPISODE | False | Will queue the next Plex series episode for subtitle generation if subgen is triggered. |
| PLEX_QUEUE_SEASON | False | Will queue the rest of the Plex season for subtitle generation if subgen is triggered. |
| PLEX_QUEUE_SERIES | False | Will queue the whole Plex series for subtitle generation if subgen is triggered. |
| SHOW_IN_SUBNAME_SUBGEN | True | Adds subgen to the subtitle file name. |
| SHOW_IN_SUBNAME_MODEL | True | Adds Whisper model name to the subtitle file name. |
1. **LibraryScanner** detects files (manual/scheduled/watcher)
2. **FileAnalyzer** analyzes with ffprobe (audio tracks, subtitles)
3. **Rules Engine** evaluates against configurable ScanRules
4. **QueueManager** adds job to persistent queue (with deduplication)
5. **Worker** processes with WhisperTranscriber
6. **Output**: Generates `.eng.srt` (transcription) or `.{lang}.srt` (translation)
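As a rough illustration, the same flow can be driven programmatically with the component APIs documented in [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md). This is a sketch, not a supported entry point:

```python
# Sketch: trigger steps 1-4 via the scanner, or enqueue one file by
# hand, using the component APIs shown in docs/ARCHITECTURE.md.
from backend.scanning.library_scanner import library_scanner
from backend.core.queue_manager import queue_manager
from backend.core.models import QualityPreset

# Steps 1-4: scan, analyze with ffprobe, evaluate rules, enqueue matches
library_scanner.scan_paths(["/media/anime"], recursive=True)

# Or enqueue a single file directly (returns None on duplicate file_path)
job = queue_manager.add_job(
    file_path="/media/anime/episode01.mkv",
    file_name="episode01.mkv",
    source_lang="jpn",
    target_lang="eng",
    quality_preset=QualityPreset.FAST,
    priority=5,
)
```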
### Images:
`mccloud/subgen:latest` is GPU or CPU <br>
`mccloud/subgen:cpu` is for CPU only (slightly smaller image)
<br><br>
---
# What are the limitations/problems?
## 🖥️ Web UI
* I made it and know nothing about formal deployment for python coding.
* It's using trained AI models to transcribe, so it WILL mess up
The Web UI includes 6 complete views:
# What's next?
| View | Description |
|------|-------------|
| **Dashboard** | System overview, resource monitoring (CPU/RAM/GPU), recent jobs |
| **Queue** | Job management with filters, pagination, retry/cancel actions |
| **Scanner** | Scanner control, scheduler configuration, manual scan trigger |
| **Rules** | Scan rules CRUD with create/edit modal |
| **Workers** | Worker pool management, add/remove workers dynamically |
| **Settings** | Database-backed settings organized by category |
Fix documentation and make it prettier!
---
# Audio Languages Supported (via OpenAI)
## 🎛️ Configuration
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
### Database-backed Settings
# Known Issues
All configuration is stored in the database and manageable via:
- **Setup Wizard** (first run)
- **Settings page** in Web UI
- **Settings API** (`/api/settings`)
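For example, a setting can be changed at runtime through the Settings API (endpoints as documented in [docs/CONFIGURATION.md](docs/CONFIGURATION.md)); a minimal sketch:

```python
# Sketch: read and update settings via the documented REST API.
import requests

base = "http://localhost:8000/api/settings"

# Update one setting (PUT /api/settings/{key} with {"value": ...})
requests.put(f"{base}/worker_cpu_count", json={"value": "2"}, timeout=5)

# Read back the workers category
print(requests.get(base, params={"category": "workers"}, timeout=5).json())
```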
At this time, if you have high CPU usage when not actively transcribing on the CPU only docker, try the GPU one.
### Settings Categories
# Additional reading:
| Category | Settings |
|----------|----------|
| **General** | Operation mode, library paths, log level |
| **Workers** | CPU/GPU worker counts, auto-start, health check interval |
| **Transcription** | Whisper model, compute type, skip existing files |
| **Scanner** | Enable/disable, schedule interval, file watcher |
| **Bazarr** | Provider mode (in development) |
* https://github.com/openai/whisper (Original OpenAI project)
* https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes (2 letter subtitle codes)
### Environment Variables
# Credits:
* Whisper.cpp (https://github.com/ggerganov/whisper.cpp) for original implementation
* Google
* ffmpeg
* https://github.com/jianfch/stable-ts
* https://github.com/guillaumekln/faster-whisper
* Whisper ASR Webservice (https://github.com/ahmetoner/whisper-asr-webservice) for how to implement Bazarr webhooks.
Only `DATABASE_URL` is required in `.env`:
```bash
# SQLite (default)
DATABASE_URL=sqlite:///./transcriptarr.db
# PostgreSQL (production)
DATABASE_URL=postgresql://user:pass@localhost/transcriptarr
```
---
## 📚 Documentation
| Document | Description |
|----------|-------------|
| [docs/API.md](docs/API.md) | Complete REST API documentation (45+ endpoints) |
| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Backend architecture and components |
| [docs/FRONTEND.md](docs/FRONTEND.md) | Frontend structure and components |
| [docs/CONFIGURATION.md](docs/CONFIGURATION.md) | Configuration system and settings |
---
## 🐳 Docker
```bash
# CPU only
docker build -t transcriptorio:cpu -f Dockerfile.cpu .
# GPU (NVIDIA CUDA)
docker build -t transcriptorio:gpu -f Dockerfile .
# Run
docker run -d \
-p 8000:8000 \
-v /path/to/media:/media \
-v /path/to/data:/app/data \
--gpus all \
transcriptorio:gpu
```
---
## 📊 Project Status
| Component | Status | Progress |
|-----------|--------|----------|
| Core Backend | ✅ Complete | 100% |
| REST API (45+ endpoints) | ✅ Complete | 100% |
| Worker System | ✅ Complete | 100% |
| Library Scanner | ✅ Complete | 100% |
| Web UI (6 views) | ✅ Complete | 100% |
| Settings System | ✅ Complete | 100% |
| Setup Wizard | ✅ Complete | 100% |
| Bazarr Provider | ⏳ In Development | 30% |
| Testing Suite | ⏳ Pending | 0% |
| Docker | ⏳ Pending | 0% |
---
## 🤝 Contributing
Contributions are welcome!
---
## 📝 Credits
Based on [Subgen](https://github.com/McCloudS/subgen) by McCloudS.
Architecture redesigned with:
- FastAPI for REST APIs
- SQLAlchemy for persistence
- Multiprocessing for workers
- Whisper (stable-ts / faster-whisper) for transcription
- Vue 3 + Pinia for frontend
---
## 📄 License
MIT License - See [LICENSE](LICENSE) for details.

backend/README.md (deleted file)

@@ -1,185 +0,0 @@
# TranscriptorIO Backend
This is the redesigned backend for TranscriptorIO, a complete fork of SubGen with modern asynchronous architecture.
## 🎯 Goal
Replace SubGen's synchronous non-persistent system with a modern Tdarr-inspired architecture:
- ✅ Persistent queue (SQLite/PostgreSQL/MariaDB)
- ✅ Asynchronous processing
- ✅ Job prioritization
- ✅ Complete state visibility
- ✅ No Bazarr timeouts
## 📁 Structure
```
backend/
├── core/
│ ├── database.py # Multi-backend database management
│ ├── models.py # SQLAlchemy models (Job, etc.)
│ ├── queue_manager.py # Asynchronous persistent queue
│ └── __init__.py
├── api/ # (coming soon) FastAPI endpoints
├── config.py # Centralized configuration with Pydantic
└── README.md # This file
```
## 🚀 Setup
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure .env
Copy `.env.example` to `.env` and adjust as needed:
```bash
cp .env.example .env
```
#### Database Options
**SQLite (default)**:
```env
DATABASE_URL=sqlite:///./transcriptarr.db
```
**PostgreSQL**:
```bash
pip install psycopg2-binary
```
```env
DATABASE_URL=postgresql://user:password@localhost:5432/transcriptarr
```
**MariaDB/MySQL**:
```bash
pip install pymysql
```
```env
DATABASE_URL=mariadb+pymysql://user:password@localhost:3306/transcriptarr
```
### 3. Choose operation mode
**Standalone Mode** (automatically scans your library):
```env
TRANSCRIPTARR_MODE=standalone
LIBRARY_PATHS=/media/anime|/media/movies
AUTO_SCAN_ENABLED=True
SCAN_INTERVAL_MINUTES=30
```
**Provider Mode** (receives jobs from Bazarr):
```env
TRANSCRIPTARR_MODE=provider
BAZARR_URL=http://bazarr:6767
BAZARR_API_KEY=your_api_key
```
**Hybrid Mode** (both simultaneously):
```env
TRANSCRIPTARR_MODE=standalone,provider
```
## 🧪 Testing
Run the test script to verify everything works:
```bash
python test_backend.py
```
This will verify:
- ✓ Configuration loading
- ✓ Database connection
- ✓ Table creation
- ✓ Queue operations (add, get, deduplicate)
## 📊 Implemented Components
### config.py
- Centralized configuration with Pydantic
- Automatic environment variable validation
- Multi-backend database support
- Operation mode configuration
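A minimal sketch of that pattern, assuming Pydantic v1-style `BaseSettings` (which the `ConfigError` note in Troubleshooting below suggests) and the `.env` keys shown earlier in this README; the real `config.py` declares more fields and validators:

```python
# Sketch of the centralized Pydantic config described above.
# Field names mirror the .env keys in this README; illustrative only.
from pydantic import BaseSettings  # Pydantic v1 style

class Settings(BaseSettings):
    database_url: str = "sqlite:///./transcriptarr.db"
    transcriptarr_mode: str = "standalone"
    library_paths: str = ""  # pipe-separated, e.g. "/media/anime|/media/movies"
    auto_scan_enabled: bool = True
    scan_interval_minutes: int = 30

    class Config:
        env_file = ".env"  # env vars take precedence over .env values

settings = Settings()
```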
### database.py
- Connection management with SQLAlchemy
- Support for SQLite, PostgreSQL, MariaDB
- Backend-specific optimizations
- SQLite: WAL mode, optimized cache
- PostgreSQL: connection pooling, pre-ping
- MariaDB: utf8mb4 charset, pooling
- Health checks and statistics
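The SQLite tuning mentioned above follows a standard SQLAlchemy pattern; an illustrative sketch (the real `database.py` may set additional pragmas):

```python
# Sketch: apply WAL mode and a larger cache on every new SQLite
# connection, via SQLAlchemy's connect event hook.
from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///./transcriptarr.db")

@event.listens_for(engine, "connect")
def set_sqlite_pragmas(dbapi_conn, _connection_record):
    cur = dbapi_conn.cursor()
    cur.execute("PRAGMA journal_mode=WAL")   # better read/write concurrency
    cur.execute("PRAGMA cache_size=-64000")  # ~64 MB page cache (assumed value)
    cur.close()
```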
### models.py
- Complete `Job` model with:
- States: queued, processing, completed, failed, cancelled
- Stages: pending, detecting_language, transcribing, translating, etc.
- Quality presets: fast, balanced, best
- Progress tracking (0-100%)
- Complete timestamps
- Retry logic
- Worker assignment
- Optimized indexes for common queries
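A condensed SQLAlchemy sketch of that model (column names match the schema in docs/ARCHITECTURE.md; defaults here are illustrative):

```python
# Condensed sketch of the Job model described above; the full model
# has more columns and indexes (see docs/ARCHITECTURE.md).
from sqlalchemy import Column, DateTime, Float, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Job(Base):
    __tablename__ = "jobs"

    id = Column(String, primary_key=True)
    file_path = Column(String, unique=True, index=True)  # dedup key
    status = Column(String, index=True)       # queued/processing/completed/...
    priority = Column(Integer, default=0, index=True)
    progress = Column(Float, default=0.0)      # 0-100%
    retry_count = Column(Integer, default=0)
    max_retries = Column(Integer, default=3)   # assumed default
    worker_id = Column(String, nullable=True)
    created_at = Column(DateTime)
```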
### queue_manager.py
- Thread-safe persistent queue
- Job prioritization
- Duplicate detection
- Automatic retry for failed jobs
- Real-time statistics
- Automatic cleanup of old jobs
## 🔄 Comparison with SubGen
| Feature | SubGen | TranscriptorIO |
|---------|--------|----------------|
| Queue | In-memory (lost on restart) | **Persistent in DB** |
| Processing | Synchronous (blocks threads) | **Asynchronous** |
| Prioritization | No | **Yes (configurable)** |
| Visibility | No progress/ETA | **Progress + real-time ETA** |
| Deduplication | Basic (memory only) | **Persistent + intelligent** |
| Retries | No | **Automatic with limit** |
| Database | No | **SQLite/PostgreSQL/MariaDB** |
| Bazarr Timeouts | Yes (>5min = 24h throttle) | **No (async)** |
## 📝 Next Steps
1. **Worker Pool** - Asynchronous worker system
2. **REST API** - FastAPI endpoints for management
3. **WebSocket** - Real-time updates
4. **Transcriber** - Whisper wrapper with progress callbacks
5. **Bazarr Provider** - Improved async provider
6. **Standalone Scanner** - Automatic library scanning
## 🐛 Troubleshooting
### Error: "No module named 'backend'"
Make sure to run scripts from the project root:
```bash
cd /home/dasemu/Hacking/Transcriptarr
python test_backend.py
```
### Error: Database locked (SQLite)
SQLite is configured with WAL mode for better concurrency. If you still have issues, consider using PostgreSQL for production.
### Error: pydantic.errors.ConfigError
Verify that all required variables are in your `.env`:
```bash
cp .env.example .env
# Edit .env with your values
```
## 📚 Documentation
See `CLAUDE.md` for complete architecture and project roadmap.

docs/API.md (new file, 1195 lines)

File diff suppressed because it is too large.

docs/ARCHITECTURE.md (new file, 613 lines)

@@ -0,0 +1,613 @@
# TranscriptorIO Backend Architecture
Technical documentation of the backend architecture, components, and data flow.
## Table of Contents
- [Overview](#overview)
- [Directory Structure](#directory-structure)
- [Core Components](#core-components)
- [Data Flow](#data-flow)
- [Database Schema](#database-schema)
- [Transcription vs Translation](#transcription-vs-translation)
- [Worker Architecture](#worker-architecture)
- [Queue System](#queue-system)
- [Scanner System](#scanner-system)
- [Settings System](#settings-system)
- [Graceful Degradation](#graceful-degradation)
- [Thread Safety](#thread-safety)
- [Important Patterns](#important-patterns)
---
## Overview
TranscriptorIO is built with a modular architecture consisting of:
- **FastAPI Server**: REST API with 45+ endpoints
- **Worker Pool**: Multiprocessing-based transcription workers (CPU/GPU)
- **Queue Manager**: Persistent job queue with priority support
- **Library Scanner**: Rule-based file scanning with scheduler and watcher
- **Settings Service**: Database-backed configuration system
```
┌─────────────────────────────────────────────────────────┐
│ FastAPI Server │
│ ┌─────────────────────────────────────────────────┐ │
│ │ REST API (45+ endpoints) │ │
│ │ /api/workers | /api/jobs | /api/settings │ │
│ │ /api/scanner | /api/system | /api/setup │ │
│ └─────────────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────────────────┘
┌──────────────┼──────────────┬──────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐
│ Worker │ │ Queue │ │ Scanner │ │ Database │
│ Pool │◄──┤ Manager │◄──┤ Engine │ │ SQLite/ │
│ CPU/GPU│ │ Priority │ │ Rules + │ │ Postgres │
└────────┘ │ Queue │ │ Watcher │ └──────────┘
└──────────┘ └─────────┘
```
---
## Directory Structure
```
backend/
├── app.py # FastAPI application + lifespan
├── cli.py # CLI commands (server, db, worker, scan, setup)
├── config.py # Pydantic Settings (from .env)
├── setup_wizard.py # Interactive first-run setup
├── core/
│ ├── database.py # SQLAlchemy setup + session management
│ ├── models.py # Job model + enums
│ ├── language_code.py # ISO 639 language code utilities
│ ├── settings_model.py # SystemSettings model (database-backed)
│ ├── settings_service.py # Settings service with caching
│ ├── system_monitor.py # CPU/RAM/GPU/VRAM monitoring
│ ├── queue_manager.py # Persistent queue with priority
│ ├── worker.py # Individual worker (Process)
│ └── worker_pool.py # Worker pool orchestrator
├── transcription/
│ ├── __init__.py # Exports + WHISPER_AVAILABLE flag
│ ├── transcriber.py # WhisperTranscriber wrapper
│ ├── translator.py # Google Translate integration
│ └── audio_utils.py # ffmpeg/ffprobe utilities
├── scanning/
│ ├── __init__.py # Exports (NO library_scanner import!)
│ ├── models.py # ScanRule model
│ ├── file_analyzer.py # ffprobe file analysis
│ ├── language_detector.py # Audio language detection
│ ├── detected_languages.py # Language mappings
│ └── library_scanner.py # Scanner + scheduler + watcher
└── api/
├── __init__.py # Router exports
├── workers.py # Worker management endpoints
├── jobs.py # Job queue endpoints
├── scan_rules.py # Scan rules CRUD
├── scanner.py # Scanner control endpoints
├── settings.py # Settings CRUD endpoints
├── system.py # System resources endpoints
├── filesystem.py # Filesystem browser endpoints
└── setup_wizard.py # Setup wizard endpoints
```
---
## Core Components
### 1. WorkerPool (`core/worker_pool.py`)
Orchestrates CPU/GPU workers as separate processes.
**Key Features:**
- Dynamic add/remove workers at runtime
- Health monitoring with auto-restart
- Thread-safe multiprocessing
- Each worker is an isolated Process
```python
from backend.core.worker_pool import worker_pool
from backend.core.worker import WorkerType
# Add GPU worker on device 0
worker_id = worker_pool.add_worker(WorkerType.GPU, device_id=0)
# Add CPU worker
worker_id = worker_pool.add_worker(WorkerType.CPU)
# Get pool stats
stats = worker_pool.get_pool_stats()
```
### 2. QueueManager (`core/queue_manager.py`)
Persistent SQLite/PostgreSQL queue with priority support.
**Key Features:**
- Job deduplication (no duplicate `file_path`)
- Row-level locking with `skip_locked=True`
- Priority-based ordering (higher first)
- FIFO within same priority (by `created_at`)
- Auto-retry failed jobs
```python
from backend.core.queue_manager import queue_manager
from backend.core.models import QualityPreset
job = queue_manager.add_job(
file_path="/media/anime.mkv",
file_name="anime.mkv",
source_lang="jpn",
target_lang="spa",
quality_preset=QualityPreset.FAST,
priority=5
)
```
### 3. LibraryScanner (`scanning/library_scanner.py`)
Rule-based file scanning system.
**Three Scan Modes:**
- **Manual**: One-time scan via API or CLI
- **Scheduled**: Periodic scanning with APScheduler
- **Real-time**: File watcher with watchdog library
```python
from backend.scanning.library_scanner import library_scanner
# Manual scan
result = library_scanner.scan_paths(["/media/anime"], recursive=True)
# Start scheduler (every 6 hours)
library_scanner.start_scheduler(interval_minutes=360)
# Start file watcher
library_scanner.start_file_watcher(paths=["/media/anime"], recursive=True)
```
### 4. WhisperTranscriber (`transcription/transcriber.py`)
Wrapper for stable-whisper and faster-whisper.
**Key Features:**
- GPU/CPU support with auto-device detection
- VRAM management and cleanup
- Graceful degradation (works without Whisper installed)
```python
from backend.transcription.transcriber import WhisperTranscriber
transcriber = WhisperTranscriber(
model_name="large-v3",
device="cuda",
compute_type="float16"
)
result = transcriber.transcribe_file(
file_path="/media/episode.mkv",
language="jpn",
task="translate" # translate to English
)
result.to_srt("episode.eng.srt")
```
### 5. SettingsService (`core/settings_service.py`)
Database-backed configuration with caching.
```python
from backend.core.settings_service import settings_service
# Get setting
value = settings_service.get("worker_cpu_count", default=1)
# Set setting
settings_service.set("worker_cpu_count", "2")
# Bulk update
settings_service.bulk_update({
"worker_cpu_count": "2",
"scanner_enabled": "true"
})
```
---
## Data Flow
```
1. LibraryScanner detects file (manual/scheduled/watcher)
2. FileAnalyzer analyzes with ffprobe
- Audio tracks (codec, language, channels)
- Embedded subtitles
- External .srt files
- Duration, video info
3. Rules Engine evaluates against ScanRules (priority order)
- Checks all conditions (audio language, missing subs, etc.)
- First matching rule wins
4. If match → QueueManager.add_job()
- Deduplication check (no duplicate file_path)
- Assigns priority based on rule
5. Worker pulls job from queue
- Uses with_for_update(skip_locked=True)
- FIFO within same priority
6. WhisperTranscriber processes with model
- Stage 1: Audio → English (Whisper translate)
- Stage 2: English → Target (Google Translate, if needed)
7. Generate output SRT file(s)
- .eng.srt (always)
- .{target}.srt (if translate mode)
8. Job marked completed ✓
```
---
## Database Schema
### Job Table (`jobs`)
```sql
id VARCHAR PRIMARY KEY
file_path VARCHAR UNIQUE -- Ensures no duplicates
file_name VARCHAR
status VARCHAR -- queued/processing/completed/failed/cancelled
priority INTEGER
source_lang VARCHAR
target_lang VARCHAR
quality_preset VARCHAR -- fast/balanced/best
transcribe_or_translate VARCHAR -- transcribe/translate
progress FLOAT
current_stage VARCHAR
eta_seconds INTEGER
created_at DATETIME
started_at DATETIME
completed_at DATETIME
output_path VARCHAR
srt_content TEXT
segments_count INTEGER
error TEXT
retry_count INTEGER
max_retries INTEGER
worker_id VARCHAR
vram_used_mb INTEGER
processing_time_seconds FLOAT
```
### ScanRule Table (`scan_rules`)
```sql
id INTEGER PRIMARY KEY
name VARCHAR UNIQUE
enabled BOOLEAN
priority INTEGER -- Higher = evaluated first
-- Conditions (all must match):
audio_language_is VARCHAR -- ISO 639-2
audio_language_not VARCHAR -- Comma-separated
audio_track_count_min INTEGER
has_embedded_subtitle_lang VARCHAR
missing_embedded_subtitle_lang VARCHAR
missing_external_subtitle_lang VARCHAR
file_extension VARCHAR -- Comma-separated
-- Action:
action_type VARCHAR -- transcribe/translate
target_language VARCHAR
quality_preset VARCHAR
job_priority INTEGER
created_at DATETIME
updated_at DATETIME
```
### SystemSettings Table (`system_settings`)
```sql
id INTEGER PRIMARY KEY
key VARCHAR UNIQUE
value TEXT
description TEXT
category VARCHAR -- general/workers/transcription/scanner/bazarr
value_type VARCHAR -- string/integer/boolean/list
created_at DATETIME
updated_at DATETIME
```
---
## Transcription vs Translation
### Understanding the Two Modes
**Mode 1: `transcribe`** (Audio → English subtitles)
```
Audio (any language) → Whisper (task='translate') → English SRT
Example: Japanese audio → anime.eng.srt
```
**Mode 2: `translate`** (Audio → English → Target language)
```
Audio (any language) → Whisper (task='translate') → English SRT
→ Google Translate → Target language SRT
Example: Japanese audio → anime.eng.srt + anime.spa.srt
```
### Why Two Stages?
**Whisper Limitation**: Whisper can only translate TO English, not between other languages.
**Solution**: Two-stage process:
1. **Stage 1 (Always)**: Whisper converts audio to English using `task='translate'`
2. **Stage 2 (Only for translate mode)**: Google Translate converts English to target language
### Output Files
| Mode | Target | Output Files |
|------|--------|--------------|
| transcribe | spa | `.eng.srt` only |
| translate | spa | `.eng.srt` + `.spa.srt` |
| translate | fra | `.eng.srt` + `.fra.srt` |
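A sketch of the two-stage pipeline in code. `WhisperTranscriber` is documented above; `translate_srt` is a hypothetical stand-in for the Google Translate step in `transcription/translator.py`, whose exact API is not shown here:

```python
# Sketch: translate mode, audio -> English -> Spanish.
from backend.transcription.transcriber import WhisperTranscriber

transcriber = WhisperTranscriber(
    model_name="large-v3", device="cuda", compute_type="float16"
)

# Stage 1 (always): Whisper can only translate TO English
result = transcriber.transcribe_file(
    file_path="/media/anime/episode01.mkv",
    language="jpn",
    task="translate",
)
result.to_srt("/media/anime/episode01.eng.srt")

# Stage 2 (translate mode only): English SRT -> target language SRT
# translate_srt("episode01.eng.srt", target="spa")  # hypothetical helper
```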
---
## Worker Architecture
### Worker Types
| Type | Description | Device |
|------|-------------|--------|
| CPU | Uses CPU for inference | None |
| GPU | Uses NVIDIA GPU | cuda:N |
### Worker Lifecycle
```
┌─────────────┐
│ CREATED │
└──────┬──────┘
│ start()
┌─────────────┐
┌──────────│ IDLE │◄─────────┐
│ └──────┬──────┘ │
│ │ get_job() │ job_done()
│ ▼ │
│ ┌─────────────┐ │
│ │ BUSY │──────────┘
│ └──────┬──────┘
│ │ error
│ ▼
│ ┌─────────────┐
└──────────│ ERROR │
└─────────────┘
```
### Process Isolation
Each worker runs in a separate Python process:
- Memory isolation (VRAM per GPU worker)
- Crash isolation (one worker crash doesn't affect others)
- Independent model loading
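An illustration of the isolation pattern (not the actual `worker.py`): each worker runs its own loop in a separate OS process, so a crash or VRAM leak in one cannot corrupt another:

```python
# Sketch: one OS process per worker; model loading and job handling
# stay inside that process.
import multiprocessing as mp

def worker_loop(worker_id: str) -> None:
    # In the real worker this loads the model and pulls jobs from
    # the queue; here we just show the per-process boundary.
    print(f"{worker_id}: running in pid {mp.current_process().pid}")

if __name__ == "__main__":
    procs = [mp.Process(target=worker_loop, args=(f"cpu-{i}",)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```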
---
## Queue System
### Priority System
```python
# Priority values
BAZARR_REQUEST = base_priority + 10 # Highest (external request)
MANUAL_REQUEST = base_priority + 5 # High (user-initiated)
AUTO_SCAN = base_priority # Normal (scanner-generated)
```
### Job Deduplication
Jobs are deduplicated by `file_path`:
- If job exists with same `file_path`, new job is rejected
- Returns `None` from `add_job()`
- Prevents duplicate processing
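In practice (a sketch; `add_job` arguments other than `file_path`/`file_name` from the earlier example are omitted for brevity):

```python
# Sketch: a second add_job() with the same file_path is rejected
# and returns None instead of creating a duplicate job.
from backend.core.queue_manager import queue_manager

first = queue_manager.add_job(file_path="/media/a.mkv", file_name="a.mkv")
second = queue_manager.add_job(file_path="/media/a.mkv", file_name="a.mkv")
assert second is None  # duplicate rejected
```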
### Concurrency Safety
```python
# Row-level locking prevents race conditions
job = session.query(Job).filter(
Job.status == JobStatus.QUEUED
).with_for_update(skip_locked=True).first()
```
---
## Scanner System
### Scan Rule Evaluation
Rules are evaluated in priority order (highest first):
```python
# Pseudo-code for rule matching
for rule in rules.order_by(priority.desc()):
if rule.enabled and matches_all_conditions(file, rule):
create_job(file, rule.action)
break # First match wins
```
### Conditions
All conditions must match (AND logic):
| Condition | Match If |
|-----------|----------|
| audio_language_is | Primary audio track language equals |
| audio_language_not | Primary audio track language NOT in list |
| audio_track_count_min | Number of audio tracks >= value |
| has_embedded_subtitle_lang | Has embedded subtitle in language |
| missing_embedded_subtitle_lang | Does NOT have embedded subtitle |
| missing_external_subtitle_lang | Does NOT have external .srt file |
| file_extension | File extension in comma-separated list |
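A sketch of the evaluator implied by this table (attribute names on `analysis` are illustrative; rule fields follow the ScanRule schema above):

```python
# Sketch: AND-logic rule matching. Unset conditions pass; every set
# condition must hold for the rule to match.
def matches_all_conditions(analysis, rule) -> bool:
    checks = [
        rule.audio_language_is is None
        or analysis.primary_audio_lang == rule.audio_language_is,
        rule.audio_language_not is None
        or analysis.primary_audio_lang not in rule.audio_language_not.split(","),
        rule.audio_track_count_min is None
        or analysis.audio_track_count >= rule.audio_track_count_min,
        rule.file_extension is None
        or analysis.extension in rule.file_extension.split(","),
        # ...embedded/external subtitle conditions follow the same shape
    ]
    return all(checks)
```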
---
## Settings System
### Categories
| Category | Settings |
|----------|----------|
| general | operation_mode, library_paths, log_level |
| workers | cpu_count, gpu_count, auto_start, healthcheck_interval |
| transcription | whisper_model, compute_type, vram_management |
| scanner | enabled, schedule_interval, watcher_enabled |
| bazarr | provider_enabled, api_key |
### Caching
Settings service implements caching:
- Cache invalidated on write
- Thread-safe access
- Lazy loading from database
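A sketch of that pattern (the real `settings_service.py` is database-backed; here the load/write callables stand in for the DB layer):

```python
# Sketch: lazy, thread-safe settings cache invalidated on write.
import threading

class CachedSettings:
    def __init__(self, load_from_db, write_to_db):
        self._load = load_from_db    # callable returning {key: value}
        self._write = write_to_db    # callable (key, value) -> None
        self._lock = threading.Lock()
        self._cache = None           # lazy: filled on first read

    def get(self, key, default=None):
        with self._lock:
            if self._cache is None:
                self._cache = self._load()
            return self._cache.get(key, default)

    def set(self, key, value):
        with self._lock:
            self._write(key, value)
            self._cache = None       # invalidate; reload on next read
```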
---
## Graceful Degradation
The system can run WITHOUT Whisper/torch/PyAV installed:
```python
# Pattern used everywhere
try:
import stable_whisper
WHISPER_AVAILABLE = True
except ImportError:
stable_whisper = None
WHISPER_AVAILABLE = False
# Later in code
if not WHISPER_AVAILABLE:
raise RuntimeError("Install with: pip install stable-ts faster-whisper")
```
**What works without Whisper:**
- Backend server starts normally
- All APIs work fully
- Frontend development
- Scanner and rules management
- Job queue (jobs just won't be processed)
**What doesn't work:**
- Actual transcription (throws RuntimeError)
---
## Thread Safety
### Database Sessions
Always use context managers:
```python
with database.get_session() as session:
# Session is automatically committed on success
# Rolled back on exception
job = session.query(Job).filter(...).first()
```
### Worker Pool
- Each worker is a separate Process (multiprocessing)
- Communication via shared memory (Manager)
- No GIL contention between workers
### Queue Manager
- Uses SQLAlchemy row locking
- `skip_locked=True` prevents deadlocks
- Transactions are short-lived
---
## Important Patterns
### Circular Import Resolution
**Critical**: `backend/scanning/__init__.py` MUST NOT import `library_scanner`:
```python
# backend/scanning/__init__.py
from backend.scanning.models import ScanRule
from backend.scanning.file_analyzer import FileAnalyzer, FileAnalysis
# DO NOT import library_scanner here!
```
**Why?**
```
library_scanner → database → models → scanning.models → database (circular!)
```
**Solution**: Import `library_scanner` locally where needed:
```python
def some_function():
from backend.scanning.library_scanner import library_scanner
library_scanner.scan_paths(...)
```
### Optional Imports
```python
try:
import pynvml
NVML_AVAILABLE = True
except ImportError:
pynvml = None
NVML_AVAILABLE = False
```
### Database Session Pattern
```python
from backend.core.database import database
with database.get_session() as session:
# All operations within session context
job = session.query(Job).filter(...).first()
job.status = JobStatus.PROCESSING
# Commit happens automatically
```
### API Response Pattern
```python
from pydantic import BaseModel
class JobResponse(BaseModel):
id: str
status: str
# ...
@router.get("/{job_id}", response_model=JobResponse)
async def get_job(job_id: str):
with database.get_session() as session:
job = session.query(Job).filter(Job.id == job_id).first()
if not job:
raise HTTPException(status_code=404, detail="Not found")
return JobResponse(**job.to_dict())
```

docs/CONFIGURATION.md (new file, 402 lines)

@@ -0,0 +1,402 @@
# TranscriptorIO Configuration
Complete documentation for the configuration system.
## Table of Contents
- [Overview](#overview)
- [Configuration Methods](#configuration-methods)
- [Settings Categories](#settings-categories)
- [All Settings Reference](#all-settings-reference)
- [Environment Variables](#environment-variables)
- [Setup Wizard](#setup-wizard)
- [API Configuration](#api-configuration)
---
## Overview
TranscriptorIO uses a **database-backed configuration system**. All settings are stored in the `system_settings` table and can be managed through:
1. **Setup Wizard** (first run)
2. **Web UI** (Settings page)
3. **REST API** (`/api/settings`)
4. **CLI** (for advanced users)
This approach provides:
- Persistent configuration across restarts
- Runtime configuration changes without restart
- Category-based organization
- Type validation and parsing
---
## Configuration Methods
### 1. Setup Wizard (Recommended for First Run)
```bash
# Runs automatically on first server start
python backend/cli.py server
# Or run manually anytime
python backend/cli.py setup
```
The wizard guides you through:
- **Operation mode selection** (Standalone or Bazarr provider)
- **Library paths configuration**
- **Initial scan rules**
- **Worker configuration** (CPU/GPU counts)
- **Scanner schedule**
### 2. Web UI (Recommended for Daily Use)
Navigate to **Settings** in the web interface (`http://localhost:8000/settings`).
Features:
- Settings grouped by category tabs
- Descriptions for each setting
- Change detection (warns about unsaved changes)
- Bulk save functionality
### 3. REST API (For Automation/Integration)
```bash
# Get all settings
curl http://localhost:8000/api/settings
# Get settings by category
curl http://localhost:8000/api/settings?category=workers
# Update a setting
curl -X PUT http://localhost:8000/api/settings/worker_cpu_count \
-H "Content-Type: application/json" \
-d '{"value": "2"}'
# Bulk update
curl -X POST http://localhost:8000/api/settings/bulk-update \
-H "Content-Type: application/json" \
-d '{
"settings": {
"worker_cpu_count": "2",
"worker_gpu_count": "1"
}
}'
```
---
## Settings Categories
| Category | Description |
|----------|-------------|
| `general` | Operation mode, library paths, API server |
| `workers` | CPU/GPU worker configuration |
| `transcription` | Whisper model and transcription options |
| `subtitles` | Subtitle naming and formatting |
| `skip` | Skip conditions for files |
| `scanner` | Library scanner configuration |
| `bazarr` | Bazarr provider integration |
| `advanced` | Advanced options (path mapping, etc.) |
---
## All Settings Reference
### General Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `operation_mode` | string | `standalone` | Operation mode: `standalone`, `provider`, or `standalone,provider` |
| `library_paths` | list | `""` | Comma-separated library paths to scan |
| `api_host` | string | `0.0.0.0` | API server host |
| `api_port` | integer | `8000` | API server port |
| `debug` | boolean | `false` | Enable debug mode |
### Worker Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `worker_cpu_count` | integer | `0` | Number of CPU workers to start on boot |
| `worker_gpu_count` | integer | `0` | Number of GPU workers to start on boot |
| `concurrent_transcriptions` | integer | `2` | Maximum concurrent transcriptions |
| `worker_healthcheck_interval` | integer | `60` | Worker health check interval (seconds) |
| `worker_auto_restart` | boolean | `true` | Auto-restart failed workers |
| `clear_vram_on_complete` | boolean | `true` | Clear VRAM after job completion |
### Transcription Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `whisper_model` | string | `medium` | Whisper model: `tiny`, `base`, `small`, `medium`, `large-v3`, `large-v3-turbo` |
| `model_path` | string | `./models` | Path to store Whisper models |
| `transcribe_device` | string | `cpu` | Device: `cpu`, `cuda`, `gpu` |
| `cpu_compute_type` | string | `auto` | CPU compute type: `auto`, `int8`, `float32` |
| `gpu_compute_type` | string | `auto` | GPU compute type: `auto`, `float16`, `float32`, `int8_float16`, `int8` |
| `whisper_threads` | integer | `4` | Number of CPU threads for Whisper |
| `transcribe_or_translate` | string | `transcribe` | Default mode: `transcribe` or `translate` |
| `word_level_highlight` | boolean | `false` | Enable word-level highlighting |
| `detect_language_length` | integer | `30` | Seconds of audio for language detection |
| `detect_language_offset` | integer | `0` | Offset for language detection sample |
### Whisper Models
| Model | Size | Speed | Quality | VRAM |
|-------|------|-------|---------|------|
| `tiny` | 39M | Fastest | Basic | ~1GB |
| `base` | 74M | Very Fast | Fair | ~1GB |
| `small` | 244M | Fast | Good | ~2GB |
| `medium` | 769M | Medium | Great | ~5GB |
| `large-v3` | 1.5G | Slow | Excellent | ~10GB |
| `large-v3-turbo` | 809M | Fast | Excellent | ~6GB |
### Subtitle Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `subtitle_language_name` | string | `""` | Custom subtitle language name |
| `subtitle_language_naming_type` | string | `ISO_639_2_B` | Naming type: `ISO_639_1`, `ISO_639_2_T`, `ISO_639_2_B`, `NAME`, `NATIVE` |
| `custom_regroup` | string | `cm_sl=84_sl=42++++++1` | Custom regrouping algorithm |
**Language Naming Types:**
| Type | Example (Spanish) |
|------|-------------------|
| ISO_639_1 | `es` |
| ISO_639_2_T | `spa` |
| ISO_639_2_B | `spa` |
| NAME | `Spanish` |
| NATIVE | `Español` |
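To make the naming types concrete, here is a minimal sketch of how the configured type could map to a subtitle filename suffix; the `NAMING` table and `subtitle_filename` helper are illustrative, not the project's actual implementation:
```python
# Illustrative only: static Spanish mapping mirroring the table above.
NAMING = {
    "ISO_639_1": "es",
    "ISO_639_2_T": "spa",
    "ISO_639_2_B": "spa",
    "NAME": "Spanish",
    "NATIVE": "Español",
}

def subtitle_filename(stem: str, naming_type: str = "ISO_639_2_B") -> str:
    """Build a subtitle filename for a video stem using the configured naming type."""
    return f"{stem}.{NAMING[naming_type]}.srt"

print(subtitle_filename("movie"))  # movie.spa.srt
```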
### Skip Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `skip_if_external_subtitles_exist` | boolean | `false` | Skip if any external subtitle exists |
| `skip_if_target_subtitles_exist` | boolean | `true` | Skip if target language subtitle exists |
| `skip_if_internal_subtitles_language` | string | `""` | Skip if internal subtitle in this language |
| `skip_subtitle_languages` | list | `""` | Pipe-separated language codes to skip |
| `skip_if_audio_languages` | list | `""` | Skip if audio track is in these languages |
| `skip_unknown_language` | boolean | `false` | Skip files with unknown audio language |
| `skip_only_subgen_subtitles` | boolean | `false` | Only skip SubGen-generated subtitles |
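The skip conditions combine roughly as follows; this is a hedged sketch with made-up names (and it assumes the list settings are already parsed), not the real decision function:
```python
# Hypothetical sketch of how the skip settings above could combine.
def should_skip(has_external_sub: bool, target_sub_exists: bool,
                audio_lang: str | None, cfg: dict) -> bool:
    if cfg["skip_if_external_subtitles_exist"] and has_external_sub:
        return True
    if cfg["skip_if_target_subtitles_exist"] and target_sub_exists:
        return True
    if audio_lang is None:
        return cfg["skip_unknown_language"]
    return audio_lang in cfg["skip_if_audio_languages"]
```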
### Scanner Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `scanner_enabled` | boolean | `true` | Enable library scanner |
| `scanner_cron` | string | `0 2 * * *` | Cron expression for scheduled scans (default: daily at 02:00) |
| `scanner_schedule_interval_minutes` | integer | `360` | Scan interval in minutes (360 = every 6 hours) |
| `watcher_enabled` | boolean | `false` | Enable real-time file watcher |
| `auto_scan_enabled` | boolean | `false` | Enable automatic scheduled scanning |
### Bazarr Provider Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `bazarr_provider_enabled` | boolean | `false` | Enable Bazarr provider mode |
| `bazarr_url` | string | `http://bazarr:6767` | Bazarr server URL |
| `bazarr_api_key` | string | `""` | Bazarr API key (auto-generated) |
| `provider_timeout_seconds` | integer | `600` | Provider request timeout |
| `provider_callback_enabled` | boolean | `true` | Enable callback on completion |
| `provider_polling_interval` | integer | `30` | Polling interval for jobs |
### Advanced Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `force_detected_language_to` | string | `""` | Force detected language to specific code |
| `preferred_audio_languages` | list | `eng` | Pipe-separated preferred audio languages |
| `use_path_mapping` | boolean | `false` | Enable path mapping for network shares |
| `path_mapping_from` | string | `/tv` | Path mapping source |
| `path_mapping_to` | string | `/Volumes/TV` | Path mapping destination |
| `lrc_for_audio_files` | boolean | `true` | Generate LRC files for audio-only files |
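Path mapping is a plain prefix rewrite on scanned paths. A minimal sketch, assuming simple prefix replacement (the actual implementation may differ):
```python
# Sketch: rewrite a scanned path so a network share resolves locally.
def map_path(path: str, mapping_from: str = "/tv", mapping_to: str = "/Volumes/TV") -> str:
    if path.startswith(mapping_from):
        return mapping_to + path[len(mapping_from):]
    return path

print(map_path("/tv/Show/S01E01.mkv"))  # /Volumes/TV/Show/S01E01.mkv
```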
---
## Environment Variables
The **only** environment variable required is `DATABASE_URL` in the `.env` file:
```bash
# SQLite (default, good for single-user)
DATABASE_URL=sqlite:///./transcriptarr.db
# PostgreSQL (recommended for production)
DATABASE_URL=postgresql://user:password@localhost:5432/transcriptarr
# MariaDB/MySQL
DATABASE_URL=mariadb+pymysql://user:password@localhost:3306/transcriptarr
```
**All other configuration** is stored in the database and managed through:
- Setup Wizard (first run)
- Web UI Settings page
- Settings API endpoints
This design ensures:
- No `.env` file bloat
- Runtime configuration changes without restart
- Centralized configuration management
- Easy backup (configuration is in the database)
---
## Setup Wizard
### Standalone Mode
For independent operation with local library scanning.
**Configuration Flow:**
1. Select library paths (e.g., `/media/anime`, `/media/movies`)
2. Create initial scan rules (e.g., "Japanese audio → Spanish subtitles")
3. Configure workers (CPU count, GPU count)
4. Set scanner interval (default: 6 hours)
**API Endpoint:** `POST /api/setup/standalone`
```json
{
"library_paths": ["/media/anime", "/media/movies"],
"scan_rules": [
{
"name": "Japanese to Spanish",
"audio_language_is": "jpn",
"missing_external_subtitle_lang": "spa",
"target_language": "spa",
"action_type": "transcribe"
}
],
"worker_config": {
"count": 1,
"type": "cpu"
},
"scanner_config": {
"interval_minutes": 360
}
}
```
### Bazarr Slave Mode
For integration with Bazarr as a subtitle provider.
**Configuration Flow:**
1. Select Bazarr mode
2. System auto-generates API key
3. Displays connection info for Bazarr configuration
**API Endpoint:** `POST /api/setup/bazarr-slave`
**Response:**
```json
{
"success": true,
"message": "Bazarr slave mode configured successfully",
"bazarr_info": {
"mode": "bazarr_slave",
"host": "127.0.0.1",
"port": 8000,
"api_key": "generated_api_key_here",
"provider_url": "http://127.0.0.1:8000"
}
}
```
---
## API Configuration
### Get All Settings
```bash
curl http://localhost:8000/api/settings
```
### Get by Category
```bash
curl "http://localhost:8000/api/settings?category=workers"
```
### Get Single Setting
```bash
curl http://localhost:8000/api/settings/worker_cpu_count
```
### Update Setting
```bash
curl -X PUT http://localhost:8000/api/settings/worker_cpu_count \
-H "Content-Type: application/json" \
-d '{"value": "2"}'
```
### Bulk Update
```bash
curl -X POST http://localhost:8000/api/settings/bulk-update \
-H "Content-Type: application/json" \
-d '{
"settings": {
"worker_cpu_count": "2",
"worker_gpu_count": "1",
"scanner_enabled": "true"
}
}'
```
### Create Custom Setting
```bash
curl -X POST http://localhost:8000/api/settings \
-H "Content-Type: application/json" \
-d '{
"key": "my_custom_setting",
"value": "custom_value",
"description": "My custom setting",
"category": "advanced",
"value_type": "string"
}'
```
### Delete Setting
```bash
curl -X DELETE http://localhost:8000/api/settings/my_custom_setting
```
### Initialize Defaults
```bash
curl -X POST http://localhost:8000/api/settings/init-defaults
```
---
## Python Usage
```python
from backend.core.settings_service import settings_service
# Get setting with default
cpu_count = settings_service.get("worker_cpu_count", default=1)
# Set setting
settings_service.set("worker_cpu_count", 2)
# Bulk update
settings_service.bulk_update({
"worker_cpu_count": "2",
"scanner_enabled": "true"
})
# Get all settings in category
worker_settings = settings_service.get_by_category("workers")
# Initialize defaults (safe to call multiple times)
settings_service.init_default_settings()
```

666
docs/FRONTEND.md Normal file
View File

@@ -0,0 +1,666 @@
# TranscriptorIO Frontend
Technical documentation for the Vue 3 frontend application.
## Table of Contents
- [Overview](#overview)
- [Technology Stack](#technology-stack)
- [Directory Structure](#directory-structure)
- [Development Setup](#development-setup)
- [Views](#views)
- [Components](#components)
- [State Management](#state-management)
- [API Service](#api-service)
- [Routing](#routing)
- [Styling](#styling)
- [Build and Deployment](#build-and-deployment)
---
## Overview
The TranscriptorIO frontend is a Single Page Application (SPA) built with Vue 3, featuring:
- **6 Complete Views**: Dashboard, Queue, Scanner, Rules, Workers, Settings
- **Real-time Updates**: Polling-based status updates
- **Dark Theme**: Tdarr-inspired dark UI
- **Type Safety**: Full TypeScript support
- **State Management**: Pinia stores for shared state
---
## Technology Stack
| Technology | Version | Purpose |
|------------|---------|---------|
| Vue.js | 3.4+ | UI Framework |
| Vue Router | 4.2+ | Client-side routing |
| Pinia | 2.1+ | State management |
| Axios | 1.6+ | HTTP client |
| TypeScript | 5.3+ | Type safety |
| Vite | 5.0+ | Build tool / dev server |
---
## Directory Structure
```
frontend/
├── public/ # Static assets (favicon, etc.)
├── src/
│ ├── main.ts # Application entry point
│ ├── App.vue # Root component + navigation
│ │
│ ├── views/ # Page components (routed)
│ │ ├── DashboardView.vue # System overview + resources
│ │ ├── QueueView.vue # Job management
│ │ ├── ScannerView.vue # Scanner control
│ │ ├── RulesView.vue # Scan rules CRUD
│ │ ├── WorkersView.vue # Worker pool management
│ │ └── SettingsView.vue # Settings management
│ │
│ ├── components/ # Reusable components
│ │ ├── ConnectionWarning.vue # Backend connection status
│ │ ├── PathBrowser.vue # Filesystem browser modal
│ │ └── SetupWizard.vue # First-run setup wizard
│ │
│ ├── stores/ # Pinia state stores
│ │ ├── config.ts # Configuration store
│ │ ├── system.ts # System status store
│ │ ├── workers.ts # Workers store
│ │ └── jobs.ts # Jobs store
│ │
│ ├── services/
│ │ └── api.ts # Axios API client
│ │
│ ├── router/
│ │ └── index.ts # Vue Router configuration
│ │
│ ├── types/
│ │ └── api.ts # TypeScript interfaces
│ │
│ └── assets/
│ └── css/
│ └── main.css # Global styles (dark theme)
├── index.html # HTML template
├── vite.config.ts # Vite configuration
├── tsconfig.json # TypeScript configuration
└── package.json # Dependencies
```
---
## Development Setup
### Prerequisites
- Node.js 18+ and npm
- Backend server running on port 8000
### Installation
```bash
cd frontend
# Install dependencies
npm install
# Start development server (with proxy to backend)
npm run dev
```
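The dev server forwards `/api` requests to the backend. A minimal `vite.config.ts` sketch (the project's actual config may differ in details):
```typescript
// vite.config.ts — minimal sketch with an /api proxy to the backend.
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'

export default defineConfig({
  plugins: [vue()],
  server: {
    port: 3000,
    proxy: {
      '/api': 'http://localhost:8000'
    }
  }
})
```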
### Development URLs
| URL | Description |
|-----|-------------|
| http://localhost:3000 | Frontend dev server |
| http://localhost:8000 | Backend API |
| http://localhost:8000/docs | Swagger API docs |
### Scripts
```bash
npm run dev # Start dev server with HMR
npm run build # Build for production
npm run preview # Preview production build
npm run lint # Run ESLint
```
---
## Views
### DashboardView
**Path**: `/`
System overview with real-time resource monitoring.
**Features**:
- System status (running/stopped)
- CPU usage gauge
- RAM usage gauge
- GPU usage gauges (per device)
- Recent jobs list
- Worker pool summary
- Scanner status
**Data Sources**:
- `GET /api/status`
- `GET /api/system/resources`
- `GET /api/jobs?page_size=10`
### QueueView
**Path**: `/queue`
Job queue management with filtering and pagination.
**Features**:
- Job list with status icons
- Status filter (All/Queued/Processing/Completed/Failed)
- Pagination controls
- Retry failed jobs
- Cancel queued/processing jobs
- Clear completed jobs
- Job progress display
- Processing time display
**Data Sources**:
- `GET /api/jobs`
- `GET /api/jobs/stats`
- `POST /api/jobs/{id}/retry`
- `DELETE /api/jobs/{id}`
- `POST /api/jobs/queue/clear`
### ScannerView
**Path**: `/scanner`
Library scanner control and configuration.
**Features**:
- Scanner status display
- Start/stop scheduler
- Start/stop file watcher
- Manual scan trigger
- Scan results display
- Next scan time
- Total files scanned counter
**Data Sources**:
- `GET /api/scanner/status`
- `POST /api/scanner/scan`
- `POST /api/scanner/scheduler/start`
- `POST /api/scanner/scheduler/stop`
- `POST /api/scanner/watcher/start`
- `POST /api/scanner/watcher/stop`
### RulesView
**Path**: `/rules`
Scan rules CRUD management.
**Features**:
- Rules list with priority ordering
- Create new rule (modal)
- Edit existing rule (modal)
- Delete rule (with confirmation)
- Toggle rule enabled/disabled
- Condition configuration
- Action configuration
**Data Sources**:
- `GET /api/scan-rules`
- `POST /api/scan-rules`
- `PUT /api/scan-rules/{id}`
- `DELETE /api/scan-rules/{id}`
- `POST /api/scan-rules/{id}/toggle`
### WorkersView
**Path**: `/workers`
Worker pool management.
**Features**:
- Worker list with status
- Add CPU worker
- Add GPU worker (with device selection)
- Remove worker
- Start/stop pool
- Worker statistics
- Current job display per worker
- Progress and ETA display
**Data Sources**:
- `GET /api/workers`
- `GET /api/workers/stats`
- `POST /api/workers`
- `DELETE /api/workers/{id}`
- `POST /api/workers/pool/start`
- `POST /api/workers/pool/stop`
### SettingsView
**Path**: `/settings`
Database-backed settings management.
**Features**:
- Settings grouped by category
- Category tabs (General, Workers, Transcription, Scanner, Bazarr)
- Edit settings in-place
- Save changes button
- Change detection (unsaved changes warning)
- Setting descriptions
**Data Sources**:
- `GET /api/settings`
- `PUT /api/settings/{key}`
- `POST /api/settings/bulk-update`
---
## Components
### ConnectionWarning
Displays a warning banner when the backend is unreachable.
**Props**: None
**State**: Uses `systemStore.isConnected`
### PathBrowser
Modal component for browsing filesystem paths.
**Props**:
- `show: boolean` - Show/hide modal
- `initialPath: string` - Starting path
**Emits**:
- `select(path: string)` - Path selected
- `close()` - Modal closed
**API Calls**:
- `GET /api/filesystem/browse?path={path}`
- `GET /api/filesystem/common-paths`
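A sketch of how a view might embed the component (the handler wiring is illustrative):
```vue
<PathBrowser
  :show="showBrowser"
  initial-path="/media"
  @select="(path) => { libraryPath = path; showBrowser = false }"
  @close="showBrowser = false"
/>
```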
### SetupWizard
First-run setup wizard component.
**Props**: None
**Features**:
- Mode selection (Standalone/Bazarr)
- Library path configuration
- Scan rule creation
- Worker configuration
- Scanner interval setting
**API Calls**:
- `GET /api/setup/status`
- `POST /api/setup/standalone`
- `POST /api/setup/bazarr-slave`
- `POST /api/setup/skip`
---
## State Management
### Pinia Stores
#### systemStore (`stores/system.ts`)
Global system state.
```typescript
interface SystemState {
isConnected: boolean
status: SystemStatus | null
resources: SystemResources | null
loading: boolean
error: string | null
}
// Actions
fetchStatus() // Fetch /api/status
fetchResources() // Fetch /api/system/resources
startPolling() // Start auto-refresh
stopPolling() // Stop auto-refresh
```
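A hedged sketch of how `startPolling()`/`stopPolling()` could be implemented; the interval and internals are assumptions, not the store's actual code:
```typescript
// Hypothetical polling loop inside the store; the real interval may differ.
let pollTimer: ReturnType<typeof setInterval> | null = null

function startPolling(intervalMs = 5000) {
  stopPolling() // avoid stacking intervals on repeated calls
  pollTimer = setInterval(() => {
    fetchStatus()
    fetchResources()
  }, intervalMs)
}

function stopPolling() {
  if (pollTimer) {
    clearInterval(pollTimer)
    pollTimer = null
  }
}
```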
#### workersStore (`stores/workers.ts`)
Worker pool state.
```typescript
interface WorkersState {
workers: Worker[]
stats: WorkerStats | null
loading: boolean
error: string | null
}
// Actions
fetchWorkers() // Fetch all workers
fetchStats() // Fetch pool stats
addWorker(type, deviceId?) // Add worker
removeWorker(id) // Remove worker
startPool(cpuCount, gpuCount) // Start pool
stopPool() // Stop pool
```
#### jobsStore (`stores/jobs.ts`)
Job queue state.
```typescript
interface JobsState {
jobs: Job[]
stats: QueueStats | null
total: number
page: number
pageSize: number
statusFilter: string | null
loading: boolean
error: string | null
}
// Actions
fetchJobs() // Fetch with current filters
fetchStats() // Fetch queue stats
retryJob(id) // Retry failed job
cancelJob(id) // Cancel job
clearCompleted() // Clear completed jobs
setStatusFilter(status) // Update filter
setPage(page) // Change page
```
#### configStore (`stores/config.ts`)
Settings configuration state.
```typescript
interface ConfigState {
settings: Setting[]
loading: boolean
error: string | null
pendingChanges: Record<string, string>
}
// Actions
fetchSettings(category?) // Fetch settings
updateSetting(key, value) // Queue update
saveChanges() // Save all pending
discardChanges() // Discard pending
```
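For example, `saveChanges()` can flush all pending edits through the bulk endpoint in a single request; a standalone sketch (the real action reads `pendingChanges` from store state):
```typescript
// Hypothetical body of saveChanges(): one bulk-update call for all pending edits.
import api from '@/services/api'

async function saveChanges(pendingChanges: Record<string, string>) {
  if (Object.keys(pendingChanges).length === 0) return
  await api.post('/settings/bulk-update', { settings: pendingChanges })
}
```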
---
## API Service
### Configuration (`services/api.ts`)
```typescript
import axios from 'axios'
const api = axios.create({
baseURL: '/api',
timeout: 30000,
headers: {
'Content-Type': 'application/json'
}
})
// Response interceptor for error handling
api.interceptors.response.use(
response => response,
error => {
console.error('API Error:', error)
return Promise.reject(error)
}
)
export default api
```
### Usage Example
```typescript
import api from '@/services/api'
// GET request
const response = await api.get('/jobs', {
params: { status_filter: 'queued', page: 1 }
})
// POST request
const job = await api.post('/jobs', {
file_path: '/media/video.mkv',
target_lang: 'spa'
})
// PUT request
await api.put('/settings/worker_cpu_count', {
value: '2'
})
// DELETE request
await api.delete(`/jobs/${jobId}`)
```
---
## Routing
### Route Configuration
```typescript
import { createRouter, createWebHistory } from 'vue-router'
// View component imports (DashboardView, WorkersView, ...) omitted for brevity

const routes = [
  { path: '/', name: 'Dashboard', component: DashboardView },
  { path: '/workers', name: 'Workers', component: WorkersView },
  { path: '/queue', name: 'Queue', component: QueueView },
  { path: '/scanner', name: 'Scanner', component: ScannerView },
  { path: '/rules', name: 'Rules', component: RulesView },
  { path: '/settings', name: 'Settings', component: SettingsView }
]

export default createRouter({
  history: createWebHistory(),
  routes
})
```
### Navigation
Navigation is handled in `App.vue` with a sidebar menu.
```vue
<nav class="sidebar">
<router-link to="/">Dashboard</router-link>
<router-link to="/workers">Workers</router-link>
<router-link to="/queue">Queue</router-link>
<router-link to="/scanner">Scanner</router-link>
<router-link to="/rules">Rules</router-link>
<router-link to="/settings">Settings</router-link>
</nav>
<main class="content">
<router-view />
</main>
```
---
## Styling
### Dark Theme
The application uses a Tdarr-inspired dark theme defined in `assets/css/main.css`.
**Color Palette**:
| Variable | Value | Usage |
|----------|-------|-------|
| --bg-primary | #1a1a2e | Main background |
| --bg-secondary | #16213e | Card background |
| --bg-tertiary | #0f3460 | Hover states |
| --text-primary | #eaeaea | Primary text |
| --text-secondary | #a0a0a0 | Secondary text |
| --accent-primary | #e94560 | Buttons, links |
| --accent-success | #4ade80 | Success states |
| --accent-warning | #fbbf24 | Warning states |
| --accent-error | #ef4444 | Error states |
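These map to CSS custom properties on `:root`; a sketch of the corresponding declarations in `main.css`:
```css
/* Sketch of the variable definitions from the table above. */
:root {
  --bg-primary: #1a1a2e;
  --bg-secondary: #16213e;
  --bg-tertiary: #0f3460;
  --text-primary: #eaeaea;
  --text-secondary: #a0a0a0;
  --accent-primary: #e94560;
  --accent-success: #4ade80;
  --accent-warning: #fbbf24;
  --accent-error: #ef4444;
}
```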
### Component Styling
Components use scoped CSS with CSS variables:
```vue
<style scoped>
.card {
background: var(--bg-secondary);
border-radius: 8px;
padding: 1.5rem;
}
.btn-primary {
background: var(--accent-primary);
color: white;
border: none;
padding: 0.5rem 1rem;
border-radius: 4px;
cursor: pointer;
}
.btn-primary:hover {
opacity: 0.9;
}
</style>
```
---
## Build and Deployment
### Production Build
```bash
cd frontend
npm run build
```
This creates a `dist/` folder with:
- `index.html` - Entry HTML
- `assets/` - JS, CSS bundles (hashed filenames)
### Deployment Options
#### Option 1: Served by Backend (Recommended)
The FastAPI backend automatically serves the frontend from `frontend/dist/`:
```python
# backend/app.py (excerpt; imports shown for completeness)
from pathlib import Path
from fastapi.responses import FileResponse
from fastapi.staticfiles import StaticFiles

frontend_path = Path(__file__).parent.parent / "frontend" / "dist"
if frontend_path.exists():
    app.mount("/assets", StaticFiles(directory=str(frontend_path / "assets")))

# Catch-all route: every non-API path falls through to the SPA's index.html.
@app.get("/{full_path:path}")
async def serve_frontend(full_path: str = ""):
    return FileResponse(str(frontend_path / "index.html"))
```
**Access**: http://localhost:8000
#### Option 2: Nginx Reverse Proxy
```nginx
server {
listen 80;
server_name transcriptorio.local;
# Frontend
location / {
root /var/www/transcriptorio/frontend/dist;
try_files $uri $uri/ /index.html;
}
# Backend API
location /api {
proxy_pass http://localhost:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
```
#### Option 3: Docker
```dockerfile
# Build frontend
FROM node:18-alpine AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package*.json ./
RUN npm ci
COPY frontend/ ./
RUN npm run build
# Final image
FROM python:3.12-slim
COPY --from=frontend-builder /app/frontend/dist /app/frontend/dist
# ... rest of backend setup
```
---
## TypeScript Interfaces
### Key Types (`types/api.ts`)
```typescript
// Job
interface Job {
id: string
file_path: string
file_name: string
status: 'queued' | 'processing' | 'completed' | 'failed' | 'cancelled'
priority: number
progress: number
// ... more fields
}
// Worker
interface Worker {
worker_id: string
worker_type: 'cpu' | 'gpu'
device_id: number | null
status: 'idle' | 'busy' | 'stopped' | 'error'
current_job_id: string | null
jobs_completed: number
jobs_failed: number
}
// Setting
interface Setting {
id: number
key: string
value: string | null
description: string | null
category: string | null
value_type: string | null
}
// ScanRule
interface ScanRule {
id: number
name: string
enabled: boolean
priority: number
conditions: ScanRuleConditions
action: ScanRuleAction
}
```