Audio Caching System
The audio‑caching subsystem automatically converts large lossless audio files (FLAC, WAV, AIFF, …) into smaller MP3 streams, dramatically reducing bandwidth while preserving a pleasant listening experience.
TL;DR – The cache turns a 40 MB FLAC track into a ~5 MB MP3 (≈ 87 % bandwidth saving) and serves the MP3 via HTTP range requests.
📖 Overview
When streaming lossless audio over the web, bandwidth quickly becomes a bottleneck:
| Format | Approx. size (4‑min track) | Typical bitrate |
|---|---|---|
| FLAC (original) | 40‑50 MB | ~1 000 kbps |
| MP3 – High (256 kbps) | ≈ 8 MB | 256 kbps |
| MP3 – Medium (192 kbps) | ≈ 5 MB | 192 kbps |
| MP3 – Low (128 kbps) | ≈ 3 MB | 128 kbps |
Result: A 4‑minute track drops from ~45 MB to ~5 MB (≈ 89 % bandwidth reduction) with negligible audible loss for casual listening.
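The figures above can be sanity-checked with a little arithmetic. The sketch below assumes a constant bitrate and ignores container overhead and tags, so it lands slightly above the rounded values in the table; the function names are illustrative and not part of the codebase.

```python
def estimated_size_mb(bitrate_kbps: int, duration_s: float) -> float:
    """Approximate size of a constant-bitrate stream in megabytes (10^6 bytes)."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


def bandwidth_saving(original_mb: float, transcoded_mb: float) -> float:
    """Fraction of bandwidth saved by serving the transcoded file instead."""
    return 1 - transcoded_mb / original_mb


if __name__ == "__main__":
    # A 4-minute track is 240 seconds long.
    for label, kbps in (("high", 256), ("medium", 192), ("low", 128)):
        print(f"{label}: {estimated_size_mb(kbps, 240):.2f} MB")
    print(f"saving for 45 MB -> 5 MB: {bandwidth_saving(45, 5):.1%}")
```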
The cache is transparent to the rest of the application:
- The UI requests a track → the Flask route asks AudioCache for a cached version.
- If a suitable MP3 exists, it is streamed via HTTP range requests.
- If not, the original file is streamed (or a background job creates the cache for the next request).
🏛️ Architecture Overview
graph TD
A["AudioCache (core)"] --> B["Cache Path Generation"]
A --> C["Transcoding (ffmpeg)"]
A --> D["Cache Management (size, cleanup)"]
E[CacheWorker] --> A
E --> F["ThreadPool (parallel batch)"]
E --> G["ProgressTracker (SSE)"]
H[ProgressTracker] --> I["Frontend (EventSource)"]
- audio_cache.py – core logic (hash‑based filenames, transcoding, cache look‑ups).
- cache_worker.py – batch processing, thread‑pool parallelism, progress callbacks.
- progress_tracker.py – Server‑Sent Events (SSE) emitter that feeds the UI’s “caching progress” modal.
✨ Key Features
| Feature | Description |
|---|---|
| Automatic transcoding | FLAC, WAV, AIFF, APE, ALAC → MP3 (high/medium/low). |
| Multiple quality levels | high (256 kbps), medium (192 kbps), low (128 kbps). |
| Smart caching | Only creates a cached file when the source is lossless and the cache is missing/out-of-date. |
| Pre-caching on upload | When a mixtape is saved, the system can generate caches automatically. |
| Parallel batch processing | Thread-pool (configurable workers) for fast bulk transcoding. |
| Progress tracking | Real-time SSE updates displayed in a Bootstrap modal. |
| Cache management utilities | Size calculation, age-based cleanup, full purge. |
| Config-driven | All knobs live in src/config/config.py (AUDIO_CACHE_*). |
📋 How It Works (Step‑by‑Step)
Cache Path Generation
flowchart LR
A[Original file path] --> B[Normalize & resolve]
B --> C[MD5 hash of full path]
C --> D[Compose filename: `<hash>_<quality>_<bitrate>.mp3`]
D --> E["Cache directory (`AUDIO_CACHE_DIR`)"]
- Hashing the full resolved path makes filename collisions practically impossible, even for identically named tracks in different folders.
- Example:
  - Original: /music/Radiohead/OK Computer/01 Airbag.flac
  - Hash: a1b2c3…
  - Cache file: a1b2c3_medium_192k.mp3
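The naming scheme can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function name `cache_path_for` and the `BITRATES` mapping are assumptions derived from the quality table in this document.

```python
import hashlib
from pathlib import Path

# Assumed mapping from quality level to bitrate label (taken from the feature table).
BITRATES = {"high": "256k", "medium": "192k", "low": "128k"}


def cache_path_for(original: Path, cache_dir: Path, quality: str = "medium") -> Path:
    """Build `<hash>_<quality>_<bitrate>.mp3` inside cache_dir (hypothetical helper)."""
    # Resolve to an absolute, normalized path so the same file always hashes identically.
    resolved = original.expanduser().resolve()
    digest = hashlib.md5(str(resolved).encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}_{quality}_{BITRATES[quality]}.mp3"


if __name__ == "__main__":
    print(cache_path_for(Path("/music/Radiohead/OK Computer/01 Airbag.flac"),
                         Path("cache/audio")))
```

Because the hash covers the absolute path, moving the music folder changes every cache filename, which is why the troubleshooting section treats path changes as a common cause of cache misses.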
Transcoding Flow
sequenceDiagram
participant UI
participant Flask
participant CacheWorker
participant AudioCache
participant ffmpeg
UI->>Flask: Request play (quality=medium)
Flask->>AudioCache: get_cached_or_original()
alt Cached version exists
AudioCache-->>Flask: Return cached path
else No cache
AudioCache->>CacheWorker: transcode_file()
CacheWorker->>ffmpeg: Run ffmpeg command
ffmpeg-->>CacheWorker: MP3 file created
CacheWorker->>AudioCache: Store in cache dir
AudioCache-->>Flask: Return newly cached path
end
Flask->>UI: Stream MP3
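- If a cached file is present, it is served immediately.
- Otherwise the worker spawns ffmpeg, writes the MP3 to the cache directory, and returns the new path. Note that get_cached_or_original() on its own does not generate a missing cache; generation goes through transcode_file().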
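The ffmpeg step might look roughly like the sketch below. The exact command the project runs is not shown in this document, so the flag selection here (`-vn`, `libmp3lame`, constant bitrate via `-b:a`) is an assumption, as are the helper names.

```python
import subprocess
from pathlib import Path

# Assumed mapping from quality level to ffmpeg bitrate argument.
BITRATES = {"high": "256k", "medium": "192k", "low": "128k"}


def build_ffmpeg_command(source: Path, target: Path, quality: str = "medium") -> list[str]:
    """Assemble an ffmpeg invocation that encodes `source` to MP3 (illustrative)."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the target if it already exists
        "-i", str(source),         # input file
        "-vn",                     # ignore embedded cover art / video streams
        "-codec:a", "libmp3lame",  # LAME MP3 encoder
        "-b:a", BITRATES[quality], # target audio bitrate
        str(target),
    ]


def transcode(source: Path, target: Path, quality: str = "medium") -> Path:
    """Run ffmpeg; raises FileNotFoundError or subprocess.CalledProcessError on failure."""
    if not source.exists():
        raise FileNotFoundError(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    # check=True turns a non-zero ffmpeg exit code into CalledProcessError;
    # capture_output keeps stderr available for logging.
    subprocess.run(build_ffmpeg_command(source, target, quality),
                   check=True, capture_output=True)
    return target
```

Passing the command as a list (rather than a shell string) avoids quoting problems with paths that contain spaces, such as "OK Computer/01 Airbag.flac".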
Playback Flow
graph LR
A[User clicks Play] --> B{Quality selected?}
B -->|Original| C[Serve original FLAC]
B -->|High/Med/Low| D{Is source lossless?}
D -->|No| C
D -->|Yes| E{Cache exists?}
E -->|Yes| F[Serve cached MP3]
E -->|No| G[Log warning → fall back to original]
F --> H[User streams small file]
C --> I[User streams large file]
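The decision tree above can be expressed as a small pure function. This is a sketch under assumptions: the name `choose_source`, the `"original"` quality token, and the suffix set are illustrative, and the real suffix list lives in should_transcode.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("audio_cache_sketch")

# Assumed file suffixes for the lossless formats named in this document.
LOSSLESS_SUFFIXES = {".flac", ".wav", ".aiff", ".ape", ".alac"}


def choose_source(original: Path, quality: str, cached: Optional[Path]) -> Path:
    """Return the file to stream, following the playback flow diagram."""
    if quality == "original":
        return original                      # user explicitly asked for the source
    if original.suffix.lower() not in LOSSLESS_SUFFIXES:
        return original                      # already lossy, nothing to gain
    if cached is not None and cached.exists():
        return cached                        # cache hit: serve the small MP3
    logger.warning("cache miss for %s (%s), serving original", original, quality)
    return original                          # cache miss: fall back
```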
🔌 API Reference
AudioCache (core)
AudioCache(cache_dir, logger=None)
Manages audio file transcoding and caching for bandwidth optimization.
Provides methods to check for cached versions, generate transcoded files, and manage the cache directory.
Initialize the AudioCache manager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cache_dir` | `Path` | Directory where cached transcoded files will be stored. | required |
| `logger` | `Logger \| None` | Optional logger for tracking operations. | `None` |
Methods:
| Name | Description |
|---|---|
| `get_cache_path` | Generate a cache filename based on the original path and quality level. |
| `should_transcode` | Determine if a file should be transcoded based on its format. |
| `is_cached` | Check if a cached version exists and is up-to-date. |
| `transcode_file` | Transcode an audio file to a cached version. |
| `get_cached_or_original` | Get the cached version if available, otherwise return original path. |
| `precache_file` | Pre-generate cached versions at multiple quality levels. |
| `get_cache_size` | Calculate total size of the cache directory in bytes. |
| `clear_cache` | Clear cached files, optionally only those older than specified days. |
Source code in src/audio_cache/audio_cache.py
get_cache_path(original_path, quality='medium')
Generate a cache filename based on the original path and quality level.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `original_path` | `Path` | Path to the original audio file. | required |
| `quality` | `QualityLevel` | Quality level for transcoding (high, medium, low). | `'medium'` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the cached file location. |
Source code in src/audio_cache/audio_cache.py
should_transcode(file_path)
Determine if a file should be transcoded based on its format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `Path` | Path to the audio file. | required |
Returns:
| Type | Description |
|---|---|
| `bool` | True if the file should be transcoded, False otherwise. |
Source code in src/audio_cache/audio_cache.py
is_cached(original_path, quality='medium')
Check if a cached version exists and is up-to-date.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `original_path` | `Path` | Path to the original audio file. | required |
| `quality` | `QualityLevel` | Quality level to check. | `'medium'` |
Returns:
| Type | Description |
|---|---|
| `bool` | True if a valid cached version exists, False otherwise. |
Source code in src/audio_cache/audio_cache.py
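One plausible way to decide whether a cached file is "up-to-date" is to compare modification times, which matches the timestamp check suggested in the troubleshooting table. The sketch below assumes that rule; the actual criterion used by is_cached is not spelled out here, and `cache_is_fresh` is a hypothetical name.

```python
from pathlib import Path


def cache_is_fresh(original: Path, cached: Path) -> bool:
    """True if `cached` exists and is at least as new as `original` (assumed rule)."""
    if not cached.exists():
        return False
    # A source file modified after the cache was written makes the cache stale.
    return cached.stat().st_mtime >= original.stat().st_mtime
```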
transcode_file(original_path, quality='medium', overwrite=False)
Transcode an audio file to a cached version.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `original_path` | `Path` | Path to the original audio file. | required |
| `quality` | `QualityLevel` | Quality level for transcoding. | `'medium'` |
| `overwrite` | `bool` | If True, regenerate cache even if it exists. | `False` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the transcoded file (or original if no transcoding needed). |
Raises:
| Type | Description |
|---|---|
| `CalledProcessError` | If ffmpeg transcoding fails. |
| `FileNotFoundError` | If the original file doesn't exist. |
Source code in src/audio_cache/audio_cache.py
get_cached_or_original(original_path, quality='medium')
Get the cached version if available, otherwise return original path.
This method does NOT generate a cache if it doesn't exist. Use transcode_file() for that purpose.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `original_path` | `Path` | Path to the original audio file. | required |
| `quality` | `QualityLevel` | Quality level to retrieve. | `'medium'` |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to cached version if available, otherwise original path. |
Source code in src/audio_cache/audio_cache.py
precache_file(original_path, qualities=None)
Pre-generate cached versions at multiple quality levels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `original_path` | `Path` | Path to the original audio file. | required |
| `qualities` | `list[QualityLevel]` | List of quality levels to generate. Defaults to ["medium"]. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[QualityLevel, Path]` | Dictionary mapping quality levels to their cached paths. |
Source code in src/audio_cache/audio_cache.py
get_cache_size()
Calculate total size of the cache directory in bytes.
Returns:
| Type | Description |
|---|---|
| `int` | Total size in bytes. |
Source code in src/audio_cache/audio_cache.py
clear_cache(older_than_days=None)
Clear cached files, optionally only those older than specified days.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `older_than_days` | `int \| None` | If specified, only delete files older than this many days. | `None` |
Returns:
| Type | Description |
|---|---|
| `int` | Number of files deleted. |
Source code in src/audio_cache/audio_cache.py
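The two maintenance helpers, get_cache_size and clear_cache, can be approximated as follows. This is a standalone sketch that assumes cached files are `*.mp3` entries directly inside the cache directory and that "age" means modification time; the function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def cache_size_bytes(cache_dir: Path) -> int:
    """Sum the sizes of all cached MP3 files in cache_dir."""
    return sum(p.stat().st_size for p in cache_dir.glob("*.mp3") if p.is_file())


def clear_old_files(cache_dir: Path, older_than_days: Optional[int] = None) -> int:
    """Delete cached MP3s; with older_than_days set, only files older than that."""
    cutoff = None if older_than_days is None else time.time() - older_than_days * 86400
    deleted = 0
    for path in cache_dir.glob("*.mp3"):
        if cutoff is None or path.stat().st_mtime < cutoff:
            path.unlink()
            deleted += 1
    return deleted
```

Calling `clear_old_files(cache_dir, older_than_days=30)` would keep recently used caches while trimming the long tail; calling it without the argument performs a full purge.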
CacheWorker (batch & async)
CacheWorker(audio_cache, logger=None, max_workers=4)
Worker for pre-caching audio files in the background.
Provides methods to cache individual files or entire mixtapes at specified quality levels using thread pools for parallel processing.
Initialize the cache worker.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `audio_cache` | `AudioCache` | AudioCache instance for transcoding operations. | required |
| `logger` | `Logger \| None` | Optional logger for tracking operations. | `None` |
| `max_workers` | `int` | Maximum number of parallel transcoding threads. | `4` |
Methods:
| Name | Description |
|---|---|
| `cache_single_file` | Cache a single audio file at specified quality levels. |
| `cache_mixtape` | Cache all audio files in a mixtape. |
| `cache_mixtape_async` | Cache all audio files in a mixtape using parallel processing. |
| `verify_mixtape_cache` | Verify which tracks in a mixtape have valid cached versions. |
| `regenerate_outdated_cache` | Regenerate cached versions that are older than their source files. |
Source code in src/audio_cache/cache_worker.py
cache_single_file(file_path, qualities=None)
Cache a single audio file at specified quality levels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `Path` | Path to the audio file. | required |
| `qualities` | `list[QualityLevel]` | List of quality levels to cache. Defaults to ["medium"]. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[QualityLevel, bool]` | Dictionary mapping quality levels to success status. |
Source code in src/audio_cache/cache_worker.py
cache_mixtape(track_paths, qualities=None, progress_callback=None)
Cache all audio files in a mixtape.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `track_paths` | `list[Path]` | List of paths to audio files in the mixtape. | required |
| `qualities` | `list[QualityLevel]` | Quality levels to cache. Defaults to ["medium"]. | `None` |
| `progress_callback` | `Callable[[int, int], None] \| None` | Optional callback function(current, total) for progress updates. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, dict]` | Dictionary with results for each file. |
Source code in src/audio_cache/cache_worker.py
cache_mixtape_async(track_paths, qualities=None, progress_callback=None)
Cache all audio files in a mixtape using parallel processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `track_paths` | `list[Path]` | List of paths to audio files in the mixtape. | required |
| `qualities` | `list[QualityLevel]` | Quality levels to cache. Defaults to ["medium"]. | `None` |
| `progress_callback` | `Callable[[int, int], None] \| None` | Optional callback function(current, total) for progress updates. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, dict]` | Dictionary with results for each file. |
Source code in src/audio_cache/cache_worker.py
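The parallel variant can be pictured as a thread pool that fans tracks out to a per-file caching function and reports progress as each one finishes. The sketch below is an assumption about the general shape, not the project's code: `cache_tracks_parallel`, the `cache_one` parameter, and the result dictionary layout are all illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Optional


def cache_tracks_parallel(
    track_paths: list[Path],
    cache_one: Callable[[Path], object],
    max_workers: int = 4,
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> dict[str, dict]:
    """Run cache_one on every track in a thread pool and collect per-file results."""
    results: dict[str, dict] = {}
    total = len(track_paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(cache_one, path): path for path in track_paths}
        # as_completed yields futures in finish order, so progress is reported
        # from the calling thread as soon as each track is done.
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[str(path)] = {"success": True, "result": future.result()}
            except Exception as exc:  # one bad track should not abort the batch
                results[str(path)] = {"success": False, "error": str(exc)}
            if progress_callback is not None:
                progress_callback(done, total)
    return results
```

Threads are a reasonable fit here because the heavy lifting happens in external ffmpeg processes, so the Python threads mostly wait on subprocesses rather than compete for the interpreter.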
verify_mixtape_cache(track_paths, quality='medium')
Verify which tracks in a mixtape have valid cached versions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `track_paths` | `list[Path]` | List of paths to audio files in the mixtape. | required |
| `quality` | `QualityLevel` | Quality level to check. | `'medium'` |
Returns:
| Type | Description |
|---|---|
| `dict[str, bool]` | Dictionary mapping file paths to cache availability status. |
Source code in src/audio_cache/cache_worker.py
regenerate_outdated_cache(track_paths, qualities=None)
Regenerate cached versions that are older than their source files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `track_paths` | `list[Path]` | List of paths to audio files. | required |
| `qualities` | `list[QualityLevel]` | Quality levels to regenerate. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict[str, dict]` | Dictionary with regeneration results for each file. |
Source code in src/audio_cache/cache_worker.py
Convenience Scheduler
schedule_mixtape_caching(mixtape_tracks, music_root, audio_cache, logger=None, qualities=None, async_mode=True, progress_callback=None)
Convenience function to schedule caching for a mixtape's tracks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixtape_tracks` | `list[dict]` | List of track dictionaries with 'path' keys. | required |
| `music_root` | `Path` | Root directory for music files. | required |
| `audio_cache` | `AudioCache` | AudioCache instance. | required |
| `logger` | `Logger \| None` | Optional logger. | `None` |
| `qualities` | `list[QualityLevel]` | Quality levels to cache. Defaults to ["medium"]. | `None` |
| `async_mode` | `bool` | If True, use parallel processing. | `True` |
| `progress_callback` | `Callable[[int, int], None] \| None` | Optional callback function(current, total) for progress updates. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict` | Dictionary with caching results. |
Source code in src/audio_cache/cache_worker.py
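Before any caching can start, the track dictionaries have to be turned into file paths. The sketch below assumes each track's 'path' value is relative to music_root, which this document implies but does not state; `resolve_track_paths` is a hypothetical helper, not part of the module.

```python
from pathlib import Path


def resolve_track_paths(mixtape_tracks: list[dict], music_root: Path) -> list[Path]:
    """Map track dicts with 'path' keys to paths under music_root (assumed layout)."""
    paths = []
    for track in mixtape_tracks:
        relative = track.get("path")
        if not relative:
            continue  # skip entries without a usable path
        paths.append(music_root / relative)
    return paths
```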
Progress Tracker (SSE)
get_progress_tracker(logger=None)
Get or create the global progress tracker instance.
Source code in src/audio_cache/progress_tracker.py
ProgressTracker(logger=None)
Tracks progress of long-running operations and broadcasts updates via SSE.
Thread-safe implementation that allows multiple operations to report progress while clients listen for updates.
Initializes a new progress tracker with optional logging.
Sets up internal, thread-safe queues for tracking task-specific progress events that can be streamed to clients.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `logger` | `Logger \| None` | Optional logger instance used to record progress tracker activity. | `None` |
Methods:
| Name | Description |
|---|---|
| `create_task` | Create a new task for tracking. |
| `emit` | Emit a progress event. |
| `listen` | Generator that yields SSE-formatted progress events. |
| `cleanup_task` | Remove a task and its queue. |
Source code in src/audio_cache/progress_tracker.py
create_task(task_id)
Create a new task for tracking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task_id` | `str` | Unique identifier for this task (e.g., mixtape slug). | required |
Source code in src/audio_cache/progress_tracker.py
emit(task_id, step, status, message, current=0, total=0)
Emit a progress event.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task_id` | `str` | Task identifier. | required |
| `step` | `str` | Name of the current step (e.g., "saving", "caching_track"). | required |
| `status` | `ProgressStatus` | Current status. | required |
| `message` | `str` | Human-readable message. | required |
| `current` | `int` | Current progress count. | `0` |
| `total` | `int` | Total items to process. | `0` |
Source code in src/audio_cache/progress_tracker.py
listen(task_id, timeout=300)
Generator that yields SSE-formatted progress events.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task_id` | `str` | Task identifier to listen to. | required |
| `timeout` | `int` | Maximum time to wait for events (seconds). | `300` |
Yields:
| Type | Description |
|---|---|
| `str` | SSE-formatted event strings. |
Source code in src/audio_cache/progress_tracker.py
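A queue-backed SSE generator could look roughly like this. It is a sketch under assumptions: the function name `sse_stream`, the per-event interpretation of `timeout`, and the synthetic timeout event are illustrative; only the `data: ...` framing with a blank line between events comes from the SSE format itself.

```python
import json
import queue
from typing import Iterator


def sse_stream(events: "queue.Queue[dict]", timeout: float = 300) -> Iterator[str]:
    """Yield SSE-formatted strings until a terminal event or a timeout occurs."""
    while True:
        try:
            # Block until a producer thread puts the next progress event.
            event = events.get(timeout=timeout)
        except queue.Empty:
            # Assumed behavior: tell the client we gave up waiting, then stop.
            yield "data: " + json.dumps({"status": "timeout"}) + "\n\n"
            return
        yield f"data: {json.dumps(event)}\n\n"
        if event.get("status") in ("completed", "failed"):
            return  # terminal status: close the stream
```

`queue.Queue` is thread-safe, so worker threads can push events while the request thread drains them through this generator.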
cleanup_task(task_id)
Remove a task and its queue.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task_id` | `str` | Task identifier to clean up. | required |
Source code in src/audio_cache/progress_tracker.py
ProgressCallback(task_id, tracker, total_tracks)
Callback wrapper for audio caching progress.
Translates cache worker progress updates into SSE events.
Initialize the progress callback.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task_id` | `str` | Task identifier. | required |
| `tracker` | `ProgressTracker` | ProgressTracker instance. | required |
| `total_tracks` | `int` | Total number of tracks to cache. | required |
Methods:
| Name | Description |
|---|---|
| `__call__` | Called by cache worker with progress updates. |
| `track_cached` | Records that a track has been successfully cached. |
| `track_skipped` | Records that a track was intentionally skipped during caching. |
| `track_failed` | Records that caching a track has failed. |
Source code in src/audio_cache/progress_tracker.py
__call__(current, total)
Called by cache worker with progress updates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `current` | `int` | Current file number. | required |
| `total` | `int` | Total files to process. | required |
Source code in src/audio_cache/progress_tracker.py
track_cached(track_name)
Records that a track has been successfully cached.
Increments the count of cached tracks and emits a progress event reflecting the updated completion state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `track_name` | `str` | The display name or identifier of the cached track. | required |
Source code in src/audio_cache/progress_tracker.py
track_skipped(track_name, reason='already cached')
Records that a track was intentionally skipped during caching.
Increments the skipped count and emits a progress event explaining why the track was not processed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `track_name` | `str` | The display name or identifier of the skipped track. | required |
| `reason` | `str` | Human-readable explanation for why the track was skipped. | `'already cached'` |
Source code in src/audio_cache/progress_tracker.py
track_failed(track_name, error)
Records that caching a track has failed.
Increments the failed count and emits a progress event describing the error that occurred.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `track_name` | `str` | The display name or identifier of the track that failed to cache. | required |
| `error` | `str` | Human-readable error description explaining the failure. | required |
Source code in src/audio_cache/progress_tracker.py
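The counting behavior described for track_cached, track_skipped, and track_failed can be sketched as a small wrapper around any object with an `emit` method shaped like ProgressTracker.emit. The class name `CountingCallback`, the `"caching"` step name, and the choice to report per-track failures with an `"in_progress"` status are assumptions made for illustration.

```python
class CountingCallback:
    """Hypothetical stand-in for ProgressCallback: counts outcomes and emits events."""

    def __init__(self, task_id: str, tracker, total_tracks: int) -> None:
        self.task_id = task_id
        self.tracker = tracker          # anything exposing emit(task_id, step, status, ...)
        self.total = total_tracks
        self.cached = self.skipped = self.failed = 0

    def _emit(self, message: str) -> None:
        processed = self.cached + self.skipped + self.failed
        # Assumption: individual track outcomes keep the task "in_progress".
        self.tracker.emit(self.task_id, "caching", "in_progress", message,
                          current=processed, total=self.total)

    def track_cached(self, track_name: str) -> None:
        self.cached += 1
        self._emit(f"Cached {track_name}")

    def track_skipped(self, track_name: str, reason: str = "already cached") -> None:
        self.skipped += 1
        self._emit(f"Skipped {track_name} ({reason})")

    def track_failed(self, track_name: str, error: str) -> None:
        self.failed += 1
        self._emit(f"Failed {track_name}: {error}")
```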
🛠️ Configuration Options
| Option | Default | Description |
|---|---|---|
| `AUDIO_CACHE_DIR` | `"cache/audio"` | Directory where MP3 caches are stored (relative to DATA_ROOT). |
| `AUDIO_CACHE_ENABLED` | `True` | Master switch – set to False to bypass the entire subsystem. |
| `AUDIO_CACHE_DEFAULT_QUALITY` | `"medium"` | Quality used when a client does not specify one. |
| `AUDIO_CACHE_MAX_WORKERS` | `4` | Number of parallel threads for batch transcoding. |
| `AUDIO_CACHE_PRECACHE_ON_UPLOAD` | `True` | Auto-cache mixtape tracks when a mixtape is saved. |
| `AUDIO_CACHE_PRECACHE_QUALITIES` | `["medium"]` | List of qualities to pre-generate (e.g., ["low", "medium", "high"]). |
These values are defined in src/config/config.py and can be overridden with environment variables (e.g., AUDIO_CACHE_MAX_WORKERS=8).
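Environment overrides of this kind are often implemented with a few small parsing helpers. The sketch below shows one way to do it; the parsing rules (accepted boolean spellings, comma-separated lists) and the `load_cache_config` name are assumptions, and the real logic in src/config/config.py may differ.

```python
import os
from typing import Mapping

# Defaults copied from the configuration table above.
DEFAULTS = {
    "AUDIO_CACHE_DIR": "cache/audio",
    "AUDIO_CACHE_ENABLED": True,
    "AUDIO_CACHE_DEFAULT_QUALITY": "medium",
    "AUDIO_CACHE_MAX_WORKERS": 4,
    "AUDIO_CACHE_PRECACHE_ON_UPLOAD": True,
    "AUDIO_CACHE_PRECACHE_QUALITIES": ["medium"],
}


def _coerce(raw: str, default):
    """Convert an environment string to the type of the default value."""
    if isinstance(default, bool):  # check bool before int: bool is an int subclass
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, list):
        # Assumed encoding: comma-separated values, e.g. "low,medium,high".
        return [item.strip() for item in raw.split(",") if item.strip()]
    return raw


def load_cache_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the defaults with any matching environment variables applied."""
    return {
        name: _coerce(env[name], default) if name in env else default
        for name, default in DEFAULTS.items()
    }
```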
⏳ Progress Tracking (SSE)
The progress modal in the editor UI subscribes to the endpoint:
The server returns a Server‑Sent Events stream. Each event looks like:
{
"task_id": "summer-vibes",
"step": "caching",
"status": "in_progress",
"message": "Caching track 3 of 15",
"current": 3,
"total": 15,
"timestamp": "2024-09-28T12:34:56.789012"
}
The modal updates the progress bar, logs messages, and shows a final summary when the status becomes completed or failed.
Implementation note: ProgressCallback.track_cached(), track_skipped(), and track_failed() are called from CacheWorker to emit the above events.
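For debugging the stream outside the browser, the events can be decoded with a few lines of Python. This sketch assumes each event arrives as one or more `data:` lines followed by a blank line, as the SSE format prescribes; `parse_sse_chunk` and `percent_done` are illustrative helpers, not part of the project.

```python
import json


def parse_sse_chunk(chunk: str) -> list[dict]:
    """Decode the JSON payloads of all complete SSE events in `chunk`."""
    events = []
    for block in chunk.split("\n\n"):  # a blank line terminates each event
        data_lines = [line[5:].lstrip()
                      for line in block.splitlines()
                      if line.startswith("data:")]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events


def percent_done(event: dict) -> int:
    """Progress percentage for a progress bar; 0 when the total is unknown."""
    total = event.get("total", 0)
    return 0 if not total else round(100 * event.get("current", 0) / total)
```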
🔧 Troubleshooting FAQ
Cache Misses – “Why isn’t my file being cached?”
| Symptom | Check | Fix |
|---|---|---|
| Cache miss warning in logs | `grep -i "cache miss" app.log` | Verify `AUDIO_CACHE_ENABLED=True` and that the file’s suffix is in `should_transcode` (FLAC, WAV, AIFF, APE, ALAC). |
| Cache file exists but not found | `ls collection-data/cache/audio/` | Ensure the hash matches the current absolute path. If you moved the music folder, run `python debug_cache.py <MUSIC_ROOT> <REL_PATH> <CACHE_DIR>` (see debug_cache.py). |
| Cache never generated | `AUDIO_CACHE_PRECACHE_ON_UPLOAD=False` | Enable pre-caching or trigger it manually via `schedule_mixtape_caching`. |
| ffmpeg not found | `ffmpeg -version` | Install ffmpeg on the host (Ubuntu: `apt install ffmpeg`; Alpine: `apk add ffmpeg`). |
| Permission denied on cache dir | `ls -ld collection-data/cache/audio` | The Flask process must have write permission (owner UID = the container user). |
| High CPU usage during batch caching | `top` while caching | Reduce `AUDIO_CACHE_MAX_WORKERS` (e.g., `export AUDIO_CACHE_MAX_WORKERS=2`). |
| Stale cache after source file change | Compare timestamps (`stat -c %Y file`) | Run `cache.clear_cache()` or set `overwrite=True` in `transcode_file`. |
Transcoding Failures – “ffmpeg exited with error code 1”
- Inspect the ffmpeg stderr – it is logged by AudioCache.transcode_file.
- Common culprits:
  - Corrupt source file – try rewriting the source with ffmpeg -i input.flac -c copy output.flac (a stream copy into a fresh container).
  - Unsupported codec – ensure the source is a supported lossless format.
  - Insufficient disk space – check free space on the cache volume.
- Manual test: run the ffmpeg transcode by hand on the affected file. If this works, the problem is likely in the path handling (hash mismatch).
- Fix path mismatches – run debug_cache.py (see the script in the repo) to compare the hash generated by the app vs. the one you expect.
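Several of the checks above (ffmpeg on the PATH, a writable cache directory, free disk space) can be automated in a small preflight script. This is a sketch using only the standard library; the function name `preflight` and the 100 MB threshold are arbitrary assumptions.

```python
import os
import shutil
from pathlib import Path


def preflight(cache_dir: Path, min_free_mb: int = 100) -> dict[str, bool]:
    """Run basic environment checks that commonly explain transcoding failures."""
    exists = cache_dir.is_dir()
    # disk_usage needs an existing path, so report 0 MB free for a missing directory.
    free_mb = shutil.disk_usage(cache_dir).free / 1_000_000 if exists else 0
    return {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "cache_dir_writable": exists and os.access(cache_dir, os.W_OK),
        "enough_disk_space": free_mb >= min_free_mb,
    }


if __name__ == "__main__":
    for check, ok in preflight(Path("collection-data/cache/audio")).items():
        print(f"{check}: {'ok' if ok else 'FAILED'}")
```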
