# Musiclib - Music Collection handling
## 📘 Overview and introduction
The musiclib package is the heart of the mixtape music‑collection service. It turns a plain directory tree of audio files into a searchable, fully‑indexed library that can be queried instantly from the UI.
Below is a concise, high‑level walkthrough of the module’s responsibilities, its main components, and how they interact to deliver a robust “scan‑once‑search‑forever” experience.
### Class diagram
```mermaid
classDiagram
%% ==== Core data types ====
class IndexEvent {
<<dataclass>>
+EventType type
+Path? path
}
class EventType {
<<enumeration>>
INDEX_FILE
DELETE_FILE
CLEAR_DB
REBUILD_DONE
RESYNC_DONE
}
%% ==== CollectionExtractor (main engine) ====
class CollectionExtractor {
-Path music_root
-Path db_path
-Path data_root
-Logger _logger
-Queue~IndexEvent~ _write_queue
-Event _writer_stop
-Observer? _observer
-Thread _writer_thread
-set SUPPORTED_EXTS
+CollectionExtractor(music_root, db_path, logger=None)
+rebuild()
+resync()
+start_monitoring()
+stop()
+get_conn(readonly=False) Connection
-_init_db()
-_populate_fts_if_needed()
-_db_writer_loop()
-_index_file(conn, path)
-_write_queue.put(event)
}
%% ==== EnhancedWatcher (filesystem event handler) ====
class EnhancedWatcher {
-CollectionExtractor extractor
+EnhancedWatcher(extractor)
+on_any_event(event)
+shutdown()
}
%% ==== Indexing status helpers (module-level functions) ====
class indexing_status {
<<module>>
+set_indexing_status(data_root, status, total, current)
+clear_indexing_status(data_root)
+get_indexing_status(data_root, logger=None)
-_atomic_write_json(status_file, data)
-_calculate_progress(total, current) float
-_get_started_at(status_file) str|None
-_build_status_data(status, started_at, total, current, progress) dict
}
%% ==== Relationships ====
CollectionExtractor --> IndexEvent : produces / consumes
CollectionExtractor --> EventType : uses literals
CollectionExtractor --> EnhancedWatcher : creates & registers
CollectionExtractor --> indexing_status : writes progress JSON
EnhancedWatcher --> IndexEvent : enqueues events
EnhancedWatcher --> CollectionExtractor : holds reference
indexing_status ..> Path : works with filesystem paths
```
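The two message types in the diagram can be sketched in Python as a plain enum and dataclass. The field names come from the diagram. Several details are assumptions and not the module's actual definitions: the auto() enum values, the frozen dataclass, the None default for path, and the comments describing what each event type means.

```python
from dataclasses import dataclass
from enum import Enum, auto
from pathlib import Path
from queue import Queue
from typing import Optional


class EventType(Enum):
    """Kinds of work the writer thread can be asked to do (values assumed)."""

    INDEX_FILE = auto()    # add or refresh one audio file
    DELETE_FILE = auto()   # drop one file's row
    CLEAR_DB = auto()      # wipe all rows, e.g. before a rebuild
    REBUILD_DONE = auto()  # assumed: marks the end of a rebuild batch
    RESYNC_DONE = auto()   # assumed: marks the end of a resync batch


@dataclass(frozen=True)
class IndexEvent:
    """A typed message for the single writer thread."""

    type: EventType
    path: Optional[Path] = None  # None for DB-wide events such as CLEAR_DB


# Producers (rebuild, resync, the watcher) put IndexEvent objects on one FIFO queue.
write_queue: "Queue[IndexEvent]" = Queue()
write_queue.put(IndexEvent(EventType.CLEAR_DB))
write_queue.put(IndexEvent(EventType.INDEX_FILE, Path("Artist/Album/01 Track.flac")))
```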
## 🧩 What the module does
| Goal | How it’s achieved |
|---|---|
| Detect every supported audio file | A rebuild or resync walks music_root to pick up existing files. After that, a watchdog Observer monitors the directory in real time and passes each filesystem event to EnhancedWatcher, which applies a 2‑second debounce to coalesce rapid edits and avoid duplicate indexing. |
| Extract reliable metadata | tinytag.TinyTag reads ID3/metadata tags (artist, album, title, year, duration, etc.). |
| Persist metadata efficiently | A SQLite database stores the canonical rows (tracks table) and an FTS5 virtual table (tracks_fts) that mirrors the same columns for lightning‑fast full‑text search. |
| Keep the DB in sync | A single writer thread serialises all write operations (adds, deletes, clears) via a thread‑safe Queue[IndexEvent]. |
| Expose progress to the UI | A tiny JSON file (indexing_status.json) is updated atomically during long‑running operations (rebuild, resync) so the front‑end can render progress bars. |
| Provide a clean API for the UI | MusicCollection (in reader.py) builds the search expression, runs the query, groups results by release directory, and returns a ready‑to‑render structure (artists, albums, tracks) together with the list of terms that need highlighting. |
## 🧱 Core building blocks
| Module / Class | Primary responsibility |
|---|---|
| _extractor.py | • Low‑level DB schema creation (_init_db) • Full‑text table bootstrap (_populate_fts_if_needed) • CollectionExtractor – orchestrates indexing, resync, rebuild, and live monitoring • IndexEvent / EventType – typed messages that drive the writer thread • EnhancedWatcher – translates filesystem events into IndexEvents |
| indexing_status.py | Helper functions that write and read the indexing_status.json file in an atomic, crash‑safe way (e.g., set_indexing_status, clear_indexing_status, get_indexing_status). |
| reader.py | High‑level façade (MusicCollection) used by the UI. It parses user queries, builds the FTS/LIKE expression, runs the query, groups rows, and formats the result payload (artists, albums, tracks, and highlight terms). |
| ui.py | MusicCollectionUI extends MusicCollection with UI‑specific helpers: • _highlight_text (term highlighting) • _safe_filename (sanitising filenames) • _escape_for_query (building click‑query strings) • result shaping for the front‑end |
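The schema work attributed to _init_db and _populate_fts_if_needed might look roughly like the sketch below. It assumes an FTS5 external-content table (content='tracks') kept current by insert, delete and update triggers, and it uses a guessed column list. The rebuild_fts helper simply issues FTS5's built-in 'rebuild' command; it does not reproduce whatever check the real _populate_fts_if_needed performs first.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    id          INTEGER PRIMARY KEY,
    path        TEXT UNIQUE NOT NULL,
    artist      TEXT,
    album       TEXT,
    title       TEXT,
    release_dir TEXT,
    year        INTEGER,
    duration    REAL
);

-- External-content FTS5 index over the searchable columns of tracks (assumed layout).
CREATE VIRTUAL TABLE IF NOT EXISTS tracks_fts USING fts5(
    artist, album, title, release_dir,
    content='tracks', content_rowid='id'
);

-- Triggers keep the index in step with every write to tracks.
CREATE TRIGGER IF NOT EXISTS tracks_ai AFTER INSERT ON tracks BEGIN
    INSERT INTO tracks_fts(rowid, artist, album, title, release_dir)
    VALUES (new.id, new.artist, new.album, new.title, new.release_dir);
END;

CREATE TRIGGER IF NOT EXISTS tracks_ad AFTER DELETE ON tracks BEGIN
    INSERT INTO tracks_fts(tracks_fts, rowid, artist, album, title, release_dir)
    VALUES ('delete', old.id, old.artist, old.album, old.title, old.release_dir);
END;

CREATE TRIGGER IF NOT EXISTS tracks_au AFTER UPDATE ON tracks BEGIN
    INSERT INTO tracks_fts(tracks_fts, rowid, artist, album, title, release_dir)
    VALUES ('delete', old.id, old.artist, old.album, old.title, old.release_dir);
    INSERT INTO tracks_fts(rowid, artist, album, title, release_dir)
    VALUES (new.id, new.artist, new.album, new.title, new.release_dir);
END;
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create tables, the FTS index and triggers; safe to call on every start-up."""
    conn.executescript(SCHEMA)


def rebuild_fts(conn: sqlite3.Connection) -> None:
    """Re-derive the whole FTS index from tracks using FTS5's 'rebuild' command."""
    with conn:
        conn.execute("INSERT INTO tracks_fts(tracks_fts) VALUES ('rebuild')")
```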
## 🔀 Data flow – from file system to UI
- Startup – MusicCollection creates a CollectionExtractor. The extractor initializes the SQLite schema and launches the writer thread.
- Initial population – If the DB is empty, MusicCollection schedules a rebuild. The rebuild walks the entire music_root, enqueues an INDEX_FILE event for every supported file, and updates indexing_status.json so the UI can show progress.
- Live updates – The watchdog observer fires on every create/modify/delete. EnhancedWatcher debounces those events and converts them into IndexEvents, which the writer thread processes in order, keeping the DB and the FTS mirror aligned.
- Search – When the UI calls search_highlighting, MusicCollection parses the query, builds an FTS‑compatible expression (or a fallback LIKE query), runs it against the DB, groups rows by release directory, and returns a dictionary of artists, albums, and tracks plus the list of parsed terms. Internally, searches use a multi‑pass candidate scoring model with optional reuse of previous search sessions for fast refinements.
- Presentation – MusicCollectionUI highlights the terms, builds click‑queries (artist:…, release_dir:…), and hands the ready‑to‑render JSON back to the front‑end. Lazy‑loading of an artist’s full discography or an album’s track list is done by re‑issuing search_grouped with the stored click‑query.
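The Search step turns user input into either an FTS5 MATCH expression or a LIKE fallback. The sketch below is a simplified, hypothetical parser. It accepts field:value and "quoted phrase" terms, ANDs them together with a prefix match, and escapes LIKE wildcards so user text is matched literally. It leaves out the multi‑pass scoring and session reuse described in the Search bullet. The names parse_query, to_fts_expression and to_like_clause, and the FIELDS whitelist, are illustrative and are not the reader.py API.

```python
import re
from typing import List, Optional, Tuple

# Field prefixes accepted in queries (assumed set; artist: and release_dir: appear in click-queries).
FIELDS = {"artist", "album", "title", "release_dir"}
DEFAULT_COLUMNS = ["artist", "album", "title"]

# Optional "field:" prefix followed by a "quoted phrase" or a bare word.
TOKEN_RE = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')

Term = Tuple[Optional[str], str]


def parse_query(query: str) -> List[Term]:
    """Split a query into (field, value) pairs; field is None for free text."""
    terms: List[Term] = []
    for m in TOKEN_RE.finditer(query):
        field, phrase, word = m.group(1), m.group(2), m.group(3)
        value = phrase if phrase is not None else word
        if field and field.lower() not in FIELDS:
            field, value = None, f"{field}:{value}"  # unknown prefix: treat as text
        if value:
            terms.append((field.lower() if field else None, value))
    return terms


def _fts_phrase(text: str) -> str:
    """Quote text as an FTS5 string literal (embedded quotes are doubled)."""
    return '"' + text.replace('"', '""') + '"'


def to_fts_expression(terms: List[Term]) -> str:
    """AND together one prefix phrase per term, with optional column filters."""
    parts = []
    for field, value in terms:
        phrase = _fts_phrase(value) + "*"  # prefix match on the last token
        parts.append(f"{field}:{phrase}" if field else phrase)
    return " AND ".join(parts)


def _like_pattern(text: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def to_like_clause(terms: List[Term]) -> Tuple[str, List[str]]:
    """Fallback WHERE clause for when the FTS expression cannot be used."""
    clauses, params = [], []
    for field, value in terms:
        columns = [field] if field else DEFAULT_COLUMNS  # names come from a fixed whitelist
        clauses.append("(" + " OR ".join(f"{c} LIKE ? ESCAPE '\\'" for c in columns) + ")")
        params.extend([_like_pattern(value)] * len(columns))
    return " AND ".join(clauses) or "1", params
```

Because column names only ever come from the whitelist and all values go through bound parameters, neither path lets user input alter the SQL structure.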
## 💡 Why the design choices matter
| Design decision | Benefit |
|---|---|
| Single writer thread + queue | Guarantees deterministic ordering of DB writes, avoids SQLite lock contention, and lets the UI stay responsive while heavy indexing runs in the background. |
| FTS5 virtual table with triggers | Provides sub‑millisecond full‑text look‑ups without having to maintain a separate index manually. |
| Atomic JSON status file | Prevents corrupted progress information even if the process crashes mid‑write; the UI never sees a half‑written file. |
| Watchdog‑driven live sync | Newly added songs appear as soon as the 2‑second debounce window has passed; deletions are reflected without a full rescan. |
| Separation of concerns (_extractor vs. reader vs. ui) | Keeps low‑level DB handling isolated from query parsing and UI formatting, making the code easier to test and extend. |
| Typed IndexEvent dataclass | Improves readability, reduces bugs caused by mismatched queue payloads, and makes future event types straightforward to add. |
| Debouncing in the watcher (EnhancedWatcher) | Prevents a flood of INDEX_FILE events when a user edits a file repeatedly (e.g., retagging). Guarantees only the final state is indexed, reducing DB churn and corruption risk. |
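The debouncing described in the last row can be illustrated without watchdog. The idea is one threading.Timer per path, cancelled and re‑armed on every new event, so only the last event in a burst fires. Debouncer is a hypothetical helper; this document does not show how EnhancedWatcher actually implements its 2‑second window.

```python
import threading
from typing import Callable, Dict, Hashable


class Debouncer:
    """Coalesce bursts of events per key and fire the callback once per quiet period."""

    def __init__(self, delay: float, callback: Callable[[Hashable], None]) -> None:
        self.delay = delay  # the watcher described above uses 2 seconds
        self.callback = callback
        self._timers: Dict[Hashable, threading.Timer] = {}
        self._lock = threading.Lock()

    def trigger(self, key: Hashable) -> None:
        """Record an event for key, restarting its quiet-period timer."""
        with self._lock:
            previous = self._timers.get(key)
            if previous is not None:
                previous.cancel()  # an earlier, still-pending event is superseded
            timer = threading.Timer(self.delay, lambda: self._fire(key, timer))
            timer.daemon = True
            self._timers[key] = timer
            timer.start()

    def _fire(self, key: Hashable, timer: threading.Timer) -> None:
        with self._lock:
            if self._timers.get(key) is not timer:
                return  # a newer event re-armed the timer for this key; let that one fire
            del self._timers[key]
        self.callback(key)  # called outside the lock so callbacks may re-trigger

    def shutdown(self) -> None:
        """Cancel everything still pending, e.g. when monitoring stops."""
        with self._lock:
            for timer in self._timers.values():
                timer.cancel()
            self._timers.clear()
```

In a watchdog handler, on_any_event(event) could call trigger(event.src_path). The callback would then enqueue the appropriate IndexEvent once the file has been quiet for the full delay.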
## 🧠 Quick mental model
```mermaid
flowchart LR
%% Nodes
FS[Filesystem audio files]
WD[EnhancedWatcher Observer]
Q[IndexEvent Queue]
WT[Writer Thread processes events]
DB["SQLite DB: tracks + tracks_fts"]
UI[MusicCollection UI layer]
%% Main data flow
FS --> WD
WD --> Q
Q --> WT
WT --> DB
%% Queries from UI
UI --> DB
DB --> UI
```
The arrow direction indicates the primary flow of data.
The UI never talks directly to the filesystem; it always goes through MusicCollection, which in turn reads from the already‑indexed SQLite store.
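The Q → WT → DB leg of the diagram is the single‑writer pattern. The sketch below makes several assumptions: a trimmed two‑column tracks table, file stems in place of real tinytag metadata, and a threading.Event for shutdown, in the spirit of the _writer_stop attribute. db_writer_loop is an illustrative stand‑in for _db_writer_loop, not its actual code.

```python
import logging
import sqlite3
import threading
from dataclasses import dataclass
from enum import Enum, auto
from pathlib import Path
from queue import Empty, Queue
from typing import Optional

log = logging.getLogger("musiclib.sketch")


class EventType(Enum):
    INDEX_FILE = auto()
    DELETE_FILE = auto()
    CLEAR_DB = auto()


@dataclass
class IndexEvent:
    type: EventType
    path: Optional[Path] = None


def db_writer_loop(db_path: str, events: "Queue[IndexEvent]", stop: threading.Event) -> None:
    """The only code path that writes to SQLite: drain the queue in FIFO order."""
    conn = sqlite3.connect(db_path)  # this connection never leaves the writer thread
    conn.execute("CREATE TABLE IF NOT EXISTS tracks (path TEXT PRIMARY KEY, title TEXT)")
    try:
        # Keep going until asked to stop AND everything queued has been applied.
        while not (stop.is_set() and events.empty()):
            try:
                event = events.get(timeout=0.1)
            except Empty:
                continue
            try:
                with conn:  # one transaction per event: commit, or roll back on error
                    if event.type is EventType.INDEX_FILE:
                        # Real code would read tags (e.g. with tinytag); the stem stands in.
                        conn.execute(
                            "INSERT OR REPLACE INTO tracks(path, title) VALUES (?, ?)",
                            (str(event.path), event.path.stem),
                        )
                    elif event.type is EventType.DELETE_FILE:
                        conn.execute("DELETE FROM tracks WHERE path = ?", (str(event.path),))
                    elif event.type is EventType.CLEAR_DB:
                        conn.execute("DELETE FROM tracks")
            except sqlite3.Error:
                log.exception("failed to apply %s", event)  # keep the writer alive
            finally:
                events.task_done()
    finally:
        conn.close()
```

Because only this thread holds a write connection, SQLite never sees competing writers. Draining the queue before exiting means a stop request does not drop events that were already enqueued.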
## 🚀 Getting started (for developers)
- Instantiate the high‑level class (MusicCollection).
- Run a query (the UI does this internally).
- Monitor progress (useful for CLI tools).
- Shut down cleanly when the program exits.
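For the progress-monitoring step, a CLI tool can simply poll the status file. This sketch reads indexing_status.json directly. It assumes the file carries status, current, total and progress keys (progress as a 0–100 percentage) and that the file is removed once indexing finishes. In real code, the module's own get_indexing_status(data_root, logger=None) is the reader to prefer. read_status, format_status and watch are hypothetical names.

```python
import json
import time
from pathlib import Path
from typing import Optional

STATUS_FILENAME = "indexing_status.json"  # file name taken from the docs above


def read_status(data_root: Path) -> Optional[dict]:
    """Return the parsed status, or None when no indexing job is running (assumed)."""
    try:
        # The writer swaps the file in atomically, so a half-written file is never read.
        return json.loads((Path(data_root) / STATUS_FILENAME).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None


def format_status(status: Optional[dict], width: int = 20) -> str:
    """Render a one-line text progress bar."""
    if status is None:
        return "idle"
    pct = float(status.get("progress", 0.0))  # assumed to be 0-100
    filled = int(width * pct / 100)
    bar = "#" * filled + "-" * (width - filled)
    return f"{status.get('status', '?')} [{bar}] {status.get('current', 0)}/{status.get('total', 0)} ({pct:.1f}%)"


def watch(data_root: Path, interval: float = 1.0) -> None:
    """Print progress until the status file disappears."""
    while (status := read_status(data_root)) is not None:
        print(format_status(status), flush=True)
        time.sleep(interval)
    print("idle")
```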
## 🧭 Where to look next
- DB Loading for:
    - the low‑level DB schema, triggers, and writer‑loop logic, and
    - the atomic JSON status handling used by the UI progress bar.
- DB reading for:
    - the query parser,
    - the grouping algorithm that decides which artists/albums/tracks to return, and
    - the presentation helpers (highlighting, safe filenames, click‑query generation).
- Real‑time monitoring of changes in your music collection
That’s the complete picture of the musiclib module: a tightly integrated pipeline that turns a folder of audio files into a fast, searchable, and continuously synchronized music library.
