diff --git a/docs/superpowers/specs/2026-06-08-smarter-lidarr-matching-design.md b/docs/superpowers/specs/2026-06-08-smarter-lidarr-matching-design.md
new file mode 100644
index 0000000..ec409b5
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-08-smarter-lidarr-matching-design.md
@@ -0,0 +1,121 @@
+# Smarter Lidarr Matching: Design
+
+**Date:** 2026-06-08
+**Status:** Approved
+
+## Context & Goal
+
+Live testing of the REST API exposed a real weakness: `musicfetch`'s
+`lidarr_search` trusts Lidarr's universal `/api/v1/search` ordering, which is
+fuzzy and unranked. A query of `Daft Punk - Discovery` ranked a novelty remix
+("Daft Punk's Discovery but it's in the SM64 Soundfont" by *Pignickel*) #1, and
+the real *Discovery* by Daft Punk wasn't even top-5. The interactive CLI picker
+lets a human work around this; the **API's noninteractive top-pick cannot** and
+grabs garbage.
+
+The real input shape is Shazam-style `Artist - Track`. Lidarr only grabs
+**albums**, never single tracks, so we must resolve a track to the album that
+contains it, then pick the best-matching Lidarr album.
+
+**Goal:** make `lidarr_search` return a **scored, best-first** list of Lidarr
+hits so the noninteractive API picks the correct album, and the CLI picker shows
+good matches first. Resolve `Artist - Track` → album via MusicBrainz.
+
+## Decisions (confirmed with user)
+
+- **Fix in the shared `musicfetch.lidarr_search`** (not an API-only layer): both
+  the CLI picker and the API noninteractive pick benefit; no duplicated logic.
+  Signature unchanged: `lidarr_search(query, limit) -> list[Hit]` (drop-in).
+- **Resolve track → album via MusicBrainz** (the same upstream Lidarr uses).
+  Lidarr's own track indexing is too weak. One extra HTTP call, no API key.
+- **Track-first semantics** (`Artist - Track`): the right side is treated as a
+  track to resolve to its album. (YouTube path already handles exact tracks; this
+  makes Lidarr the accurate album/discography source.)
+- **Scoring** with stdlib `difflib` (no new dependency).
+- **YouTube fallback preserved** exactly as today (see below).
+
+## Architecture
+
+All changes live in the `musicfetch` binary (single file). New/changed units:
+
+```
+musicfetch
+├── _split_query(query) -> (left, right|None)      # split on first " - "
+├── musicbrainz_best_album(artist, track) -> dict|None
+│     # MB recording search -> best release-group {album_title, artist, year, rg_mbid}
+├── _similar(a, b) -> float                         # difflib ratio, casefolded
+├── _score_album_hit(hit, want_artist, want_album, rg_mbid) -> float
+└── lidarr_search(query, limit) -> list[Hit]        # REWRITTEN: scored, best-first
+```
+
+### Data flow
+
+1. **`Artist - Track` query:**
+   a. `musicbrainz_best_album(artist, track)` → album candidate (title, artist,
+      year, release-group MBID).
+   b. Lidarr `GET /api/v1/album/lookup?term="<artist> <album>"` → map to `Hit`s.
+   c. Score each: `0.7*_similar(want_artist, hit.artist) + 0.3*_similar(want_album,
+      hit.album)`, plus a strong bonus (e.g. +0.5) if `hit.payload.album.foreignAlbumId
+      == rg_mbid`. Sort desc.
+   d. Enrich `Hit.year` from MB when the Lidarr hit lacks one.
+2. **Single-term query (no ` - `):** Lidarr `/album/lookup` + `/artist/lookup`
+   with the raw term; score each against the whole query (artist hits scored on
+   artist name, album hits on artist+album); merge, sort desc.
+3. **Fallbacks (never regress):** if MB times out / returns nothing, skip to step
+   2 using `(artist, track)` recombined as the term. If `/album/lookup` and
+   `/artist/lookup` both fail, fall back to the existing `/api/v1/search` path.
+   `lidarr_search` returns `[]` only when everything fails or the key is missing.
+
+### MusicBrainz client details
+
+- Endpoint: `https://musicbrainz.org/ws/2/recording?query=<lucene>&fmt=json&limit=10`
+  where lucene = `artist:"<artist>" AND recording:"<track>"`.
+- Headers: `User-Agent: musicfetch/2.0 (https://github.com/…)` (MB requires a
+  descriptive UA). Timeout ~8s.
+  Rate-limit: at most ~1 request/sec (a process-level min-interval guard; this
+  tool makes one call per fetch so it's effectively a courtesy delay).
+- **Release-group selection** from the returned recordings' releases:
+  prefer `primary-type == "Album"` with **no** `secondary-types` (excludes
+  Compilation, Live, Single, Soundtrack); among those choose the earliest
+  `first-release-date`. Fall back to any release-group if none qualify. Return
+  `{album_title, artist, year, rg_mbid}` or `None`.
+
+## YouTube Fallback (unchanged, documented)
+
+This feature does not alter fallback behavior:
+- **`source=auto` (default):** `build_combined_hits` includes YouTube hits. If
+  Lidarr times out or returns no results, `lidarr_search` returns `[]` and the top
+  YouTube hit is picked. If a Lidarr album is picked but has no indexer release,
+  `actions.perform_fetch` falls through to the top YouTube hit.
+- **`source=lidarr`:** lidarr-only by design, with **no** YouTube fallback (the
+  explicit "force Lidarr" switch). Unchanged.
+
+## Error Handling
+
+- All MB and Lidarr HTTP calls are wrapped; exceptions/timeouts are caught and
+  degrade to the next fallback tier. `lidarr_search` never raises.
+- Empty/garbled MB JSON → treated as no match.
+- Existing `DEBUG` logging extended to show MB query, chosen release-group, and
+  top scored candidates.
+
+## Testing
+
+Unit tests (mock `requests`, no live network):
+- `musicbrainz_best_album`: from canned MB JSON, picks studio Album over a single
+  and a compilation; picks earliest among Albums; returns `None` on empty.
+- `_similar` / `_score_album_hit`: real *Discovery* by Daft Punk outscores the
+  *Pignickel* novelty for query `Daft Punk - Discovery`-style candidates; MBID
+  match bonus dominates.
+- `_split_query`: `"A - B"` → `("A","B")`; no dash → `("A", None)`; only first
+  ` - ` splits.
+- Fallback: MB failure → Lidarr lookup path; lookup failure → `/search`.
+
+Manual live check (end of implementation): with the API pointed at the user's
+Lidarr (`10.2.1.16:8686`), `POST /fetch?q=Daft Punk - Harder Better Faster
+Stronger&source=lidarr` resolves to **Discovery** by Daft Punk (not a single,
+compilation, or novelty), and the interactive-release flow proceeds.
+
+## Out of Scope (YAGNI)
+
+Caching MB responses, multi-track/album disambiguation UI, configurable scoring
+weights, fuzzy artist aliasing beyond difflib, MB cover-art lookup.