Plan smarter Lidarr matching via exact MBID lookup

Drop fuzzy difflib scoring: MusicBrainz resolves track->album release-group
MBID, Lidarr album/lookup?term=mbid:<id> returns the exact album. Live-verified
against the user's Lidarr.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 21:06:39 -07:00
parent 6687a5a0fc
commit 45121dd807
2 changed files with 546 additions and 30 deletions


@@ -31,7 +31,12 @@ good matches first. Resolve `Artist - Track` → album via MusicBrainz.
- **Track-first semantics** (`Artist - Track`): the right side is treated as a
track to resolve to its album. (YouTube path already handles exact tracks; this
makes Lidarr the accurate album/discography source.)
- **Scoring** with stdlib `difflib` (no new dependency).
- **No fuzzy scoring.** Live-verified that Lidarr's `album/lookup` accepts a
direct MusicBrainz id: `term=mbid:<release-group-mbid>` (also `term=lidarr:<mbid>`)
returns **exactly one** album. So we resolve the album's MBID via MusicBrainz and
ask Lidarr for that exact MBID — no difflib, no ranking heuristics. The only
selection is deterministic release-group type-filtering inside the MusicBrainz
step (prefer studio Album over single/comp/live).
- **YouTube fallback preserved** exactly as today (see below).
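
The deterministic type-filtering mentioned above could look roughly like the sketch below. It is illustrative only: `pick_release_group` and `NON_STUDIO` are hypothetical names, and the field names (`primary-type`, `secondary-types`, `first-release-date`) are assumed to follow the MusicBrainz JSON release-group format. Which secondary types count as "not studio" is also an assumption.

```python
from typing import Dict, List, Optional

# Assumption: these secondary types mark a release-group as not a studio album.
NON_STUDIO = {"Compilation", "Live", "Soundtrack", "Remix", "DJ-mix", "Demo"}


def pick_release_group(groups: List[Dict]) -> Optional[Dict]:
    """Pick one release-group: earliest studio Album, else earliest of any type."""
    if not groups:
        return None

    def is_studio(rg: Dict) -> bool:
        secondary = set(rg.get("secondary-types") or [])
        return rg.get("primary-type") == "Album" and not (secondary & NON_STUDIO)

    def sort_key(rg: Dict):
        # Undated groups sort last; the id breaks ties so the result is stable.
        return (rg.get("first-release-date") or "9999", rg.get("id", ""))

    studio = [rg for rg in groups if is_studio(rg)]
    # Fall back to every candidate when no studio album exists.
    return min(studio or groups, key=sort_key)
```

Because the choice depends only on type and date, the same MusicBrainz response always yields the same release-group, which is what lets the Lidarr step be an exact `mbid:` lookup rather than a ranking.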
## Architecture
@@ -43,28 +48,26 @@ musicfetch
├── _split_query(query) -> (left, right|None) # split on first " - "
├── musicbrainz_best_album(artist, track) -> dict|None
│ # MB recording search -> best release-group {album_title, artist, year, rg_mbid}
├── _similar(a, b) -> float # difflib ratio, casefolded
├── _score_album_hit(hit, want_artist, want_album, rg_mbid) -> float
└── lidarr_search(query, limit) -> list[Hit] # REWRITTEN: scored, best-first
├── _lidarr_album_candidates(term) / _lidarr_artist_candidates(term) -> list[Hit]
├── _universal_search(query, limit) -> list[Hit] # /api/v1/search last resort
└── lidarr_search(query, limit) -> list[Hit] # REWRITTEN: MBID-exact + fallbacks
```
### Data flow
1. **`Artist - Track` query:**
a. `musicbrainz_best_album(artist, track)` → album candidate (title, artist,
year, release-group MBID).
b. Lidarr `GET /api/v1/album/lookup?term="<artist> <album>"` → map to `Hit`s.
c. Score each: `0.7*_similar(want_artist, hit.artist) + 0.3*_similar(want_album,
hit.album)`, plus a strong bonus (e.g. +0.5) if `hit.payload.album.foreignAlbumId
== rg_mbid`. Sort desc.
d. Enrich `Hit.year` from MB when the Lidarr hit lacks one.
2. **Single-term query (no ` - `):** Lidarr `/album/lookup` + `/artist/lookup`
with the raw term; score each against the whole query (artist hits scored on
artist name, album hits on artist+album); merge, sort desc.
3. **Fallbacks (never regress):** if MB times out / returns nothing, skip to step
2 using `(artist, track)` recombined as the term. If `/album/lookup` and
`/artist/lookup` both fail, fall back to the existing `/api/v1/search` path.
`lidarr_search` returns `[]` only when everything fails or the key is missing.
a. `musicbrainz_best_album(artist, track)` → `{album_title, artist, year, rg_mbid}`.
b. Lidarr `GET /api/v1/album/lookup?term=mbid:<rg_mbid>` → 0 or 1 exact album → `Hit`.
c. Enrich `Hit.year` from MB when the Lidarr hit lacks one. Return it.
2. **Single-term query (no ` - `):** `_fallback_lookup` — artist-first concatenation
of `/artist/lookup` + `/album/lookup` for the raw term (a bare term is most often
an artist). No scoring; the interactive picker / noninteractive top-pick consume
the order.
3. **Fallbacks (never regress):** if MusicBrainz misses or the exact MBID lookup
returns nothing, use `_fallback_lookup(query)` (album-first there, since a dash
query named an album/track). If `/album/lookup` and `/artist/lookup` both yield
nothing, fall back to the existing `/api/v1/search`. `lidarr_search` returns `[]`
only when everything fails or the key is missing.
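
The MBID-exact data flow (steps 1 to 3 of the new version) can be sketched as below. This is a minimal illustration under stated assumptions, not the committed implementation: the four collaborators are injected as plain callables that are assumed not to raise, `Hit` is modelled as a dict, the `album_first` keyword on the fallback is a hypothetical signature, and the missing-API-key check is omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hit = Dict  # stand-in for the project's Hit type (assumption)


def split_query(query: str) -> Tuple[str, Optional[str]]:
    """Split on the first ' - ' only; no dash yields (query, None)."""
    left, sep, right = query.partition(" - ")
    if not sep:
        return query.strip(), None
    return left.strip(), (right.strip() or None)


def lidarr_search_sketch(
    query: str,
    *,
    mb_best_album: Callable[[str, str], Optional[Dict]],
    lookup_mbid: Callable[[str], Optional[Hit]],
    fallback_lookup: Callable[..., List[Hit]],
    universal_search: Callable[[str], List[Hit]],
    limit: int = 10,
) -> List[Hit]:
    artist, track = split_query(query)
    if track is not None:
        best = mb_best_album(artist, track)  # {album_title, artist, year, rg_mbid} or None
        if best:
            hit = lookup_mbid(best["rg_mbid"])  # 0 or 1 exact album
            if hit:
                if not hit.get("year"):
                    # Enrich the year from MusicBrainz when Lidarr lacks one.
                    hit = {**hit, "year": best.get("year")}
                return [hit]
    # MB miss, empty exact lookup, or single-term query: unscored lookup.
    # Album-first for dash queries, artist-first for bare terms.
    hits = fallback_lookup(query, album_first=track is not None)
    if not hits:
        hits = universal_search(query)  # last resort: /api/v1/search
    return hits[:limit]
```

Injecting the collaborators keeps each tier replaceable with a fake in unit tests, so the ordering of the tiers can be checked without any network access.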
### MusicBrainz client details
@@ -95,27 +98,26 @@ This feature does not alter fallback behavior:
- All MB and Lidarr HTTP calls are wrapped; exceptions/timeouts are caught and
degrade to the next fallback tier. `lidarr_search` never raises.
- Empty/garbled MB JSON → treated as no match.
- Existing `DEBUG` logging extended to show MB query, chosen release-group, and
top scored candidates.
- Existing `DEBUG` logging extended to show MB query and chosen release-group.
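
One way to get the "never raises, degrade to the next tier" behaviour is a small wrapper around each network call. The sketch below is an assumption about shape, not the project's code: `safe_call`, `parse_json_or_none`, and the logger name are hypothetical.

```python
import json
import logging
from typing import Callable, Optional, TypeVar

log = logging.getLogger("musicfetch.sketch")  # hypothetical logger name
T = TypeVar("T")


def safe_call(fn: Callable[[], T], default: T, label: str) -> T:
    """Run fn(); on any exception, log at DEBUG and return the default."""
    try:
        return fn()
    except Exception as exc:  # timeouts, connection errors, bad payloads
        log.debug("%s failed, degrading to next tier: %r", label, exc)
        return default


def parse_json_or_none(text: str) -> Optional[dict]:
    """Treat empty or garbled JSON as 'no match' instead of raising."""
    try:
        data = json.loads(text)
    except (ValueError, TypeError):
        return None
    return data if isinstance(data, dict) and data else None
```

A call site would then read something like `safe_call(lambda: mb_lookup(artist, track), None, "musicbrainz")`, where `mb_lookup` stands for whatever function performs the request.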
## Testing
Unit tests (mock `requests`, no live network):
- `musicbrainz_best_album`: from canned MB JSON, picks studio Album over a single
and a compilation; picks earliest among Albums; returns `None` on empty.
- `_similar` / `_score_album_hit`: real *Discovery* by Daft Punk outscores the
*Pignickel* novelty for query `Daft Punk - Discovery`-style candidates; MBID
match bonus dominates.
and a compilation; picks earliest among Albums; falls back to any release-group
when no studio exists; returns `None` on empty/exception.
- `_split_query`: `"A - B"` → `("A","B")`; no dash → `("A", None)`; only first
` - ` splits.
- Fallback: MB failure → Lidarr lookup path; lookup failure → `/search`.
- `lidarr_search`: `Artist - Track` resolves via MB then does an `mbid:` exact
lookup returning the real album (year enriched from MB); MB miss → fallback
lookup; fallback empty → `/api/v1/search`; no key → `[]`.
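
A unit test for the `mbid:` exact lookup might be shaped like the sketch below, with no live network. `lookup_album_by_mbid` is a hypothetical helper written only for this example; it assumes a `requests.Session`-like object, the `X-Api-Key` header, and a JSON list response from `/api/v1/album/lookup`. The host name is a placeholder.

```python
from typing import Optional
from unittest.mock import Mock

MBID = "48117b90-a16e-34ca-a514-19c702df1158"  # release-group MBID quoted in this plan


def lookup_album_by_mbid(session, base_url: str, api_key: str, rg_mbid: str,
                         timeout: float = 10.0) -> Optional[dict]:
    """Ask Lidarr for exactly one album by release-group MBID (sketch)."""
    resp = session.get(
        f"{base_url}/api/v1/album/lookup",
        params={"term": f"mbid:{rg_mbid}"},
        headers={"X-Api-Key": api_key},
        timeout=timeout,
    )
    resp.raise_for_status()
    albums = resp.json()
    return albums[0] if albums else None


def test_mbid_lookup_returns_exact_album():
    session = Mock()
    session.get.return_value.json.return_value = [
        {"title": "Discovery", "foreignAlbumId": MBID}
    ]
    album = lookup_album_by_mbid(session, "http://lidarr.example:8686", "key", MBID)
    assert album["foreignAlbumId"] == MBID
    # The request must carry the mbid: term, not a free-text query.
    _, kwargs = session.get.call_args
    assert kwargs["params"] == {"term": f"mbid:{MBID}"}


def test_mbid_lookup_empty_means_no_hit():
    session = Mock()
    session.get.return_value.json.return_value = []
    assert lookup_album_by_mbid(session, "http://lidarr.example:8686", "key", MBID) is None
```

Passing the session in as a parameter is a design choice for the sketch; patching `requests` at module level, as the test plan describes, would exercise the same assertions.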
Manual live check (end of implementation): with the API pointed at the user's
Lidarr (`10.2.1.16:8686`), `POST /fetch?q=Daft Punk - Harder Better Faster
Stronger&source=lidarr` resolves to **Discovery** by Daft Punk (not a single,
compilation, or novelty), and the interactive-release flow proceeds.
Lidarr (`10.2.1.16:8686`), `lidarr_search("Daft Punk - Harder Better Faster
Stronger")` resolves to **Discovery** by Daft Punk (the exact MBID
`48117b90-a16e-34ca-a514-19c702df1158`), not a single/compilation/novelty.
## Out of Scope (YAGNI)
Caching MB responses, multi-track/album disambiguation UI, configurable scoring
weights, fuzzy artist aliasing beyond difflib, MB cover-art lookup.
Caching MB responses, multi-track/album disambiguation UI, fuzzy similarity
scoring (eliminated by the `mbid:` exact lookup), MB cover-art lookup.