Add design spec for smarter Lidarr matching
Scored best-first lidarr_search with MusicBrainz track->album resolution, difflib scoring, preserved YouTube fallback. Fixes noninteractive API picking junk (Pignickel) over the real album.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -0,0 +1,121 @@
# Smarter Lidarr Matching — Design

**Date:** 2026-06-08
**Status:** Approved

## Context & Goal

Live testing of the REST API exposed a real weakness: `musicfetch`'s
`lidarr_search` trusts the result order of Lidarr's universal `/api/v1/search`,
which is fuzzy and unranked. For the query `Daft Punk - Discovery`, that endpoint
put a novelty remix ("Daft Punk's Discovery but it's in the SM64 Soundfont" by
*Pignickel*) at #1, and the real *Discovery* by Daft Punk was not even in the
top five. The interactive CLI picker lets a human work around this; the **API's
noninteractive top-pick cannot** and grabs garbage.

The real input shape is Shazam-style `Artist - Track`. Lidarr only grabs
**albums**, never single tracks, so we must resolve a track to the album that
contains it, then pick the best-matching Lidarr album.

**Goal:** make `lidarr_search` return a **scored, best-first** list of Lidarr
hits so the noninteractive API picks the correct album and the CLI picker shows
good matches first. Resolve `Artist - Track` → album via MusicBrainz.

## Decisions (confirmed with user)

- **Fix in the shared `musicfetch.lidarr_search`** (not an API-only layer), so both
  the CLI picker and the API's noninteractive pick benefit, with no duplicated logic.
  Signature unchanged: `lidarr_search(query, limit) -> list[Hit]` (drop-in).
- **Resolve track → album via MusicBrainz** (the same upstream Lidarr uses).
  Lidarr's own track indexing is too weak. One extra HTTP call, no API key.
- **Track-first semantics** (`Artist - Track`): the right side is treated as a
  track to resolve to its album. (The YouTube path already handles exact tracks;
  this makes Lidarr the accurate album/discography source.)
- **Scoring** with stdlib `difflib` (no new dependency).
- **YouTube fallback preserved** exactly as today (see below).

## Architecture

All changes live in the `musicfetch` binary (single file). New/changed units:

```
musicfetch
├── _split_query(query) -> (left, right|None)        # split on first " - "
├── musicbrainz_best_album(artist, track) -> dict|None
│     # MB recording search -> best release-group {album_title, artist, year, rg_mbid}
├── _similar(a, b) -> float                          # difflib ratio, casefolded
├── _score_album_hit(hit, want_artist, want_album, rg_mbid) -> float
└── lidarr_search(query, limit) -> list[Hit]         # REWRITTEN: scored, best-first
```

### Data flow

1. **`Artist - Track` query:**
   a. `musicbrainz_best_album(artist, track)` → album candidate (title, artist,
      year, release-group MBID).
   b. Lidarr `GET /api/v1/album/lookup?term="<artist> <album>"` → map to `Hit`s.
   c. Score each:
      `0.7*_similar(want_artist, hit.artist) + 0.3*_similar(want_album, hit.album)`,
      plus a strong bonus (e.g. +0.5) if
      `hit.payload.album.foreignAlbumId == rg_mbid`. Sort descending.
   d. Enrich `Hit.year` from MB when the Lidarr hit lacks one.
2. **Single-term query (no ` - `):** Lidarr `/album/lookup` + `/artist/lookup`
   with the raw term; score each against the whole query (artist hits scored on
   artist name, album hits on artist+album); merge, sort descending.
3. **Fallbacks (never regress):** if MB times out or returns nothing, skip to step
   2 using `artist` and `track` recombined as the term. If `/album/lookup` and
   `/artist/lookup` both fail, fall back to the existing `/api/v1/search` path.
   `lidarr_search` returns `[]` only when everything fails or the key is missing.

### MusicBrainz client details

- Endpoint: `https://musicbrainz.org/ws/2/recording?query=<lucene>&fmt=json&limit=10`
  where lucene = `artist:"<artist>" AND recording:"<track>"`.
- Headers: `User-Agent: musicfetch/2.0 (https://github.com/…)` (MB requires a
  descriptive UA). Timeout ~8s. Rate limit: at most ~1 request/sec, enforced by a
  process-level min-interval guard; this tool makes one call per fetch, so it is
  effectively a courtesy delay.
- **Release-group selection** from the returned recordings' releases:
  prefer `primary-type == "Album"` (which rules out Single and EP) with **no**
  `secondary-types` (which rules out Compilation, Live, Soundtrack); among those,
  choose the earliest `first-release-date`. Fall back to any release-group if none
  qualify. Return `{album_title, artist, year, rg_mbid}` or `None`.

## YouTube Fallback (unchanged, documented)

This feature does not alter fallback behavior:
- **`source=auto` (default):** `build_combined_hits` includes YouTube hits. If
  Lidarr times out or returns no results, `lidarr_search` returns `[]` and the top
  YouTube hit is picked. If a Lidarr album is picked but has no indexer release,
  `actions.perform_fetch` falls through to the top YouTube hit.
- **`source=lidarr`:** Lidarr-only by design, with **no** YouTube fallback (the
  explicit "force Lidarr" switch). Unchanged.

## Error Handling

- All MB and Lidarr HTTP calls are wrapped; exceptions/timeouts are caught and
  degrade to the next fallback tier. `lidarr_search` never raises.
- Empty/garbled MB JSON → treated as no match.
- Existing `DEBUG` logging extended to show MB query, chosen release-group, and
  top scored candidates.
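
One way to get the "degrade to the next tier, never raise" behavior is to run the tiers through a small driver like the sketch below. `first_successful` is a hypothetical name, and treating an empty result the same as a failure is an assumption; the spec only says that exceptions and timeouts degrade to the next tier.

```python
import logging
from typing import Callable, List, Sequence

log = logging.getLogger("musicfetch")


def first_successful(tiers: Sequence[Callable[[], List]]) -> List:
    """Run zero-argument tiers in order; return the first non-empty result.

    Any exception (timeout, HTTP error, bad JSON) is logged at DEBUG level
    and the next tier is tried. Returns [] when every tier fails.
    """
    for tier in tiers:
        try:
            hits = tier()
        except Exception as exc:  # deliberately broad: this layer must not raise
            log.debug("tier %s failed: %s", getattr(tier, "__name__", tier), exc)
            continue
        if hits:  # assumption: an empty list also falls through
            return hits
    return []
```

A caller would pass the tiers in priority order, for example the MusicBrainz-assisted lookup, then the plain `/album/lookup` + `/artist/lookup` path, then the existing `/api/v1/search` path, each wrapped as a zero-argument callable.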

## Testing

Unit tests (mock `requests`, no live network):
- `musicbrainz_best_album`: from canned MB JSON, picks the studio Album over a single
  and a compilation; picks the earliest among Albums; returns `None` on empty.
- `_similar` / `_score_album_hit`: for candidates like those returned for
  `Daft Punk - Discovery`, the real *Discovery* by Daft Punk outscores the
  *Pignickel* novelty; the MBID-match bonus dominates.
- `_split_query`: `"A - B"` → `("A","B")`; no dash → `("A", None)`; only the first
  ` - ` splits.
- Fallback: MB failure → Lidarr lookup path; lookup failure → `/search`.

Manual live check (end of implementation): with the API pointed at the user's
Lidarr (`10.2.1.16:8686`),
`POST /fetch?q=Daft Punk - Harder Better Faster Stronger&source=lidarr`
resolves to **Discovery** by Daft Punk (not a single, compilation, or novelty),
and the interactive-release flow proceeds.
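
The unit-test style described above (mocked HTTP, no live network) could be sketched like this. `fetch_mb_json` is a hypothetical thin wrapper invented for the example, and `http` is assumed to be any object with a `requests`-style `get(url, params=..., timeout=...)` method, so the test needs neither `requests` nor a network connection.

```python
import unittest
from unittest import mock


def fetch_mb_json(http, artist, track):
    """Hypothetical wrapper: parsed MB JSON, or None on any failure."""
    try:
        resp = http.get(
            "https://musicbrainz.org/ws/2/recording",
            params={
                "query": f'artist:"{artist}" AND recording:"{track}"',
                "fmt": "json",
                "limit": 10,
            },
            timeout=8,
        )
        resp.raise_for_status()
        return resp.json()
    except Exception:
        return None  # degrade instead of raising


class FetchMbJsonTest(unittest.TestCase):
    def test_returns_canned_json(self):
        http = mock.Mock()
        http.get.return_value.json.return_value = {"recordings": []}
        result = fetch_mb_json(http, "Daft Punk", "One More Time")
        self.assertEqual(result, {"recordings": []})
        # The wrapper should pass the timeout through to the HTTP call.
        _, kwargs = http.get.call_args
        self.assertEqual(kwargs["timeout"], 8)

    def test_timeout_degrades_to_none(self):
        http = mock.Mock()
        http.get.side_effect = TimeoutError("simulated")
        self.assertIsNone(fetch_mb_json(http, "Daft Punk", "One More Time"))
```

Saved in a test module, this runs with `python -m unittest`; the same pattern extends to the Lidarr lookup calls by swapping the canned return values.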

## Out of Scope (YAGNI)

Caching MB responses, multi-track/album disambiguation UI, configurable scoring
weights, fuzzy artist aliasing beyond difflib, MB cover-art lookup.