Scored best-first lidarr_search with MusicBrainz track->album resolution, difflib scoring, preserved YouTube fallback. Fixes noninteractive API picking junk (Pignickel) over the real album. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Smarter Lidarr Matching — Design
Date: 2026-06-08 Status: Approved
Context & Goal
Live testing of the REST API exposed a real weakness: musicfetch's
lidarr_search trusts Lidarr's universal /api/v1/search ordering, which is
fuzzy and unranked. A query of Daft Punk - Discovery ranked a novelty remix
("Daft Punk's Discovery but it's in the SM64 Soundfont" by Pignickel) #1, and
the real Discovery by Daft Punk wasn't even top-5. The interactive CLI picker
lets a human work around this; the API's noninteractive top-pick cannot and
grabs garbage.
The real input shape is Shazam-style Artist - Track. Lidarr only grabs
albums, never single tracks, so we must resolve a track to the album that
contains it, then pick the best-matching Lidarr album.
Goal: make lidarr_search return a scored, best-first list of Lidarr
hits so the noninteractive API picks the correct album, and the CLI picker shows
good matches first. Resolve Artist - Track → album via MusicBrainz.
Decisions (confirmed with user)
- Fix in the shared musicfetch.lidarr_search (not an API-only layer): both the CLI picker and the API noninteractive pick benefit; no duplicated logic. Signature unchanged: lidarr_search(query, limit) -> list[Hit] (drop-in).
- Resolve track → album via MusicBrainz (the same upstream Lidarr uses). Lidarr's own track indexing is too weak. One extra HTTP call, no API key.
- Track-first semantics (Artist - Track): the right side is treated as a track to resolve to its album. (YouTube path already handles exact tracks; this makes Lidarr the accurate album/discography source.)
- Scoring with stdlib difflib (no new dependency).
- YouTube fallback preserved exactly as today (see below).
Architecture
All changes live in the musicfetch binary (single file). New/changed units:
musicfetch
├── _split_query(query) -> (left, right|None) # split on first " - "
├── musicbrainz_best_album(artist, track) -> dict|None
│ # MB recording search -> best release-group {album_title, artist, year, rg_mbid}
├── _similar(a, b) -> float # difflib ratio, casefolded
├── _score_album_hit(hit, want_artist, want_album, rg_mbid) -> float
└── lidarr_search(query, limit) -> list[Hit] # REWRITTEN: scored, best-first
Data flow
1. Artist - Track query:
   a. musicbrainz_best_album(artist, track) → album candidate (title, artist, year, release-group MBID).
   b. Lidarr GET /api/v1/album/lookup?term="<artist> <album>" → map to Hits.
   c. Score each: 0.7*_similar(want_artist, hit.artist) + 0.3*_similar(want_album, hit.album), plus a strong bonus (e.g. +0.5) if hit.payload.album.foreignAlbumId == rg_mbid. Sort desc.
   d. Enrich Hit.year from MB when the Lidarr hit lacks one.
2. Single-term query (no " - "): Lidarr /album/lookup + /artist/lookup with the raw term; score each against the whole query (artist hits scored on artist name, album hits on artist+album); merge, sort desc.
3. Fallbacks (never regress): if MB times out / returns nothing, skip to step 2 using (artist, track) recombined as the term. If /album/lookup and /artist/lookup both fail, fall back to the existing /api/v1/search path. lidarr_search returns [] only when everything fails or the key is missing.
MusicBrainz client details
- Endpoint: https://musicbrainz.org/ws/2/recording?query=<lucene>&fmt=json&limit=10 where lucene = artist:"<artist>" AND recording:"<track>".
- Headers: User-Agent: musicfetch/2.0 (https://github.com/…) (MB requires a descriptive UA). Timeout ~8s. Rate-limit: at most ~1 request/sec (a process-level min-interval guard; this tool makes one call per fetch so it's effectively a courtesy delay).
- Release-group selection from the returned recordings' releases: prefer primary-type == "Album" with no secondary-types (excludes Compilation, Live, Single, Soundtrack); among those choose the earliest first-release-date. Fall back to any release-group if none qualify. Return {album_title, artist, year, rg_mbid} or None.
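The URL construction and release-group selection described above might be sketched as follows. The response shape is an assumption: the sketch expects each recording to carry an artist-credit list and a releases list, and each release a release-group dict with id, title, primary-type, secondary-types and first-release-date keys. build_mb_url and pick_release_group are hypothetical helper names, and the query builder does not escape quotes inside artist or track names.

```python
from typing import Optional
from urllib.parse import urlencode

MB_RECORDING_URL = "https://musicbrainz.org/ws/2/recording"


def build_mb_url(artist: str, track: str) -> str:
    """Build the recording-search URL with a Lucene artist/recording query."""
    lucene = f'artist:"{artist}" AND recording:"{track}"'
    params = urlencode({"query": lucene, "fmt": "json", "limit": 10})
    return f"{MB_RECORDING_URL}?{params}"


def pick_release_group(mb_json: object) -> Optional[dict]:
    """Prefer the earliest plain studio Album; fall back to any release-group."""
    if not isinstance(mb_json, dict):
        return None  # empty or garbled JSON is treated as no match
    candidates = []
    for rec in mb_json.get("recordings") or []:
        credits = rec.get("artist-credit") or [{}]
        artist = credits[0].get("name", "")
        for rel in rec.get("releases") or []:
            rg = rel.get("release-group") or {}
            if not rg.get("id"):
                continue
            candidates.append({
                "album_title": rg.get("title") or rel.get("title", ""),
                "artist": artist,
                "date": rg.get("first-release-date") or "",
                "rg_mbid": rg["id"],
                "studio": (rg.get("primary-type") == "Album"
                           and not rg.get("secondary-types")),
            })
    if not candidates:
        return None
    pool = [c for c in candidates if c["studio"]] or candidates
    # Undated entries sort last: "9999" compares after any real ISO date.
    best = min(pool, key=lambda c: c["date"] or "9999")
    year = int(best["date"][:4]) if best["date"][:4].isdigit() else None
    return {"album_title": best["album_title"], "artist": best["artist"],
            "year": year, "rg_mbid": best["rg_mbid"]}
```

Keeping the selection a pure function of the parsed JSON means it can be exercised against canned responses without touching the network.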
YouTube Fallback (unchanged, documented)
This feature does not alter fallback behavior:
- source=auto (default): build_combined_hits includes YouTube hits. If Lidarr times out or returns no results, lidarr_search returns [] and the top YouTube hit is picked. If a Lidarr album is picked but has no indexer release, actions.perform_fetch falls through to the top YouTube hit.
- source=lidarr: lidarr-only by design, with no YouTube fallback (the explicit "force Lidarr" switch). Unchanged.
Error Handling
- All MB and Lidarr HTTP calls are wrapped; exceptions/timeouts are caught and degrade to the next fallback tier. lidarr_search never raises.
- Empty/garbled MB JSON → treated as no match.
- Existing DEBUG logging extended to show MB query, chosen release-group, and top scored candidates.
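One way to get the "degrade to the next tier, never raise" behavior is a small driver that walks an ordered list of callables. This is a sketch under assumptions: first_successful is a hypothetical helper name, each tier is assumed to be a zero-argument callable returning a list of hits, and the logger name is illustrative.

```python
import logging
from typing import Callable, List, Sequence

log = logging.getLogger("musicfetch")  # assumed logger name


def first_successful(tiers: Sequence[Callable[[], List]]) -> List:
    """Try each tier in order; return the first non-empty result, never raise."""
    for tier in tiers:
        try:
            hits = tier()
        except Exception as exc:  # network errors, timeouts, bad JSON
            log.debug("tier %s failed: %r",
                      getattr(tier, "__name__", tier), exc)
            continue
        if hits:
            return hits
    # Every tier failed or came back empty.
    return []
```

In lidarr_search the tiers would be, in order, the MB-resolved album lookup, the raw-term lookup, and the existing /api/v1/search path, so a failure at any level falls through to the next rather than surfacing to the caller.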
Testing
Unit tests (mock requests, no live network):
- musicbrainz_best_album: from canned MB JSON, picks studio Album over a single and a compilation; picks earliest among Albums; returns None on empty.
- _similar / _score_album_hit: real Discovery by Daft Punk outscores the Pignickel novelty for query Daft Punk - Discovery-style candidates; MBID match bonus dominates.
- _split_query: "A - B" → ("A","B"); no dash → ("A", None); splits only on the first " - ".
- Fallback: MB failure → Lidarr lookup path; lookup failure → /search.
Manual live check (end of implementation): with the API pointed at the user's
Lidarr (10.2.1.16:8686), POST /fetch?q=Daft Punk - Harder Better Faster Stronger&source=lidarr resolves to Discovery by Daft Punk (not a single,
compilation, or novelty), and the interactive-release flow proceeds.
Out of Scope (YAGNI)
Caching MB responses, multi-track/album disambiguation UI, configurable scoring weights, fuzzy artist aliasing beyond difflib, MB cover-art lookup.