musicfetch/docs/superpowers/specs/2026-06-08-smarter-lidarr-matching-design.md
zebra 45121dd807 Plan smarter Lidarr matching via exact MBID lookup
Drop fuzzy difflib scoring: MusicBrainz resolves track->album release-group
MBID, Lidarr album/lookup?term=mbid:<id> returns the exact album. Live-verified
against the user's Lidarr.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 21:06:39 -07:00


Smarter Lidarr Matching — Design

Date: 2026-06-08
Status: Approved

Context & Goal

Live testing of the REST API exposed a real weakness: musicfetch's lidarr_search trusts Lidarr's universal /api/v1/search ordering, which is fuzzy and unranked. A query of Daft Punk - Discovery ranked a novelty remix ("Daft Punk's Discovery but it's in the SM64 Soundfont" by Pignickel) #1, and the real Discovery by Daft Punk wasn't even top-5. The interactive CLI picker lets a human work around this; the API's noninteractive top-pick cannot and grabs garbage.

The real input shape is Shazam-style Artist - Track. Lidarr only grabs albums, never single tracks, so we must resolve a track to the album that contains it, then pick the best-matching Lidarr album.

Goal: make lidarr_search return an accurate, best-first list of Lidarr hits so the noninteractive API picks the correct album, and the CLI picker shows good matches first. Resolve Artist - Track → album via MusicBrainz.

Decisions (confirmed with user)

  • Fix in the shared musicfetch.lidarr_search (not an API-only layer) — both the CLI picker and the API noninteractive pick benefit; no duplicated logic. Signature unchanged: lidarr_search(query, limit) -> list[Hit] (drop-in).
  • Resolve track → album via MusicBrainz (the same upstream Lidarr uses). Lidarr's own track indexing is too weak. One extra HTTP call, no API key.
  • Track-first semantics (Artist - Track): the right side is treated as a track to resolve to its album. (YouTube path already handles exact tracks; this makes Lidarr the accurate album/discography source.)
  • No fuzzy scoring. Live-verified that Lidarr's album/lookup accepts a direct MusicBrainz id: term=mbid:<release-group-mbid> (also term=lidarr:<mbid>) returns exactly one album. So we resolve the album's MBID via MusicBrainz and ask Lidarr for that exact MBID — no difflib, no ranking heuristics. The only selection is deterministic release-group type-filtering inside the MusicBrainz step (prefer studio Album over single/comp/live).
  • YouTube fallback preserved exactly as today (see below).
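
The exact-MBID decision amounts to building one lookup URL. A minimal sketch, assuming a hypothetical helper name album_lookup_url and a caller-supplied base URL; only the /api/v1/album/lookup path and the mbid: term prefix come from this document:

```python
from urllib.parse import urlencode


def album_lookup_url(base_url: str, rg_mbid: str) -> str:
    """Build the Lidarr album lookup URL for one release-group MBID.

    Hypothetical helper: the name and the base_url parameter are
    illustrative, not taken from musicfetch itself.
    """
    # urlencode percent-encodes the colon in "mbid:<id>" as %3A.
    query = urlencode({"term": f"mbid:{rg_mbid}"})
    return f"{base_url.rstrip('/')}/api/v1/album/lookup?{query}"
```

The colon is percent-encoded by urlencode; this sketch assumes Lidarr decodes the query string in the usual way, so the server still sees term=mbid:<id>.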

Architecture

All changes live in the musicfetch binary (single file). New/changed units:

musicfetch
├── _split_query(query) -> (left, right|None)        # split on first " - "
├── musicbrainz_best_album(artist, track) -> dict|None
│       # MB recording search -> best release-group {album_title, artist, year, rg_mbid}
├── _lidarr_album_candidates(term) / _lidarr_artist_candidates(term) -> list[Hit]
├── _universal_search(query, limit) -> list[Hit]     # /api/v1/search last resort
└── lidarr_search(query, limit) -> list[Hit]         # REWRITTEN: MBID-exact + fallbacks

Data flow

  1. Artist - Track query:
     a. musicbrainz_best_album(artist, track) → {album_title, artist, year, rg_mbid}.
     b. Lidarr GET /api/v1/album/lookup?term=mbid:<rg_mbid> → 0 or 1 exact album → Hit.
     c. Enrich Hit.year from MB when the Lidarr hit lacks one. Return it.
  2. Single-term query (no " - " separator): _fallback_lookup, an artist-first concatenation of /artist/lookup + /album/lookup for the raw term (a bare term is most often an artist). No scoring; the interactive picker / noninteractive top-pick consume the results in that order.
  3. Fallbacks (never regress): if MusicBrainz misses or the exact MBID lookup returns nothing, use _fallback_lookup(query) (album-first there, since a dash query named an album/track). If /album/lookup and /artist/lookup both yield nothing, fall back to the existing /api/v1/search. lidarr_search returns [] only when everything fails or the key is missing.
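
The three tiers above can be sketched as one function. This is a sketch under stated assumptions, not the musicfetch implementation: the name tiered_search, the dict stand-in for Hit, and the injected callables (including the album_first flag on the fallback lookup) are all hypothetical, and the callables are assumed to return empty results rather than raise.

```python
from typing import Callable, List, Optional

Hit = dict  # stand-in for the tool's Hit type (assumption)


def tiered_search(
    query: str,
    limit: int,
    resolve_album: Callable[[str, str], Optional[dict]],  # MusicBrainz step
    exact_lookup: Callable[[str], List[Hit]],             # album/lookup?term=mbid:<id>
    fallback_lookup: Callable[[str, bool], List[Hit]],    # (query, album_first)
    universal_search: Callable[[str, int], List[Hit]],    # /api/v1/search
) -> List[Hit]:
    left, sep, right = query.partition(" - ")
    is_dash = bool(sep and right.strip())

    # Tier 1: Artist - Track -> MB release-group -> exact MBID lookup.
    if is_dash:
        album = resolve_album(left.strip(), right.strip())
        if album:
            hits = exact_lookup(album["rg_mbid"])
            if hits:
                hit = dict(hits[0])
                if not hit.get("year"):
                    hit["year"] = album.get("year")  # enrich from MB
                return [hit]

    # Tier 2: plain lookups; album-first for dash queries, artist-first otherwise.
    hits = fallback_lookup(query, is_dash)
    if hits:
        return hits[:limit]

    # Tier 3: last resort, the existing universal search.
    return universal_search(query, limit)[:limit]
```

Passing the tiers in as callables keeps the control flow testable without a live Lidarr or MusicBrainz, which matches the mock-only unit tests planned in this document.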

MusicBrainz client details

  • Endpoint: https://musicbrainz.org/ws/2/recording?query=<lucene>&fmt=json&limit=10 where lucene = artist:"<artist>" AND recording:"<track>".
  • Headers: User-Agent: musicfetch/2.0 (https://github.com/…) (MB requires a descriptive UA). Timeout ~8s. Rate-limit: at most ~1 request/sec (a process-level min-interval guard; this tool makes one call per fetch so it's effectively a courtesy delay).
  • Release-group selection from the returned recordings' releases: prefer primary-type == "Album" with no secondary-types (the primary-type check excludes Single and EP; the empty secondary-types check excludes Compilation, Live, Soundtrack); among those choose the earliest first-release-date. Fall back to any release-group if none qualify. Return {album_title, artist, year, rg_mbid} or None.
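
A rough sketch of the query construction and the release-group selection rule follows. The function names are hypothetical, and the JSON field names (artist-credit, releases, release-group, primary-type, secondary-types, first-release-date) are assumed to match the shape described above; the fallback to a release's own date when the release-group carries none is also an assumption.

```python
from typing import Optional


def build_recording_params(artist: str, track: str) -> dict:
    """Query parameters for the MB recording search described above."""
    def esc(text: str) -> str:
        # Escape backslashes and double quotes inside a quoted Lucene phrase.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    lucene = f'artist:"{esc(artist)}" AND recording:"{esc(track)}"'
    return {"query": lucene, "fmt": "json", "limit": "10"}


def pick_release_group(recordings: list) -> Optional[dict]:
    """Prefer studio albums, then the earliest date; None if nothing usable."""
    candidates = []
    for rec in recordings or []:
        credits = rec.get("artist-credit") or [{}]
        artist = credits[0].get("name")
        for rel in rec.get("releases") or []:
            rg = rel.get("release-group") or {}
            if not rg.get("id"):
                continue
            candidates.append({
                "album_title": rg.get("title") or rel.get("title"),
                "artist": artist,
                # Assumption: use the release's date if the group has none.
                "date": rg.get("first-release-date") or rel.get("date") or "",
                "rg_mbid": rg["id"],
                "studio": rg.get("primary-type") == "Album"
                          and not rg.get("secondary-types"),
            })
    if not candidates:
        return None
    # Studio albums if any exist, otherwise every release-group seen.
    pool = [c for c in candidates if c["studio"]] or candidates
    # Undated entries sort last; ISO date strings compare chronologically.
    best = min(pool, key=lambda c: (c["date"] == "", c["date"]))
    return {
        "album_title": best["album_title"],
        "artist": best["artist"],
        "year": best["date"][:4] or None,
        "rg_mbid": best["rg_mbid"],
    }
```

The selection is deterministic: given the same MB response it always returns the same release-group, which is what lets the Lidarr step be an exact lookup rather than a ranking problem.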

YouTube Fallback (unchanged, documented)

This feature does not alter fallback behavior:

  • source=auto (default): build_combined_hits includes YouTube hits. If Lidarr times out or returns no results, lidarr_search returns [] and the top YouTube hit is picked. If a Lidarr album is picked but has no indexer release, actions.perform_fetch falls through to the top YouTube hit.
  • source=lidarr: lidarr-only by design — no YouTube fallback (the explicit "force Lidarr" switch). Unchanged.

Error Handling

  • All MB and Lidarr HTTP calls are wrapped; exceptions/timeouts are caught and degrade to the next fallback tier. lidarr_search never raises.
  • Empty/garbled MB JSON → treated as no match.
  • Existing DEBUG logging extended to show MB query and chosen release-group.
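
One way to get the "never raises, degrades to the next tier" behavior is a small wrapper around each network call. This is a sketch only; the helper name attempt and the logger name are hypothetical:

```python
import logging
from typing import Callable, TypeVar

log = logging.getLogger("musicfetch")  # hypothetical logger name
T = TypeVar("T")


def attempt(label: str, fn: Callable[[], T], default: T) -> T:
    """Run fn(); on any exception log at DEBUG and return default instead."""
    try:
        return fn()
    except Exception as exc:  # timeouts, connection errors, malformed JSON
        log.debug("%s failed, degrading to next tier: %s", label, exc)
        return default
```

Each tier would then be called as, for example, attempt("musicbrainz", lambda: ..., None), so a failure looks the same to the caller as a miss.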

Testing

Unit tests (mock requests, no live network):

  • musicbrainz_best_album: from canned MB JSON, picks studio Album over a single and a compilation; picks earliest among Albums; falls back to any release-group when no studio exists; returns None on empty/exception.
  • _split_query: "A - B" → ("A","B"); no dash → ("A", None); only the first " - " splits.
  • lidarr_search: Artist - Track resolves via MB then does an mbid: exact lookup returning the real album (year enriched from MB); MB miss → fallback lookup; fallback empty → /api/v1/search; no key → [].

Manual live check (end of implementation): with the API pointed at the user's Lidarr (10.2.1.16:8686), lidarr_search("Daft Punk - Harder Better Faster Stronger") resolves to Discovery by Daft Punk (the exact MBID 48117b90-a16e-34ca-a514-19c702df1158), not a single/compilation/novelty.

Out of Scope (YAGNI)

Caching MB responses, multi-track/album disambiguation UI, fuzzy similarity scoring (eliminated by the mbid: exact lookup), MB cover-art lookup.