
zebra 6687a5a0fc Add design spec for smarter Lidarr matching

Scored best-first lidarr_search with MusicBrainz track->album resolution,
difflib scoring, preserved YouTube fallback. Fixes noninteractive API
picking junk (Pignickel) over the real album.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-08 20:51:15 -07:00

6.0 KiB


Smarter Lidarr Matching — Design

Date: 2026-06-08
Status: Approved

Context & Goal

Live testing of the REST API exposed a real weakness: musicfetch's lidarr_search trusts the ordering of Lidarr's universal /api/v1/search endpoint, which is fuzzy and not ranked by relevance. A query for Daft Punk - Discovery ranked a novelty remix ("Daft Punk's Discovery but it's in the SM64 Soundfont" by Pignickel) #1, and the real Discovery by Daft Punk did not even make the top 5. A human using the interactive CLI picker can work around this, but the API's noninteractive mode takes the top hit and therefore grabs the wrong album.

The real input shape is Shazam-style Artist - Track. Lidarr only grabs albums, never single tracks, so we must first resolve the track to an album that contains it and then pick the best-matching Lidarr album.

Goal: make lidarr_search return a scored, best-first list of Lidarr hits so the noninteractive API picks the correct album, and the CLI picker shows good matches first. Resolve Artist - Track → album via MusicBrainz.

Decisions (confirmed with user)

Fix in the shared musicfetch.lidarr_search (not an API-only layer) — both the CLI picker and the API noninteractive pick benefit; no duplicated logic. Signature unchanged: lidarr_search(query, limit) -> list[Hit] (drop-in).
Resolve track → album via MusicBrainz (the same metadata upstream that Lidarr itself uses). Lidarr's own track indexing is too weak. One extra HTTP call, no API key.
Track-first semantics (Artist - Track): the right side is treated as a track to resolve to its album. (YouTube path already handles exact tracks; this makes Lidarr the accurate album/discography source.)
Scoring with stdlib difflib (no new dependency).
YouTube fallback preserved exactly as today (see below).

Architecture

All changes live in the musicfetch binary (single file). New/changed units:

musicfetch
├── _split_query(query) -> (left, right|None)        # split on first " - "
├── musicbrainz_best_album(artist, track) -> dict|None
│       # MB recording search -> best release-group {album_title, artist, year, rg_mbid}
├── _similar(a, b) -> float                           # difflib ratio, casefolded
├── _score_album_hit(hit, want_artist, want_album, rg_mbid) -> float
└── lidarr_search(query, limit) -> list[Hit]          # REWRITTEN: scored, best-first

Data flow

1. Artist - Track query:
   a. musicbrainz_best_album(artist, track) → album candidate (title, artist, year, release-group MBID).
   b. Lidarr GET /api/v1/album/lookup?term="<artist> <album>" → map the results to Hits.
   c. Score each Hit: 0.7*_similar(want_artist, hit.artist) + 0.3*_similar(want_album, hit.album), plus a strong bonus (e.g. +0.5) if hit.payload.album.foreignAlbumId == rg_mbid. Sort descending.
   d. Enrich Hit.year from MB when the Lidarr hit lacks one.
2. Single-term query (no " - "): Lidarr /album/lookup + /artist/lookup with the raw term; score each hit against the whole query (artist hits on artist name, album hits on artist + album); merge and sort descending.
3. Fallbacks (never regress): if MB times out or returns nothing, skip to step 2, using the artist and track recombined as the term. If /album/lookup and /artist/lookup both fail, fall back to the existing /api/v1/search path. lidarr_search returns [] only when every tier fails or the Lidarr API key is missing.

MusicBrainz client details

Endpoint: https://musicbrainz.org/ws/2/recording?query=<lucene>&fmt=json&limit=10 where lucene = artist:"<artist>" AND recording:"<track>".
Headers: User-Agent: musicfetch/2.0 (https://github.com/…) (MB requires a descriptive UA). Timeout ~8s. Rate-limit: at most ~1 request/sec (a process-level min-interval guard; this tool makes one call per fetch so it's effectively a courtesy delay).
Release-group selection from the returned recordings' releases: prefer release groups with primary-type == "Album" and no secondary-types (this drops Singles and EPs by primary type, and Compilation, Live, and Soundtrack releases by secondary type); among those, choose the earliest first-release-date. Fall back to any release group if none qualify. Return {album_title, artist, year, rg_mbid} or None.
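A sketch of the MB client pieces, assuming the Python standard library (urllib.parse for encoding) rather than requests; mb_recording_url, MinIntervalGuard, and pick_release_group are hypothetical names. The JSON field names (artist-credit, releases, release-group, primary-type, secondary-types, date) are assumed to follow MusicBrainz's JSON recording-search output. Whether the embedded release group carries first-release-date is also an assumption, so the code falls back to the release's own date, and only the first credited artist is returned.

```python
from __future__ import annotations

import threading
import time
from urllib.parse import urlencode

MB_URL = "https://musicbrainz.org/ws/2/recording"


def _lucene_quote(value: str) -> str:
    # Escape backslashes and double quotes so the value stays a single Lucene phrase.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def mb_recording_url(artist: str, track: str, limit: int = 10) -> str:
    query = f"artist:{_lucene_quote(artist)} AND recording:{_lucene_quote(track)}"
    return MB_URL + "?" + urlencode({"query": query, "fmt": "json", "limit": limit})


class MinIntervalGuard:
    """Process-level courtesy delay: at most one call per `interval` seconds."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self._last = float("-inf")
        self._lock = threading.Lock()

    def wait(self) -> None:
        with self._lock:
            delay = self._last + self.interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            self._last = time.monotonic()


def pick_release_group(recordings: list[dict]) -> dict | None:
    candidates = []  # (release_group, date_string, artist_name)
    for rec in recordings:
        credit = rec.get("artist-credit") or [{}]
        artist = credit[0].get("name", "")  # assumption: first credited artist only
        for rel in rec.get("releases") or []:
            rg = rel.get("release-group") or {}
            if not rg.get("id"):
                continue
            date = rg.get("first-release-date") or rel.get("date") or ""
            candidates.append((rg, date, artist))
    if not candidates:
        return None

    def is_studio_album(rg: dict) -> bool:
        return rg.get("primary-type") == "Album" and not rg.get("secondary-types")

    pool = [c for c in candidates if is_studio_album(c[0])] or candidates
    # Undated entries sort last; ISO-style date strings ("2001", "2001-03-12") sort chronologically.
    rg, date, artist = min(pool, key=lambda c: (c[1] == "", c[1]))
    return {
        "album_title": rg.get("title", ""),
        "artist": artist,
        "year": int(date[:4]) if date[:4].isdigit() else None,
        "rg_mbid": rg["id"],
    }
```

The guard's wait() would be called immediately before each MB request; because the tool makes one MB call per fetch, it should rarely sleep in practice.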

YouTube Fallback (unchanged, documented)

This feature does not alter fallback behavior:

source=auto (default): build_combined_hits includes YouTube hits. If Lidarr times out or returns no results, lidarr_search returns [] and the top YouTube hit is picked. If a Lidarr album is picked but has no indexer release, actions.perform_fetch falls through to the top YouTube hit.
source=lidarr: lidarr-only by design — no YouTube fallback (the explicit "force Lidarr" switch). Unchanged.
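The source rules above can be modelled as a small decision function. This is a hypothetical simplification, not musicfetch's build_combined_hits or actions.perform_fetch: hits are plain (kind, id) tuples, and grab_lidarr_release stands in for the indexer-release check. What source=lidarr does when the top album has no indexer release is not stated here; the sketch assumes it fetches nothing.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence, Tuple

# Hypothetical simplified hit: (kind, identifier), kind being "lidarr" or "youtube".
Hit = Tuple[str, str]


def resolve_fetch(
    source: str,
    lidarr_hits: Sequence[Hit],
    youtube_hits: Sequence[Hit],
    grab_lidarr_release: Callable[[Hit], bool],
) -> Optional[Hit]:
    """Return the hit that ends up being fetched, or None if nothing is."""
    top_lidarr = lidarr_hits[0] if lidarr_hits else None
    if source == "lidarr":
        # Explicit "force Lidarr" switch: never touch YouTube.
        if top_lidarr is not None and grab_lidarr_release(top_lidarr):
            return top_lidarr
        return None  # assumption: no fallback means nothing is fetched
    # source == "auto": prefer the best-first Lidarr album when lidarr_search found one...
    if top_lidarr is not None and grab_lidarr_release(top_lidarr):
        return top_lidarr
    # ...otherwise (no Lidarr hits, or no indexer release) use the top YouTube hit.
    return youtube_hits[0] if youtube_hits else None
```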

Error Handling

All MB and Lidarr HTTP calls are wrapped: exceptions and timeouts are caught and degrade to the next fallback tier, so lidarr_search never raises.
Empty or garbled MB JSON is treated as no match.
The existing DEBUG logging is extended to show the MB query, the chosen release group, and the top scored candidates.
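One way to get "degrade to the next tier, never raise" is a loop over ordered tier callables, sketched below with a hypothetical first_successful_tier helper. It treats an empty result the same as a failure and moves on to the next tier, which is an assumption; the spec only says failures fall through.

```python
from __future__ import annotations

import logging
from typing import Callable, List, Sequence, TypeVar

log = logging.getLogger("musicfetch")
T = TypeVar("T")


def first_successful_tier(tiers: Sequence[tuple[str, Callable[[], List[T]]]]) -> List[T]:
    """Try each (name, fetch) tier in order and return the first non-empty result.

    Any exception (timeout, HTTP error, garbled JSON) is logged at DEBUG and
    treated like "no results", so the caller never sees an exception.
    """
    for name, fetch in tiers:
        try:
            results = fetch()
        except Exception as exc:  # deliberately broad: this helper must never raise
            log.debug("tier %s failed: %r", name, exc)
            continue
        if results:
            log.debug("tier %s returned %d hit(s)", name, len(results))
            return list(results)
        log.debug("tier %s returned nothing", name)
    return []
```

In lidarr_search the tiers would be, in order: the MB-resolved /album/lookup, the raw-term /album/lookup + /artist/lookup, and the existing /api/v1/search call; a missing API key would return [] before the loop starts.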

Testing

Unit tests (mock requests, no live network):

musicbrainz_best_album: from canned MB JSON, picks studio Album over a single and a compilation; picks earliest among Albums; returns None on empty.
_similar / _score_album_hit: given candidates like those returned for the query Daft Punk - Discovery, the real Discovery by Daft Punk outscores the Pignickel novelty; the MBID-match bonus dominates the similarity terms.
_split_query: "A - B" → ("A", "B"); no " - " separator → ("A", None); only the first " - " splits, so "A - B - C" → ("A", "B - C").
Fallback: MB failure → Lidarr lookup path; lookup failure → /search.

Manual live check (end of implementation): with the API pointed at the user's Lidarr (10.2.1.16:8686), POST /fetch?q=Daft Punk - Harder Better Faster Stronger&source=lidarr resolves to Discovery by Daft Punk (not a single, compilation, or novelty), and the interactive-release flow proceeds.
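For the manual check, the query contains spaces, so it must be URL-encoded before it goes into q (urlencode renders spaces as +). A sketch with urllib; API_BASE is a placeholder because this spec gives Lidarr's address but not the musicfetch API's, and since the response format is not specified here, post_fetch simply returns the raw body.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "http://localhost:8000"  # hypothetical: the musicfetch API address is not given in the spec


def fetch_url(query: str, source: str = "lidarr", base: str = API_BASE) -> str:
    # urlencode percent-encodes the query so spaces and reserved characters survive intact.
    return f"{base}/fetch?" + urlencode({"q": query, "source": source})


def post_fetch(query: str, source: str = "lidarr", base: str = API_BASE,
               timeout: float = 30.0) -> bytes:
    # POST with the parameters in the query string, as in the check above; no request body.
    req = Request(fetch_url(query, source, base), method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Usage: post_fetch("Daft Punk - Harder Better Faster Stronger") against a running API instance.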

Out of Scope (YAGNI)

Caching MB responses, multi-track/album disambiguation UI, configurable scoring weights, fuzzy artist aliasing beyond difflib, MB cover-art lookup.
