zebra 45121dd807 Plan smarter Lidarr matching via exact MBID lookup

Drop fuzzy difflib scoring: MusicBrainz resolves track->album release-group
MBID, Lidarr album/lookup?term=mbid:<id> returns the exact album. Live-verified
against the user's Lidarr.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-08 21:06:39 -07:00

Smarter Lidarr Matching Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Make musicfetch.lidarr_search resolve a Shazam-style Artist - Track to the correct album by asking MusicBrainz for the studio album's release-group MBID, then doing an exact Lidarr lookup (album/lookup?term=mbid:<MBID>) — so the noninteractive API picks the real album (Daft Punk Discovery) instead of junk (Pignickel novelty), with no fuzzy ranking system.

Architecture: All changes are in the single-file musicfetch binary (the shared search used by both the CLI picker and the REST API). New helpers _split_query and musicbrainz_best_album, plus a rewritten lidarr_search with small lookup helpers and tiered fallbacks. Tests import the binary as a module via the existing server.mf loader (which registers it in sys.modules as musicfetch_core).

Tech Stack: Python 3.10+, stdlib time, requests (already a dep), pytest with monkeypatch. No new dependencies. Live-validated against MusicBrainz + the user's Lidarr 3.1.0 — album/lookup?term=mbid:48117b90-a16e-34ca-a514-19c702df1158 returns exactly Discovery — Daft Punk.

Context for the implementer

musicfetch is an executable Python file (no .py ext) at the repo root. Relevant existing pieces:

Hit dataclass: fields source, kind, title, artist, album, year, thumbnail, payload.
_album_to_hit(album) → Hit(source="lidarr", kind="album", ..., payload={"album": album}). The raw Lidarr album dict carries foreignAlbumId (MusicBrainz release-group MBID) and releaseDate.
_artist_to_hit(artist) → Hit(source="lidarr", kind="artist", ...).
lidarr_get(path, params=None, timeout=15) → GET helper, raises on HTTP error.
API_KEY, dbg(...), err(...), module-level requests, from requests.exceptions import RequestException, Timeout.
Current lidarr_search(query, limit) at lines ~129-162 trusts /api/v1/search ordering then falls back to /album/lookup + /artist/lookup. This is what we replace.

Why MusicBrainz is still required: Lidarr has no track-search endpoint; album/lookup only matches albums/artists. Shazam gives Artist - Track, and the track name won't match the album title in Lidarr. MusicBrainz recording search maps track → album, and gives us the release-group MBID that Lidarr's mbid: lookup resolves exactly. No scoring needed.
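
The two requests described above can be sketched as pure string builders, with no network calls. The helper names (mb_recording_query, mb_recording_url, lidarr_mbid_term) are illustrative and not part of musicfetch; the query shape and the mbid: term follow what this plan states elsewhere.

```python
from urllib.parse import urlencode

# Base URL as used in Task 2 of this plan.
MUSICBRAINZ_URL = "https://musicbrainz.org/ws/2"


def mb_recording_query(artist: str, track: str) -> str:
    """Build the recording-search query string (hypothetical helper)."""
    return f'artist:"{artist}" AND recording:"{track}"'


def mb_recording_url(artist: str, track: str, limit: int = 10) -> str:
    """Full GET URL for the recording search with JSON output (hypothetical helper)."""
    params = {"query": mb_recording_query(artist, track), "fmt": "json", "limit": limit}
    return f"{MUSICBRAINZ_URL}/recording?{urlencode(params)}"


def lidarr_mbid_term(rg_mbid: str) -> str:
    """Term for Lidarr's album/lookup that targets one release group exactly."""
    return f"mbid:{rg_mbid}"


if __name__ == "__main__":
    print(mb_recording_url("Daft Punk", "Harder Better Faster Stronger"))
    print(lidarr_mbid_term("48117b90-a16e-34ca-a514-19c702df1158"))
```

This sketch does not escape double quotes or other query-syntax characters inside the artist or track name; whether real Shazam output needs that is an open question this plan does not address.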

Don't break callers: lidarr_search(query, limit) -> list[Hit] signature stays identical. build_combined_hits and the API depend on it returning [] on failure (so the YouTube fallback works).
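
The contract matters because callers treat an empty list as the signal to try another source. A minimal sketch of that pattern follows. first_source_with_hits and the stub searchers are hypothetical and do not reproduce the real build_combined_hits, whose internals this plan does not show.

```python
from typing import Callable

# A searcher takes (query, limit) and returns a list of hits; [] means "nothing found".
Searcher = Callable[[str, int], list]


def first_source_with_hits(query: str, limit: int, sources: list[Searcher]) -> list:
    """Return hits from the first source that yields any (hypothetical helper).

    Relies on every searcher returning [] on failure instead of raising,
    which is the contract lidarr_search must keep.
    """
    for search in sources:
        hits = search(query, limit)
        if hits:
            return hits[:limit]
    return []


# Stub searchers standing in for the Lidarr and YouTube searches.
def lidarr_stub(query: str, limit: int) -> list:
    return []  # simulates a missing key or total failure


def youtube_stub(query: str, limit: int) -> list:
    return [f"youtube:{query}"]
```

If lidarr_search raised instead of returning [], a loop like this would abort before reaching the second source, which is why the signature and the failure behavior both stay fixed.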

Tests access the binary like this (top of each new test module):

import server.mf  # noqa: F401 — loads musicfetch and registers musicfetch_core in sys.modules
import musicfetch_core as mf

Set mf.API_KEY via monkeypatch.setattr(mf, "API_KEY", "testkey") where needed.

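One import to add to the top imports block of musicfetch (Task 2): import time. The new helpers are also annotated with Optional, so confirm that from typing import Optional is already present and add it if it is not.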

Task 1: Query splitter `_split_query`

Files:

Modify: musicfetch (add _split_query just above lidarr_search)
Test: tests/test_lidarr_match.py

Step 1: Write the failing test

Create tests/test_lidarr_match.py:

import server.mf  # noqa: F401 — loads musicfetch, registers musicfetch_core in sys.modules
import musicfetch_core as mf


def test_split_query_with_dash():
    assert mf._split_query("Daft Punk - Discovery") == ("Daft Punk", "Discovery")


def test_split_query_no_dash():
    assert mf._split_query("Daft Punk") == ("Daft Punk", None)


def test_split_query_splits_on_first_dash_only():
    assert mf._split_query("A - B - C") == ("A", "B - C")


def test_split_query_strips_whitespace():
    assert mf._split_query("  Daft Punk  -  Discovery  ") == ("Daft Punk", "Discovery")

Step 2: Run test to verify it fails

Run: pytest tests/test_lidarr_match.py -v
Expected: FAIL with AttributeError: module 'musicfetch_core' has no attribute '_split_query'

Step 3: Add the implementation

In musicfetch, immediately above def lidarr_search(:

def _split_query(query: str) -> tuple[str, Optional[str]]:
    """Split a Shazam-style 'Artist - Track' on the first ' - '.
    Returns (artist, track) or (term, None) when there is no separator."""
    if " - " in query:
        left, right = query.split(" - ", 1)
        return left.strip(), right.strip()
    return query.strip(), None

Step 4: Run test to verify it passes

Run: pytest tests/test_lidarr_match.py -v
Expected: PASS (4 passed)

Step 5: Commit

git add musicfetch tests/test_lidarr_match.py
git commit -m "feat(lidarr): add Artist - Track query splitter"

Task 2: MusicBrainz track→album resolver

Files:

Modify: musicfetch (add import time to top imports; add MB constants + _mb_rate_limit, _mb_artist_credit, musicbrainz_best_album above lidarr_search)
Test: tests/test_musicbrainz.py

The release-group selection prefers studio albums (primary-type == "Album" with no secondary-types), choosing the earliest dated one, skipping Single/Compilation/Live. Verified live: for "Daft Punk / Harder Better Faster Stronger" MB returns a Single, Compilations, Live albums, and the studio Discovery (mbid 48117b90-a16e-34ca-a514-19c702df1158).
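
The studio-album test can be tried in isolation on release groups shaped like the MusicBrainz JSON. This is a small sketch: is_studio_album is a hypothetical name, and the sample data is illustrative and not a captured API response.

```python
def is_studio_album(release_group: dict) -> bool:
    """True for a plain studio album: primary type Album, no secondary types."""
    primary = release_group.get("primary-type") or ""
    secondary = release_group.get("secondary-types") or []
    return primary == "Album" and not secondary


# Illustrative release groups covering the four kinds named above.
groups = [
    {"title": "Harder, Better, Faster, Stronger", "primary-type": "Single", "secondary-types": []},
    {"title": "Musique, Vol. 1", "primary-type": "Album", "secondary-types": ["Compilation"]},
    {"title": "Alive 2007", "primary-type": "Album", "secondary-types": ["Live"]},
    {"title": "Discovery", "primary-type": "Album", "secondary-types": []},
]

studio_titles = [g["title"] for g in groups if is_studio_album(g)]
print(studio_titles)  # ['Discovery']
```

A missing or null secondary-types field counts as "no secondary types" here, so a release group that only reports primary-type Album is treated as a studio album.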

Step 1: Write the failing test

Create tests/test_musicbrainz.py:

import server.mf  # noqa: F401
import musicfetch_core as mf


class _FakeResp:
    def __init__(self, payload):
        self._payload = payload
    def raise_for_status(self):
        pass
    def json(self):
        return self._payload


# Trimmed real-shaped MB recording response.
MB_PAYLOAD = {
    "recordings": [
        {
            "artist-credit": [{"name": "Daft Punk"}],
            "releases": [
                {"date": "2001",
                 "release-group": {"id": "single-mbid", "title": "Harder, Better, Faster, Stronger",
                                   "primary-type": "Single", "secondary-types": []}},
                {"date": "2002",
                 "release-group": {"id": "comp-mbid", "title": "Musique, Vol. 1",
                                   "primary-type": "Album", "secondary-types": ["Compilation"]}},
                {"date": "2001",
                 "release-group": {"id": "48117b90-a16e-34ca-a514-19c702df1158",
                                   "title": "Discovery", "primary-type": "Album",
                                   "secondary-types": []}},
            ],
        }
    ]
}


def test_picks_studio_album_over_single_and_comp(monkeypatch):
    monkeypatch.setattr(mf.requests, "get", lambda *a, **k: _FakeResp(MB_PAYLOAD))
    monkeypatch.setattr(mf.time, "sleep", lambda *_: None)
    out = mf.musicbrainz_best_album("Daft Punk", "Harder Better Faster Stronger")
    assert out["album_title"] == "Discovery"
    assert out["artist"] == "Daft Punk"
    assert out["year"] == "2001"
    assert out["rg_mbid"] == "48117b90-a16e-34ca-a514-19c702df1158"


def test_returns_none_on_empty(monkeypatch):
    monkeypatch.setattr(mf.requests, "get", lambda *a, **k: _FakeResp({"recordings": []}))
    monkeypatch.setattr(mf.time, "sleep", lambda *_: None)
    assert mf.musicbrainz_best_album("Nobody", "Nothing") is None


def test_returns_none_on_exception(monkeypatch):
    def boom(*a, **k):
        raise mf.requests.exceptions.RequestException("network down")
    monkeypatch.setattr(mf.requests, "get", boom)
    monkeypatch.setattr(mf.time, "sleep", lambda *_: None)
    assert mf.musicbrainz_best_album("Daft Punk", "Discovery") is None


def test_falls_back_to_any_releasegroup_when_no_studio(monkeypatch):
    payload = {"recordings": [{"artist-credit": [{"name": "X"}], "releases": [
        {"date": "2010", "release-group": {"id": "live1", "title": "Live Thing",
                                           "primary-type": "Album", "secondary-types": ["Live"]}},
    ]}]}
    monkeypatch.setattr(mf.requests, "get", lambda *a, **k: _FakeResp(payload))
    monkeypatch.setattr(mf.time, "sleep", lambda *_: None)
    out = mf.musicbrainz_best_album("X", "Y")
    assert out["album_title"] == "Live Thing"


def test_first_artist_credit_only(monkeypatch):
    payload = {"recordings": [{"artist-credit": [{"name": "SLVMLORD"}, {"name": "Travis Bradley"}],
                              "releases": [{"date": "2025",
                                            "release-group": {"id": "x", "title": "Album X",
                                                              "primary-type": "Album",
                                                              "secondary-types": []}}]}]}
    monkeypatch.setattr(mf.requests, "get", lambda *a, **k: _FakeResp(payload))
    monkeypatch.setattr(mf.time, "sleep", lambda *_: None)
    out = mf.musicbrainz_best_album("SLVMLORD", "Under My Skin")
    assert out["artist"] == "SLVMLORD"

Step 2: Run test to verify it fails

Run: pytest tests/test_musicbrainz.py -v
Expected: FAIL with AttributeError: ... 'musicbrainz_best_album'

Step 3: Add the implementation

Add import time to the top imports block of musicfetch (with import json, import os, etc.). Then add above lidarr_search:

MUSICBRAINZ_URL = "https://musicbrainz.org/ws/2"
MB_HEADERS = {"User-Agent": "musicfetch/2.0 (https://github.com/; personal music fetcher)"}
_mb_last_call = 0.0


def _mb_rate_limit():
    """Courtesy ~1 req/sec to MusicBrainz."""
    global _mb_last_call
    elapsed = time.time() - _mb_last_call
    if elapsed < 1.0:
        time.sleep(1.0 - elapsed)
    _mb_last_call = time.time()


def _mb_artist_credit(credit) -> str:
    """First credited artist name only (ignore featured/secondary)."""
    if credit and isinstance(credit, list) and isinstance(credit[0], dict):
        return credit[0].get("name") or (credit[0].get("artist") or {}).get("name", "")
    return ""


def musicbrainz_best_album(artist: str, track: str, timeout: int = 8) -> Optional[dict]:
    """Resolve 'artist - track' to its best studio album via MusicBrainz.
    Returns {album_title, artist, year, rg_mbid} or None. Never raises."""
    query = f'artist:"{artist}" AND recording:"{track}"'
    try:
        _mb_rate_limit()
        resp = requests.get(
            f"{MUSICBRAINZ_URL}/recording",
            params={"query": query, "fmt": "json", "limit": 10},
            headers=MB_HEADERS, timeout=timeout,
        )
        resp.raise_for_status()
        data = resp.json()
    except Exception as e:  # noqa: BLE001 — degrade to fallback on any failure
        dbg(f"MusicBrainz lookup failed: {e}")
        return None

    # candidate = (is_studio, date_sortkey, title, artist, year, mbid)
    candidates = []
    for rec in data.get("recordings", []):
        rec_artist = _mb_artist_credit(rec.get("artist-credit"))
        for rel in rec.get("releases", []):
            rg = rel.get("release-group") or {}
            title = rg.get("title") or rel.get("title") or ""
            if not title:
                continue
            mbid = rg.get("id") or ""
            primary = rg.get("primary-type") or ""
            secondary = rg.get("secondary-types") or []
            date = rel.get("date") or rg.get("first-release-date") or ""
            is_studio = primary == "Album" and not secondary
            candidates.append((is_studio, date or "9999", title, rec_artist, date[:4], mbid))

    if not candidates:
        return None
    pool = [c for c in candidates if c[0]] or candidates
    pool.sort(key=lambda c: c[1])  # earliest date first
    _, _, title, art, year, mbid = pool[0]
    dbg(f"MusicBrainz resolved '{artist} - {track}' -> '{title}' ({year}) mbid={mbid}")
    return {"album_title": title, "artist": art or artist, "year": year, "rg_mbid": mbid}
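
The "earliest date first" sort above compares date strings, not parsed dates. MusicBrainz dates can be full (YYYY-MM-DD), partial (YYYY-MM or YYYY), or absent, so a short sketch of how the sort key orders them may help; date_sortkey is a hypothetical name that mirrors the date or "9999" expression in the candidate tuple.

```python
def date_sortkey(date: str) -> str:
    """Mirror of the candidate sort key: undated releases sort last."""
    return date or "9999"


# A mix of full, partial, and missing dates.
dates = ["2001-03-12", "", "2001", "1997-01-20"]
ordered = sorted(dates, key=date_sortkey)
print(ordered)  # ['1997-01-20', '2001', '2001-03-12', '']
```

A year-only date sorts ahead of any fuller date in the same year, because it is a string prefix of it. For picking the earliest studio album this is usually acceptable, but it means two releases from the same year are not ordered by their true release day when one of them has only a year.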

Step 4: Run test to verify it passes

Run: pytest tests/test_musicbrainz.py -v
Expected: PASS (5 passed)

Step 5: Commit

git add musicfetch tests/test_musicbrainz.py
git commit -m "feat(lidarr): MusicBrainz track-to-album resolver"

Task 3: Rewrite `lidarr_search` for MBID-exact lookup

Files:

Modify: musicfetch (replace lidarr_search; add _lidarr_album_candidates, _lidarr_artist_candidates, _fallback_lookup, _universal_search)
Test: tests/test_lidarr_search.py

Step 1: Write the failing test

Create tests/test_lidarr_search.py:

import server.mf  # noqa: F401
import musicfetch_core as mf

DISCOVERY_MBID = "48117b90-a16e-34ca-a514-19c702df1158"

DISCOVERY_ALBUM = {"title": "Discovery", "artist": {"artistName": "Daft Punk"},
                   "releaseDate": "2001-01-01", "foreignAlbumId": DISCOVERY_MBID}


def test_artist_track_uses_mbid_exact_lookup(monkeypatch):
    monkeypatch.setattr(mf, "API_KEY", "testkey")
    monkeypatch.setattr(mf, "musicbrainz_best_album",
                        lambda artist, track: {"album_title": "Discovery", "artist": "Daft Punk",
                                               "year": "2001", "rg_mbid": DISCOVERY_MBID})
    seen = {}

    def fake_get(path, params=None, timeout=15):
        seen["term"] = (params or {}).get("term")
        if path == "/api/v1/album/lookup" and seen["term"] == f"mbid:{DISCOVERY_MBID}":
            return [DISCOVERY_ALBUM]
        return []
    monkeypatch.setattr(mf, "lidarr_get", fake_get)

    hits = mf.lidarr_search("Daft Punk - Harder Better Faster Stronger", 10)
    assert seen["term"] == f"mbid:{DISCOVERY_MBID}"   # exact MBID lookup, not fuzzy
    assert hits[0].album == "Discovery"
    assert hits[0].artist == "Daft Punk"
    assert hits[0].payload["album"]["foreignAlbumId"] == DISCOVERY_MBID


def test_year_enriched_from_musicbrainz(monkeypatch):
    monkeypatch.setattr(mf, "API_KEY", "testkey")
    monkeypatch.setattr(mf, "musicbrainz_best_album",
                        lambda artist, track: {"album_title": "Discovery", "artist": "Daft Punk",
                                               "year": "2001", "rg_mbid": DISCOVERY_MBID})
    no_year = [{"title": "Discovery", "artist": {"artistName": "Daft Punk"},
                "releaseDate": "", "foreignAlbumId": DISCOVERY_MBID}]
    monkeypatch.setattr(mf, "lidarr_get",
                        lambda path, params=None, timeout=15: no_year if path == "/api/v1/album/lookup" else [])
    hits = mf.lidarr_search("Daft Punk - Discovery", 10)
    assert hits[0].year == "2001"


def test_no_api_key_returns_empty(monkeypatch):
    monkeypatch.setattr(mf, "API_KEY", "")
    assert mf.lidarr_search("Daft Punk - Discovery", 10) == []


def test_mb_miss_falls_back_to_lookup(monkeypatch):
    monkeypatch.setattr(mf, "API_KEY", "testkey")
    monkeypatch.setattr(mf, "musicbrainz_best_album", lambda artist, track: None)
    monkeypatch.setattr(mf, "lidarr_get",
                        lambda path, params=None, timeout=15: [DISCOVERY_ALBUM] if path == "/api/v1/album/lookup" else [])
    hits = mf.lidarr_search("Daft Punk - Discovery", 10)
    assert hits[0].album == "Discovery"


def test_single_term_is_artist_first(monkeypatch):
    monkeypatch.setattr(mf, "API_KEY", "testkey")

    def fake_get(path, params=None, timeout=15):
        if path == "/api/v1/artist/lookup":
            return [{"artistName": "Daft Punk"}]
        if path == "/api/v1/album/lookup":
            return [DISCOVERY_ALBUM]
        return []
    monkeypatch.setattr(mf, "lidarr_get", fake_get)
    hits = mf.lidarr_search("Daft Punk", 10)
    assert hits[0].kind == "artist"           # bare term -> artist first
    assert hits[0].artist == "Daft Punk"


def test_last_resort_universal_search(monkeypatch):
    monkeypatch.setattr(mf, "API_KEY", "testkey")
    monkeypatch.setattr(mf, "musicbrainz_best_album", lambda artist, track: None)

    def fake_get(path, params=None, timeout=15):
        if path == "/api/v1/search":
            return [{"album": DISCOVERY_ALBUM}]
        return []  # album/lookup + artist/lookup empty
    monkeypatch.setattr(mf, "lidarr_get", fake_get)
    hits = mf.lidarr_search("Daft Punk - Discovery", 10)
    assert hits and hits[0].album == "Discovery"

Step 2: Run test to verify it fails

Run: pytest tests/test_lidarr_search.py -v
Expected: FAIL (current lidarr_search ignores MB / mbid: lookup)

Step 3: Replace lidarr_search and add helpers

In musicfetch, replace the entire existing def lidarr_search(...) body (lines ~129-162) with the following, adding the helpers below it:

def lidarr_search(query: str, limit: int) -> list[Hit]:
    """Return Lidarr hits, best match first. Resolves 'Artist - Track' to an
    album's MusicBrainz release-group MBID, then does an exact Lidarr lookup
    (term=mbid:<id>) — no fuzzy ranking. Falls back so it never raises and
    returns [] only on total failure / missing key."""
    if not API_KEY:
        err("LIDARR_API_KEY not set — skipping Lidarr search.")
        return []

    artist, right = _split_query(query)

    if right:
        mb = musicbrainz_best_album(artist, right)
        if mb and mb["rg_mbid"]:
            hits = _lidarr_album_candidates(f"mbid:{mb['rg_mbid']}")
            for h in hits:
                if not h.year and mb["year"]:
                    h.year = mb["year"]
            if hits:
                return hits[:limit]
        # MusicBrainz miss / no exact album → plain lookup (album-first: a dash
        # query named an album/track).
        return _fallback_lookup(query, limit, artist_first=False)

    # Bare term is most often an artist.
    return _fallback_lookup(query, limit, artist_first=True)


def _lidarr_album_candidates(term: str) -> list[Hit]:
    try:
        return [_album_to_hit(a) for a in lidarr_get("/api/v1/album/lookup", params={"term": term})]
    except RequestException as e:
        dbg(f"album/lookup failed: {e}")
        return []


def _lidarr_artist_candidates(term: str) -> list[Hit]:
    try:
        return [_artist_to_hit(a) for a in lidarr_get("/api/v1/artist/lookup", params={"term": term})]
    except RequestException as e:
        dbg(f"artist/lookup failed: {e}")
        return []


def _fallback_lookup(query: str, limit: int, artist_first: bool) -> list[Hit]:
    """Plain album + artist lookups (no scoring); /search as last resort."""
    albums = _lidarr_album_candidates(query)
    artists = _lidarr_artist_candidates(query)
    hits = (artists + albums) if artist_first else (albums + artists)
    if hits:
        return hits[:limit]
    return _universal_search(query, limit)


def _universal_search(query: str, limit: int) -> list[Hit]:
    """Last resort: Lidarr's fuzzy /search (unranked)."""
    hits: list[Hit] = []
    try:
        for item in lidarr_get("/api/v1/search", params={"term": query}):
            if item.get("album"):
                hits.append(_album_to_hit(item["album"]))
            elif item.get("artist"):
                hits.append(_artist_to_hit(item["artist"]))
    except RequestException as e:
        dbg(f"/api/v1/search failed: {e}")
    return hits[:limit]
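
The order in which the tiers above are tried depends only on the shape of the query. The sketch below lists that order as plain strings so it can be read at a glance; lookup_tiers is a hypothetical helper for illustration and is not meant to be added to musicfetch.

```python
def lookup_tiers(query: str) -> list[str]:
    """List the lookups lidarr_search would attempt, in order (illustrative only)."""
    left, sep, right = query.partition(" - ")
    if sep and right.strip():
        # 'Artist - Track': exact MBID lookup first, then album-first fallbacks.
        return [
            "MusicBrainz recording search -> album/lookup?term=mbid:<id>",
            "album/lookup + artist/lookup (albums listed first)",
            "/api/v1/search",
        ]
    # Bare term: no MusicBrainz call, artists listed first.
    return [
        "artist/lookup + album/lookup (artists listed first)",
        "/api/v1/search",
    ]


for tier in lookup_tiers("Daft Punk - Harder Better Faster Stronger"):
    print(tier)
```

Each tier is reached only when the previous one produced no hits. For a dash query, that covers both a MusicBrainz miss and the case where MusicBrainz resolves an MBID that Lidarr's lookup does not return.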

Step 4: Run tests to verify they pass

Run: pytest tests/test_lidarr_search.py -v
Expected: PASS (6 passed)

Step 5: Run the full suite

Run: pytest -q
Expected: all green (the prior 27 tests plus the 15 new ones from Tasks 1-3: 4 split, 5 MusicBrainz, 6 Lidarr search), and python3 -m py_compile musicfetch runs clean.

Step 6: Commit

git add musicfetch tests/test_lidarr_search.py
git commit -m "feat(lidarr): exact MBID album lookup via MusicBrainz resolution"

Task 4: Live verification against the user's Lidarr

Files: none (manual verification by the controller, not a subagent).

Step 1: Read-only check — lidarr_search resolves the real album

No mutation; confirms the MB → mbid: exact lookup end-to-end:

cd /home/zhering/Documents/musicfetch
env LIDARR_URL=http://10.2.1.16:8686 LIDARR_API_KEY=49cf02acb4c7436b842df2150056d468 \
  python3 -c "import server.mf, musicfetch_core as mf; \
  hits=mf.lidarr_search('Daft Punk - Harder Better Faster Stronger', 5); \
  print([(h.artist, h.album, h.payload['album'].get('foreignAlbumId')) for h in hits[:3]])"

Expected: first hit ('Daft Punk', 'Discovery', '48117b90-a16e-34ca-a514-19c702df1158').

Step 2: Spot-check a second track (different artist), e.g.:

env LIDARR_URL=http://10.2.1.16:8686 LIDARR_API_KEY=49cf02acb4c7436b842df2150056d468 \
  python3 -c "import server.mf, musicfetch_core as mf; \
  print([(h.artist,h.album) for h in mf.lidarr_search('Tame Impala - The Less I Know The Better',3)])"

Expected: top hit is the album containing that track (e.g. Currents), not a single/compilation.

Step 3: (Optional, mutating) full /fetch — only with user approval, since it adds the artist+album to their Lidarr. Start the API (env MUSICFETCH_API_KEY=… LIDARR_URL=http://10.2.1.16:8686 LIDARR_API_KEY=… MUSICFETCH_ROOT=/media/music python3 -m uvicorn server.app:app --port 6769), POST /fetch?q=...&source=lidarr, observe job + Lidarr UI, then clean up any added test artist via DELETE /api/v1/artist/<id>?deleteFiles=false.

Self-Review

Spec coverage:

Shared lidarr_search rewrite, same signature → Task 3. ✅
MusicBrainz resolver w/ studio release-group selection + first-artist credit → Task 2. ✅
mbid: exact Lidarr lookup (no fuzzy scoring) → Task 3. ✅
Query split → Task 1. ✅
Fallback tiers (MB miss → _fallback_lookup → /api/v1/search; returns [] on total failure / no key) → Task 3 (test_mb_miss_falls_back_to_lookup, test_last_resort_universal_search, test_no_api_key_returns_empty). ✅
Year enrichment from MB → Task 3 (test_year_enriched_from_musicbrainz). ✅
YouTube-fallback preserved (signature unchanged; [] on failure) → guaranteed + test_no_api_key_returns_empty. ✅
Single-term artist-first ordering → Task 3 (test_single_term_is_artist_first). ✅
Out-of-scope (difflib scoring removed; metadata/quality-profile hardening raised separately) intentionally excluded.

Placeholder scan: None — all code and test bodies complete; real MBID/JSON baked in.

Type consistency: lidarr_search(query, limit) -> list[Hit] unchanged. musicbrainz_best_album returns {album_title, artist, year, rg_mbid} — keys identical across Task 2 (definition) and Task 3 (consumes mb["rg_mbid"], mb["year"]) and tests. _split_query -> (str, Optional[str]) consistent. _lidarr_album_candidates/_lidarr_artist_candidates/_fallback_lookup(query, limit, artist_first)/_universal_search(query, limit) signatures consistent between Task 3 definition and call sites. _album_to_hit payload {"album": {...}} with foreignAlbumId matches the assertions in Task 3.
