# VGMPP Cookie Tagger

> **Status:** Active
> **Category:** VGMdb Scrapers
> **Language:** Python 3
> **Script files:** `commercial-v3touhou-history-incrementalupdates-persistance.py`, `doujin-v3touhou-history-incremendalupdates-persistance.py`

## Purpose

Scans a music library directory tree for VGMdb `.url` shortcut files, fetches album metadata from VGMdb for each one, writes FLAC tags directly to all `.flac` files in the same folder, and maintains a persistent JSON history so subsequent runs only process new albums. Two variants are included: one tuned for commercial releases, one for doujin releases (with arranger-priority artist logic).

## Requirements

### Dependencies

```bash
pip install requests beautifulsoup4 mutagen tqdm
```

| Package | Purpose |
|---------|---------|
| `requests` | HTTP requests to VGMdb |
| `beautifulsoup4` | HTML parsing of album pages |
| `mutagen` | Reading and writing FLAC tags |
| `tqdm` | Progress bar in the terminal |

### Authentication

Both scripts require an authenticated VGMdb session. Before running, edit the two constants near the top of each script:

```python
USER_AGENT = r"Mozilla/5.0 ..."   # your browser's User-Agent string
COOKIE_STRING = r"..."            # your vgmdb.net session cookie string
```

Copy both values from your browser's DevTools: open the Network tab, select any request to vgmdb.net, and copy the `User-Agent` and `Cookie` request headers.
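
As a minimal sketch of how these two constants might end up in a request, the snippet below turns them into a header dictionary. `build_headers` is a hypothetical helper introduced here for illustration; the scripts may assemble their headers differently.

```python
# Placeholder values; replace them with the headers copied from your browser.
USER_AGENT = r"Mozilla/5.0 ..."
COOKIE_STRING = r"..."


def build_headers(user_agent: str, cookie_string: str) -> dict:
    """Return the two request headers an authenticated request would carry.

    Hypothetical helper (not taken from the scripts). Stray whitespace from
    copy and paste is trimmed so the header values stay valid.
    """
    return {
        "User-Agent": user_agent.strip(),
        "Cookie": cookie_string.strip(),
    }


headers = build_headers(USER_AGENT, COOKIE_STRING)
print(sorted(headers))
```

With `requests`, a dictionary like this can be passed as `requests.get(url, headers=headers)`.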

## Scripts

| Script | Target Library | Artist Logic |
|--------|---------------|-------------|
| `commercial-v3touhou-history-incrementalupdates-persistance.py` | Commercial releases | First credit row artist |
| `doujin-v3touhou-history-incremendalupdates-persistance.py` | Doujin releases | Arranger-priority; falls back to `ZZZZZZ - Needs Attention` |

### Doujin Artist Logic

The doujin script applies different artist-selection logic based on the album's Classification field:

| Classification | Artist source |
|---------------|--------------|
| Original Work | First credit row |
| Arrangement | Arranger / Arranged by / All Arrangement (checked in preference order) |
| Original Soundtrack / OST | Composer row |
| Not found | Set to `ZZZZZZ - Needs Attention` (sorts to end of list for manual review) |
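
The table above can be read as a small decision function. The following is a sketch under stated assumptions, not the script's actual code: credits are assumed to be a list of `(role, name)` tuples in page order, classification labels are matched exactly (case-insensitive), and both an unknown classification and a missing role are treated as "Not found". The names `pick_artist` and `find_role` are hypothetical.

```python
NEEDS_ATTENTION = "ZZZZZZ - Needs Attention"
# Arranger roles in preference order, as listed in the table.
ARRANGER_ROLES = ("Arranger", "Arranged by", "All Arrangement")


def find_role(credits, role):
    """Return the name on the first credit row whose role matches, else None."""
    for row_role, name in credits:
        if row_role.strip().lower() == role.lower():
            return name
    return None


def pick_artist(classification, credits):
    """Choose an artist from (role, name) credit rows based on classification."""
    label = (classification or "").strip().lower()
    artist = None
    if label == "original work":
        # First credit row, whatever its role is.
        artist = credits[0][1] if credits else None
    elif label == "arrangement":
        for role in ARRANGER_ROLES:
            artist = find_role(credits, role)
            if artist:
                break
    elif label in ("original soundtrack", "ost"):
        artist = find_role(credits, "Composer")
    # Assumption: anything unresolved falls back to the review marker.
    return artist or NEEDS_ATTENTION


credits = [("Composer", "ZUN"), ("Arranged by", "Alice"), ("Arranger", "Bob")]
print(pick_artist("Arrangement", credits))
```

Because `Arranger` is checked before `Arranged by`, the example picks the `Arranger` row even though it appears later on the page.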

## Input

| Item | Description | Example |
|------|-------------|---------|
| `*.url` files | Windows Internet Shortcut files inside album subdirectories; one per album folder | `Chrono Trigger OST.url` |
| `*.flac` files | FLAC audio files in the same directory as the `.url` file | `01 - Opening Theme.flac` |

The scripts are run from the root of your music library. They walk the entire directory tree looking for folders that contain a `.url` file, then process each such folder.
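
The directory walk described above might look roughly like the sketch below. It assumes the `.url` files follow the usual Windows Internet Shortcut layout (an `[InternetShortcut]` section with a `URL=` key) and that only the first shortcut in a folder is used. `read_shortcut_url` and `find_album_dirs` are hypothetical names, not functions from the scripts.

```python
import configparser
import os


def read_shortcut_url(path):
    """Return the URL stored in an Internet Shortcut file, or None."""
    # interpolation=None keeps '%' characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(path, encoding="utf-8-sig")
    except (configparser.Error, UnicodeDecodeError):
        return None
    return parser.get("InternetShortcut", "URL", fallback=None)


def find_album_dirs(root):
    """Yield (directory, url, flac_files) for each folder with a VGMdb shortcut."""
    for dirpath, _dirnames, filenames in os.walk(root):
        shortcuts = sorted(f for f in filenames if f.lower().endswith(".url"))
        if not shortcuts:
            continue
        # Assumption: one shortcut per album folder, so the first one wins.
        url = read_shortcut_url(os.path.join(dirpath, shortcuts[0]))
        if url and "vgmdb.net" in url:
            flacs = sorted(f for f in filenames if f.lower().endswith(".flac"))
            yield dirpath, url, flacs
```

Calling `list(find_album_dirs("."))` from the library root would return one tuple per album folder that contains a VGMdb shortcut.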

## Output

| Item | Description |
|------|-------------|
| FLAC tags (modified in place) | `VGMDB_URL`, `ARTIST`, and `ALBUMARTIST` tags written to all `.flac` files in the album directory |
| `vgmdb_data_YYYY-MM-DD_HHMM.csv` | Summary CSV of all albums processed this run (commercial variant) |
| `vgmdb_doujin_data_YYYY-MM-DD_HHMM.csv` | Summary CSV of all albums processed this run (doujin variant) |
| `vgmdb_history_urls_commercial.json` | Persistent set of VGMdb URLs already processed; prevents reprocessing on future runs |
| `vgmdb_merged.log` | Append-only log of tagging activity and errors |
| `vgmdb_merged_undo.json` | Snapshot of previous tag values for every file modified this run, enabling undo |
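
The incremental behavior that the history file enables can be sketched as follows. The on-disk layout (a JSON list of URL strings) is an assumption made for this example, and `load_history`, `save_history`, and `filter_new` are hypothetical helper names.

```python
import json
import os


def load_history(path):
    """Load the set of already-processed URLs; empty set if no file exists yet."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_history(path, urls):
    """Write the URL set as a sorted JSON list (JSON has no set type)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sorted(urls), fh, indent=2)


def filter_new(found_urls, history):
    """Return the URLs that are not in history yet, preserving input order."""
    return [url for url in found_urls if url not in history]
```

A run would then process only `filter_new(found, history)`, add each successfully tagged URL to the set, and call `save_history` at the end.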

### Tags Written

| FLAC Tag | Source |
|----------|--------|
| `VGMDB_URL` | VGMdb album page URL (from the `.url` file) |
| `ARTIST` | Artist name selected from the VGMdb credits section (selection rules differ per script; see Scripts) |
| `ALBUMARTIST` | Same value as `ARTIST` |

### CSV Columns

| Column | Description |
|--------|-------------|
| URL | VGMdb album URL |
| Album ID | Numeric VGMdb album ID |
| Album Title | English title from VGMdb |
| Release Year | Four-digit year extracted from Release Date |
| Catalog Number | Catalog number as listed on VGMdb |
| Publish Format | e.g., Commercial, Doujin |
| Category | e.g., Game |
| Classification | e.g., Original Soundtrack, Arrangement |
| Local Path | Parent directory of the album folder on disk |
| Touhou? | "Yes" if "touhou" appears in the page's meta keywords; blank otherwise |
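
To illustrate how a summary file with these columns and the timestamped name could be produced, here is a sketch using the standard `csv` module. The helper names and the `/album/<id>` URL pattern used to derive the Album ID are assumptions for this example; the scripts may derive these fields differently.

```python
import csv
import re
from datetime import datetime

COLUMNS = [
    "URL", "Album ID", "Album Title", "Release Year", "Catalog Number",
    "Publish Format", "Category", "Classification", "Local Path", "Touhou?",
]


def csv_name(now, prefix="vgmdb_data"):
    """Build a name such as vgmdb_data_YYYY-MM-DD_HHMM.csv."""
    return f"{prefix}_{now.strftime('%Y-%m-%d_%H%M')}.csv"


def album_id(url):
    """Extract the numeric ID, assuming a .../album/<id> URL."""
    match = re.search(r"/album/(\d+)", url)
    return match.group(1) if match else ""


def release_year(release_date):
    """Return the first standalone four-digit number in the date text."""
    match = re.search(r"\b(\d{4})\b", release_date)
    return match.group(1) if match else ""


def write_rows(path, rows):
    """Write dict rows under the fixed header; missing fields stay blank."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)


print(csv_name(datetime(2024, 3, 5, 9, 7)))
```

For the doujin variant, the same helper would be called with `prefix="vgmdb_doujin_data"`.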

## Usage

Run the script from the root of your music library. Each script changes its working directory to its own location on startup, so keeping the script in the library root avoids ambiguity about which tree is scanned:

```bash
# Commercial releases
python commercial-v3touhou-history-incrementalupdates-persistance.py

# Doujin releases
python doujin-v3touhou-history-incremendalupdates-persistance.py
```

Both scripts present a menu on startup:

```
1 = Scan, Scrape & Tag
2 = Undo Last Tagging
```

Enter `1` to run a full scan and tag pass. Enter `2` to restore all FLAC files modified during the most recent run to their previous tag values.
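
The snapshot-and-restore idea behind option `2` can be sketched with plain dictionaries standing in for FLAC tag containers. This is an illustration of the technique only: the structure of the real undo file is not documented here, and `snapshot`, `apply_tags`, and `restore` are hypothetical names.

```python
import json

TAG_NAMES = ("VGMDB_URL", "ARTIST", "ALBUMARTIST")


def snapshot(tags):
    """Record the current values of the tags that are about to be overwritten.

    A tag that does not exist yet is stored as None so undo can remove it.
    """
    return {name: tags.get(name) for name in TAG_NAMES}


def apply_tags(tags, url, artist):
    """Write the three tags, using list values as Vorbis comments do."""
    tags["VGMDB_URL"] = [url]
    tags["ARTIST"] = [artist]
    tags["ALBUMARTIST"] = [artist]


def restore(tags, saved):
    """Put back the saved values and delete tags that did not exist before."""
    for name, old in saved.items():
        if old is None:
            tags.pop(name, None)
        else:
            tags[name] = old


# A dict stands in for one file's tags; "TITLE" must survive untouched.
tags = {"ARTIST": ["Old Artist"], "TITLE": ["Opening Theme"]}
saved = json.loads(json.dumps(snapshot(tags)))  # round trip as the undo file would
apply_tags(tags, "https://vgmdb.net/album/1", "New Artist")
restore(tags, saved)
print(tags)
```

Storing `None` for absent tags is what lets the undo step distinguish "restore the old value" from "remove the tag that this run added".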

### Options / Arguments

These scripts take no command-line arguments. All configuration is done by editing constants at the top of each file.

## Examples

```bash
# Run commercial tagger from your library root
cd "D:\Music\VGM\Commercial"
python commercial-v3touhou-history-incrementalupdates-persistance.py
# Enter: 1

# Undo the last tagging run if something went wrong
python commercial-v3touhou-history-incrementalupdates-persistance.py
# Enter: 2

# Run doujin tagger
cd "D:\Music\VGM\Doujin"
python doujin-v3touhou-history-incremendalupdates-persistance.py
# Enter: 1
```

## Notes

- **Hardcoded credentials:** `USER_AGENT` and `COOKIE_STRING` must be replaced with your own values before running. The example values present in both files are real but will expire. The comment in the source reads: `PASTE YOUR REAL CHROME VALUES HERE`.
- **Hardcoded log directory:** Both scripts write logs, history, and undo files to a hardcoded path. The commercial script uses `P:\VGMPP-LOGS\Commercial`; the doujin script uses `P:\VGMPP-LOGS\Doujin`. On non-Windows systems the `P:` drive letter will not resolve, so change `BASE_LOG_DIR` near the top of each script before running on Linux or macOS.
- **Rate limit:** 4-second delay between requests (chosen to avoid Cloudflare blocks). Reduce only with caution.
- **Retry logic:** Each URL is attempted up to 3 times before being recorded as a failure.
- **History file:** Both scripts use the same history filename, `vgmdb_history_urls_commercial.json`; the doujin script does not define a separate one. If both scripts resolve that name to the same directory, they share one history file. To keep separate histories, change the `HISTORY_FILE` constant in the doujin script.
- **Undo scope:** `vgmdb_merged_undo.json` only covers the most recently completed run. It is overwritten each run. If you need to undo a run after doing another run, you must restore from a backup.
- **Working directory:** On startup, both scripts `os.chdir` to their own script directory. Running them from a different directory with a relative path may cause unexpected behavior; use absolute paths or `cd` to the music library root and run from there.
- **FLAC only:** Only `.flac` files are tagged. Other formats (MP3, ALAC, Opus, etc.) are ignored.
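
The rate-limit and retry notes above could be combined along these lines. This is a sketch, not the scripts' code: `fetch_with_retry` is a hypothetical helper, the `fetch` callable stands in for whatever performs the HTTP request, and pausing after every attempt (including a successful one) is one possible way to keep requests spaced out.

```python
import time

REQUEST_DELAY = 4  # seconds between requests, per the rate-limit note
MAX_ATTEMPTS = 3   # attempts per URL, per the retry note


def fetch_with_retry(fetch, url, attempts=MAX_ATTEMPTS,
                     delay=REQUEST_DELAY, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, pausing `delay` seconds after each call.

    Returns the first non-None result, or None once every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = fetch(url)
        except Exception as exc:
            # A real script would likely catch a narrower error type here,
            # such as requests.RequestException.
            print(f"attempt {attempt}/{attempts} failed for {url}: {exc}")
            result = None
        sleep(delay)  # keep spacing between requests even after a success
        if result is not None:
            return result
    return None
```

Passing `sleep` as a parameter keeps the helper testable without waiting; a caller that gets `None` back would record the URL as a failure and leave it out of the history file.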

## Related Scripts

- [extract-urls-final](extract-urls-final.md) — scans a directory tree for `.url` files and extracts the URLs to a text file; useful for auditing which albums have shortcut files before tagging
- [Reprint Finder v6](reprintfinder-v6.md) — also uses cookie-based VGMdb authentication with the same `USER_AGENT` / `COOKIE` pattern
