# VGMdb Redbook Check

> **Status:** Active
> **Category:** Data Extraction
> **Language:** Python 3
> **Script file:** `redbook2.py`

## Purpose

Crawls a list of VGMdb search URLs and checks each one for two conditions: whether the search returns any results at all and, if it does, whether the `<meta name="keywords">` content of the page it lands on includes "touhou". Results are saved to a CSV for review.

## Requirements

### Dependencies

```bash
pip install requests beautifulsoup4
```

| Package | Purpose |
|---------|---------|
| `requests` | HTTP fetching with session reuse and redirect following |
| `beautifulsoup4` | HTML parsing to read `<meta name="keywords">` |

## Input

| Item | Description | Example |
|------|-------------|---------|
| `links.txt` | One VGMdb search URL per line | `https://vgmdb.net/search?q=ANCF-0036` |

The script reads `links.txt` from the working directory. URLs should be VGMdb catalogue number search queries in the form `https://vgmdb.net/search?q=CATALOG-NUMBER`.
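
If you are starting from a plain list of catalogue numbers rather than ready-made URLs, a small helper can write `links.txt` in the expected form. This is a minimal sketch and not part of `redbook2.py`. The names `search_url` and `write_links`, the percent-encoding of the query value, and the skipping of blank entries are all assumptions.

```python
from pathlib import Path
from typing import Iterable, List
from urllib.parse import quote

# Documented form: https://vgmdb.net/search?q=CATALOG-NUMBER
SEARCH_BASE = "https://vgmdb.net/search?q="


def search_url(catalog_number: str) -> str:
    """Build one VGMdb search URL (hypothetical helper)."""
    # Percent-encode the query value; letters, digits and '-' are left unchanged.
    return SEARCH_BASE + quote(catalog_number.strip(), safe="")


def write_links(catalog_numbers: Iterable[str], path: str = "links.txt") -> List[str]:
    """Write one search URL per line to `path`, skipping blank entries (assumption)."""
    urls = [search_url(c) for c in catalog_numbers if c.strip()]
    Path(path).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return urls
```

For example, `write_links(["ANCF-0036", "ANCF-0037"])` writes the two search URLs that appear in the console output under Examples.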

## Output

| Item | Description |
|------|-------------|
| `crawler_results.csv` | CSV with columns: `Original URL`, `Final URL`, `Result` |
| Console (copy/paste block) | Same data printed as plain comma-separated lines |
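
A file with the documented columns can be produced with the standard `csv` module. The sketch below is minimal and assumes each result is held as an `(original_url, final_url, result)` tuple. `write_results` is a hypothetical name, and this is not necessarily how `redbook2.py` writes its output.

```python
import csv
from typing import Iterable, Tuple

# Column names documented for crawler_results.csv.
HEADER = ("Original URL", "Final URL", "Result")


def write_results(rows: Iterable[Tuple[str, str, str]], path: str = "crawler_results.csv") -> None:
    """Write a header row followed by one row per checked URL, UTF-8 encoded."""
    # newline="" lets csv.writer control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        # csv.writer quotes any field that contains a comma or quote character.
        writer.writerows(rows)
```

The `csv` module documentation recommends opening files with `newline=""`, which prevents stray blank lines between rows on Windows.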

### Result Values

| Value | Meaning |
|-------|---------|
| `Touhou Found` | Page loaded, keywords meta tag present, and contains "touhou" |
| `Touhou Not Found` | Page loaded, keywords meta tag present, but does not contain "touhou" |
| `Not in VGMDB` | Response body contains the string "0 album results for" |
| `Keywords meta tag missing` | Page loaded but no `<meta name="keywords">` tag found |
| `Error fetching URL` | Request failed (timeout, network error, HTTP error) |
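
The table implies a decision order, which the sketch below reproduces. It assumes that the no-results marker is checked before the keywords tag and that a failed request is passed in as `None`. It uses the standard library's `html.parser` as a dependency-free stand-in for the BeautifulSoup lookup the script performs. `classify` and `KeywordsMetaFinder` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Marker string documented for the "Not in VGMDB" case.
NO_RESULTS_MARKER = "0 album results for"


class KeywordsMetaFinder(HTMLParser):
    """Record the content of the first <meta name="keywords"> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta" or self.keywords is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "keywords":
            self.keywords = attr_map.get("content") or ""


def classify(body: Optional[str]) -> str:
    """Map a response body (None means the request failed) to a Result value."""
    if body is None:
        return "Error fetching URL"
    if NO_RESULTS_MARKER in body:
        return "Not in VGMDB"
    finder = KeywordsMetaFinder()
    finder.feed(body)
    finder.close()
    if finder.keywords is None:
        return "Keywords meta tag missing"
    # Case-insensitive comparison, as described under Notes.
    if "touhou" in finder.keywords.lower():
        return "Touhou Found"
    return "Touhou Not Found"
```

A bare substring test like this one also matches a page that reads "10 album results for". A stricter check would need to anchor the marker at a word boundary.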

## Usage

```bash
python redbook2.py
```

The script uses a `requests.Session` for connection reuse. Each URL is printed to the console as it is checked, and a completion message naming the output file is printed at the end.
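
The session, timeout, User-Agent and redirect behaviour described in this document can be sketched as follows. This is an illustration under assumptions, not the script's code. `fetch` and `crawl` are hypothetical names, a non-2xx status is assumed to count as an HTTP error, and on failure the sketch reports the original URL as the final URL.

```python
from typing import Iterable, Iterator, Optional, Tuple

import requests

USER_AGENT = "MyKeywordCrawler/1.0"  # value documented under Notes
TIMEOUT_SECONDS = 15                 # value documented under Notes


def fetch(session: requests.Session, url: str) -> Tuple[str, Optional[str]]:
    """Return (final_url, body); body is None when the request fails."""
    try:
        # GET requests follow redirects by default, so resp.url is the landing URL.
        resp = session.get(url, headers={"User-Agent": USER_AGENT}, timeout=TIMEOUT_SECONDS)
        # Treat 4xx/5xx responses as failures (assumed "HTTP error" case).
        resp.raise_for_status()
    except requests.RequestException:
        # Covers timeouts, connection errors and HTTPError from raise_for_status().
        return url, None
    return resp.url, resp.text


def crawl(urls: Iterable[str]) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Check each URL with one shared Session, printing progress as it goes."""
    with requests.Session() as session:
        for url in urls:
            print(f"Checking: {url}")
            final_url, body = fetch(session, url)
            yield url, final_url, body
```

`crawl(...)` yields `(original_url, final_url, body)` tuples, with `body` set to `None` for a failed request, and each body would then be mapped to one of the Result Values. Reusing one `Session` keeps the connection to vgmdb.net open between requests instead of reconnecting for every URL.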

## Examples

```bash
# Prepare links.txt with catalogue number search URLs, then run:
python redbook2.py

# Console output during run:
# Loaded 967 URLs from links.txt
# Checking: https://vgmdb.net/search?q=ANCF-0036
# Checking: https://vgmdb.net/search?q=ANCF-0037
# ...
# Crawl complete!
# Results have been saved to 'crawler_results.csv'

# Example rows in crawler_results.csv:
# https://vgmdb.net/search?q=ANCF-0036,https://vgmdb.net/album/12345,Touhou Not Found
# https://vgmdb.net/search?q=FAKE-0001,https://vgmdb.net/search?q=FAKE-0001,Not in VGMDB
```
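
To review a finished run, the results file can be tallied by `Result` value. The sketch below is minimal, uses `csv.DictReader`, and assumes the file starts with the `Original URL`, `Final URL`, `Result` header row. `summarize` and `urls_with_result` are hypothetical helpers, not part of `redbook2.py`.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count rows per Result value (hypothetical helper)."""
    return dict(Counter(row["Result"] for row in csv.DictReader(lines)))


def urls_with_result(lines: Iterable[str], result: str) -> List[str]:
    """Return the Original URL of every row whose Result equals `result`."""
    return [row["Original URL"] for row in csv.DictReader(lines) if row["Result"] == result]
```

Open `crawler_results.csv` with `newline=""` and `encoding="utf-8"`, then pass the file object to either function. For instance, `urls_with_result(f, "Touhou Found")` lists the searches whose keywords mention Touhou.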

## Notes

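- The request timeout is hardcoded at **15 seconds** per URL. This is a ceiling for a stalled request, not a fixed delay, but total runtime still grows with the size of `links.txt`. The included file has 967 URLs, so if every request ran to the timeout, a full run would take about 4 hours (967 × 15 s ≈ 14,500 s).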
- The User-Agent sent with each request is hardcoded as `MyKeywordCrawler/1.0`.
- The script follows HTTP redirects automatically; the `Final URL` column captures the landing URL after any redirect.
- The "touhou" check is case-insensitive (the keywords content is lowercased before comparison).
- The output CSV is written in UTF-8. The console copy/paste block is a plain comma-separated echo with no quoting, so values containing commas would break that display; the CSV file itself is properly formatted.
- The included `links.txt` and `crawler_results.csv` in the repository are from a previous run and can be overwritten safely.
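
The difference between the console echo and the CSV file is easiest to see on a row whose value contains a comma. In the minimal sketch below, `echo_line` stands in for the console block's plain join and `csv_line` for `csv.writer`. Both names are hypothetical, and the example URL is made up for illustration.

```python
import csv
import io
from typing import Sequence


def echo_line(row: Sequence[str]) -> str:
    """Plain comma join with no quoting, like the console copy/paste block."""
    return ",".join(row)


def csv_line(row: Sequence[str]) -> str:
    """The same row as csv.writer serialises it (fields with commas get quoted)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue().rstrip("\n")


# Hypothetical row whose URLs contain a comma.
row = ["https://vgmdb.net/search?q=A,B", "https://vgmdb.net/search?q=A,B", "Not in VGMDB"]
```

Splitting `echo_line(row)` on commas yields five fields instead of three. By contrast, `csv.reader` recovers the original three values from `csv_line(row)`.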

## Note on README Mismatch

The `README.md` in this directory describes a FLAC folder organiser that reads `VGMDB_PUBLISHER` tags — that is a different, unrelated script. The actual script present (`redbook2.py`) is the URL crawler documented here.

## Related Scripts

- [VGMdb HTML Data Extractors](extract.md) — generates VGMdb URLs from saved HTML; output can be adapted as input for this script
- [VGMdb Nicotine Shares](vgmdb-nicotineshares.md) — generates catalogue numbers that can be formatted into search URLs for this script
