# VGMDB Restructure Combined — Single-Script URL Extractor + Album Scraper

> **Status:** Active
> **Category:** File Organization
> **Language:** Python 3
> **Script file:** `vgmdb.restructure.combined.py`

## Purpose

An all-in-one version of the [vgmdb-restructure](vgmdb-restructure.md) two-step pipeline. Scans a directory tree for Windows `.url` shortcut files, records both the VGMdb URL and the local folder path for each, then scrapes album metadata from VGMdb — all in a single run with an interactive confirmation prompt between stages. The key addition over the two-step version is that the **local folder path** is captured and carried through to the final CSV.

## Requirements

### Dependencies

```bash
pip install requests beautifulsoup4 tqdm
```

## Input

| Item | Description | Example |
|------|-------------|---------|
| Working directory | Run from the root of the directory tree containing the `.url` shortcut files | `cd "D:\Music\FLAC"` |
| `.url` files | Windows internet shortcut files with a `URL=https://vgmdb.net/album/...` line | `Album info.url` |
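
As an illustration of the input format, the following is a minimal sketch of how a tree could be scanned for `.url` shortcuts and their `URL=` lines. The function name `find_vgmdb_urls` is hypothetical and not taken from the script. The sketch yields the path of the shortcut file itself; how the script derives `Local Path` from it is not shown here.

```python
import os

def find_vgmdb_urls(root):
    """Yield (url, shortcut_path) for every .url file under root that
    points at a VGMdb album page. Hypothetical helper, not the script's own."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(".url"):
                continue
            shortcut_path = os.path.join(folder, name)
            # Shortcut files are small INI-style text files; ignore odd bytes.
            with open(shortcut_path, encoding="utf-8", errors="ignore") as fh:
                for line in fh:
                    line = line.strip()
                    if line.upper().startswith("URL="):
                        url = line[4:]
                        if "vgmdb.net/album/" in url:
                            yield url, shortcut_path
                        break  # only the first URL= line is considered
```

Calling `list(find_vgmdb_urls("."))` from the library root would return one pair per matching shortcut.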

## Output

| Item | Description |
|------|-------------|
| `urls_with_paths.csv` | Intermediate file: URL + local folder path, one row per album |
| `vgmdb_data_YYYY-MM-DD_HHMM.csv` | Final metadata CSV with one row per album |
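
The `YYYY-MM-DD_HHMM` pattern in the final filename can be produced with `strftime`. This is a small sketch of one way to build such a name; `final_csv_name` is a hypothetical helper, and the script may construct the name differently.

```python
from datetime import datetime

def final_csv_name(now=None):
    """Build a name like vgmdb_data_2026-05-01_1430.csv (hypothetical helper)."""
    now = now or datetime.now()
    # Resolution is one minute, so two runs in the same minute share a name.
    return now.strftime("vgmdb_data_%Y-%m-%d_%H%M.csv")
```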

### Final CSV columns

| Column | Description |
|--------|-------------|
| URL | VGMdb album page URL |
| Local Path | Local filesystem path recorded for the album: the parent directory of the folder that contains the `.url` file (see Notes) |
| Album ID | Numeric ID extracted from the URL |
| Album Title | English title scraped from the page |
| Release Year | Four-digit year from the Release Date field |
| Catalog Number | Catalog number (truncated before "Other" if multiple are listed) |
| Publish Format | e.g. `Commercial`, `Doujin/Indie` |
| Category | e.g. `Game`, `Animation` |
| Classification | `OST` if the page says "Original Soundtrack"; otherwise the raw value |
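
The derived columns above could be computed roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the regular expressions are one plausible reading of the column descriptions, and the `Unknown` fallback for a missing ID or year is assumed rather than confirmed by the script.

```python
import re

def album_id_from_url(url):
    """Numeric ID from a URL such as https://vgmdb.net/album/12345."""
    match = re.search(r"/album/(\d+)", url)
    return match.group(1) if match else "Unknown"  # fallback is an assumption

def release_year(release_date):
    """First standalone four-digit number in the Release Date text."""
    match = re.search(r"\b(\d{4})\b", release_date)
    return match.group(1) if match else "Unknown"  # fallback is an assumption

def first_catalog_number(raw):
    """Keep only the text before the word "Other"."""
    return raw.split("Other", 1)[0].strip()

def classification(raw):
    """Abbreviate "Original Soundtrack" to OST; pass anything else through."""
    raw = raw.strip()
    return "OST" if raw == "Original Soundtrack" else raw
```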

## Usage

Run the script from the root of your music library (or wherever the `.url` files live):

```bash
cd "D:\Music\FLAC"
python vgmdb.restructure.combined.py
```

The script pauses between Stage 1 and Stage 2 and asks for confirmation before scraping:

```
Step 1 complete. Ready to run the second part of the script? (y/n):
```

Enter `y` to proceed or `n` to stop after Stage 1 (leaving `urls_with_paths.csv` for later inspection).
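
A confirmation gate of this kind can be sketched in a few lines. The helper name `ask_to_continue` is hypothetical, and treating every answer other than `y` as a refusal is an assumption about the script's behavior.

```python
PROMPT = "Step 1 complete. Ready to run the second part of the script? (y/n): "

def ask_to_continue(prompt_fn=input):
    """Return True only when the user answers y (case-insensitive).
    prompt_fn is injectable so the gate can be exercised without a terminal."""
    answer = prompt_fn(PROMPT)
    return answer.strip().lower() == "y"
```

In such a design, Stage 2 would run only when `ask_to_continue()` returns `True`; otherwise the program exits and `urls_with_paths.csv` stays on disk.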

## Examples

```bash
# Example 1 — Full run (extract + scrape)
cd "D:\Music\FLAC"
python vgmdb.restructure.combined.py
# At the prompt, enter: y
# Output: urls_with_paths.csv  and  vgmdb_data_2026-05-01_1430.csv

# Example 2 — Extract only (stop before scraping)
python vgmdb.restructure.combined.py
# At the prompt, enter: n
# Output: urls_with_paths.csv only; inspect it before deciding to scrape
```

## Notes

- Both output files are written to the **current working directory** (no hardcoded paths, unlike the two-step version's Step 1).
- The `Local Path` recorded for each entry is the **parent directory** of the folder that contains the `.url` file, not the folder itself.
- "Original Soundtrack" in the Classification field is automatically abbreviated to `OST` in the final CSV.
- The 2-second delay between requests is intentional: it limits the load placed on VGMdb. Do not remove it, especially when scraping large batches.
- If a page cannot be fetched, the row is recorded with all metadata fields set to `Unknown`; the URL and Local Path are still preserved.
- Final CSV files accumulate across runs because each one gets a timestamped name (resolution: one minute). The intermediate file (`urls_with_paths.csv`) is overwritten on every run.
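
The delay and the `Unknown` fallback described in the notes above can be combined in one loop. The sketch below is not the script's own code: `scrape_all`, `fetch`, and `parse` are hypothetical names, the fetch and parse steps are passed in as callables (in practice they might wrap `requests` and BeautifulSoup), and the list of scraped fields assumes that `Album ID` is not among the fields reset to `Unknown`.

```python
import time

# Assumed set of page-derived columns; Album ID is left out on purpose.
SCRAPED_FIELDS = [
    "Album Title", "Release Year", "Catalog Number",
    "Publish Format", "Category", "Classification",
]

def scrape_all(rows, fetch, parse, delay=2.0, sleep=time.sleep):
    """rows: iterable of (url, local_path). fetch(url) returns page HTML,
    parse(html) returns a dict of scraped fields. Both are caller-supplied."""
    results = []
    for index, (url, local_path) in enumerate(rows):
        if index:
            sleep(delay)  # pause between consecutive requests
        record = {"URL": url, "Local Path": local_path}
        try:
            record.update(parse(fetch(url)))
        except Exception:
            # Broad catch for the sketch: keep URL and path, mark the rest.
            record.update({field: "Unknown" for field in SCRAPED_FIELDS})
        results.append(record)
    return results
```

Injecting `sleep` keeps the pause easy to replace in tests while the default still waits two real seconds between requests.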

## Related Scripts

- [vgmdb-restructure](vgmdb-restructure.md) — the original two-step version (separate scripts; no local path recording)
- [vgmdb-url-rename-combo-python](vgmdb-url-rename-combo-python.md) — creates the `.url` shortcut files that this script reads
