# Docker VGMdb Scraper

> **Status:** Active
> **Category:** VGMdb Scrapers
> **Language:** Python 3
> **Script files:** `app.py`, `scraper.py`

## Purpose

A Flask web application that logs into vgmdb.net using your credentials, scrapes album collection data for any user, and presents it in a searchable, sortable table with Excel export and clipboard copy support. Designed to run as a persistent Docker container.

## Requirements

### Dependencies

All dependencies are pinned in `docker-requirements.txt`. The key packages are:

```
Flask==2.3.3
Flask-Session==0.8.0
requests==2.31.0
beautifulsoup4==4.12.2
pandas==2.1.1
openpyxl==3.1.2
gunicorn==23.0.0
```

### System Requirements

- [Docker](https://docs.docker.com/get-docker/)
- [Docker Compose](https://docs.docker.com/compose/install/)

## Input

| Item | Description | Example |
|------|-------------|---------|
| VGMdb username | Your vgmdb.net forum login username | `myusername` |
| VGMdb password | Your vgmdb.net forum login password | `mypassword` |
| User ID (optional) | Numeric VGMdb user ID to scrape; defaults to `42780` if blank | `7357` |

The user ID can be found in the URL of any VGMdb collection page: `https://vgmdb.net/db/collection.php?do=view&userid=XXXXX`
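
If you have a collection URL on hand, the ID can also be pulled out programmatically. The following is a minimal sketch using only the standard library; the helper name `extract_user_id` is hypothetical and is not part of `app.py` or `scraper.py`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_user_id(collection_url: str) -> Optional[str]:
    """Return the numeric `userid` query parameter, or None if it is missing or non-numeric."""
    query = parse_qs(urlparse(collection_url).query)
    values = query.get("userid", [])
    if values and values[0].isdigit():
        return values[0]
    return None


# Example URL in the format shown above, using the sample ID from the Input table.
print(extract_user_id("https://vgmdb.net/db/collection.php?do=view&userid=7357"))  # prints 7357
```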

## Output

| Item | Description |
|------|-------------|
| Web table | Live sortable/searchable results table at `http://localhost:5266` |
| `vgmdb_collection.xlsx` | Excel file downloaded via the "Export to Excel" button |

### Columns in the output

| Column | Description |
|--------|-------------|
| Album Title | Title of the album |
| Ref ID | VGMdb internal reference ID for the album node |
| Folder ID | ID of the collection folder the album belongs to |
| Album ID | Numeric VGMdb album ID |
| URL | Relative path to the album page |
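
Because the URL column holds a relative path, it needs a base before it can be opened outside the app. The sketch below shows one way to do that with `urllib.parse.urljoin`. It assumes the paths resolve against the site root `https://vgmdb.net/`; the sample row, including the `/album/12345` path format, is made up for illustration.

```python
from urllib.parse import urljoin

# Assumption: relative album paths resolve against the VGMdb site root.
VGMDB_BASE = "https://vgmdb.net/"


def absolute_album_url(relative_path: str) -> str:
    """Join a relative album path from the URL column onto the site root."""
    return urljoin(VGMDB_BASE, relative_path)


# Hypothetical row using the column names from the table above.
row = {"Album Title": "Example Album", "Album ID": "12345", "URL": "/album/12345"}
print(absolute_album_url(row["URL"]))  # prints https://vgmdb.net/album/12345
```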

## Usage

### Docker (recommended)

1. Edit `docker-compose.yml` and set a secure session secret:

```yaml
environment:
  - SESSION_SECRET=your_secure_session_secret_here_change_this
```

2. Build and start the container:

```bash
docker-compose up -d
```

3. Open `http://localhost:5266` in your browser.

4. Enter your VGMdb credentials and optionally a target user ID, then click **Login & Scrape Collection**.
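
For step 1, any long random string works as a session secret. One way to produce one is Python's `secrets` module, as in the sketch below; the helper name and the 32-byte length are arbitrary choices, not requirements of the app.

```python
import secrets


def generate_session_secret(num_bytes: int = 32) -> str:
    """Return a random hex string (two hex characters per byte) for SESSION_SECRET."""
    return secrets.token_hex(num_bytes)


# Paste the printed line into the `environment` section of docker-compose.yml.
print(f"SESSION_SECRET={generate_session_secret()}")
```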

### Stopping

```bash
docker-compose down
```

### Running locally without Docker

```bash
pip install flask flask-session requests beautifulsoup4 pandas openpyxl
python app.py
```

The app will be available at `http://localhost:5000`.

## Examples

```bash
# Start the container in the background
docker-compose up -d

# View logs
docker-compose logs -f

# Rebuild after code changes
docker-compose build && docker-compose up -d

# Stop and remove the container
docker-compose down
```

## Notes

- The default user ID hardcoded in `app.py` is `42780`. If no user ID is entered in the web form, this value is used. Edit `app.py` line 41 to change the default.
- The Docker Compose file maps container port 5000 to host port **5266**. If that host port is already in use, change the host side of the `ports` mapping in `docker-compose.yml`.
- Session data is stored server-side in `.flask_session/`, which the compose file maps to the host path `/home/alucard/docker-config/collectionscraper/flask`. Credentials are never written to disk; they are held only in the session while the browser tab is open.
- The login mechanism targets the VGMdb forum login endpoint (`/forums/login.php`). If VGMdb changes its login flow, `scraper.py` will need updating.
- This application should only be deployed in trusted/local environments. Do not expose port 5266 to the public internet.
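
To check whether host port 5266 is free before starting the container (see the port note above), a quick test is to try binding to it. This is a rough sketch, assuming the check runs on the Docker host itself; `port_is_free` is a hypothetical helper and not part of the project.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding failed: the port is in use or not permitted for this user.
            return False
    return True


if port_is_free(5266):
    print("Port 5266 looks free.")
else:
    print("Port 5266 is taken; pick another host port in docker-compose.yml.")
```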

## Related Scripts

- [graburls](graburls.md): an alternative approach that extracts album URLs from a saved HTML file rather than performing a live login
- [extract-urls-final](extract-urls-final.md): extracts VGMdb URLs from Windows `.url` shortcut files in a local music library, producing a URL list suitable for use with other scrapers
