# url-cleaner-v4

> **Status:** Active
> **Category:** URL Management
> **Language:** Python 3
> **Script file:** `url-cleaner-v4.py`

## Purpose

Recursively scans a directory for Windows `.url` internet shortcut files and rewrites each one to a minimal two-line format, upgrading `http://` to `https://` in the process. Displays a live progress counter and a final summary report with coloured output.

## Requirements

### Dependencies

Python standard library only — no `pip` installs required.

```
os, re, sys, collections, datetime, ctypes  (all built-in)
```

On Windows, ANSI colour support is enabled automatically via `ctypes`; on Linux and macOS, terminals interpret ANSI codes natively.
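
The Win32 console API turns on ANSI escape processing when the `ENABLE_VIRTUAL_TERMINAL_PROCESSING` flag is set with `SetConsoleMode`. The sketch below shows that technique through `ctypes`. It assumes the script does something similar, and the helper name `enable_ansi_colours` is hypothetical.

```python
import ctypes
import sys

# Win32 console API constants
STD_OUTPUT_HANDLE = -11
ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004


def enable_ansi_colours() -> bool:
    """Hypothetical helper: try to enable ANSI escape codes; True on success."""
    if sys.platform != "win32":
        # Linux/macOS terminals interpret ANSI codes natively.
        return True
    kernel32 = ctypes.windll.kernel32  # only exists on Windows
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = ctypes.c_uint32()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        # stdout is not a console (for example, redirected to a file).
        return False
    new_mode = mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING
    return bool(kernel32.SetConsoleMode(handle, new_mode))
```

Calling it once at startup, before any coloured output, is enough; a `False` result is a cue to fall back to plain text.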

## Input

| Item | Description | Example |
|------|-------------|---------|
| Root directory | Path passed as a command-line argument; defaults to current working directory | `D:\Music\URL Shortcuts` |
| `.url` files | Windows internet shortcut files found recursively under the root | `Album info.url` |

## Output

| Item | Description |
|------|-------------|
| Cleaned `.url` files | Rewritten in place to the minimal two-line format |
| Console summary | Counts of files scanned, `.url` files found, files cleaned, files already clean, and errors |

### Output `.url` format

```ini
[InternetShortcut]
URL=https://vgmdb.net/album/12345
```
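
Producing this format, with the `http://` to `https://` upgrade applied, can be sketched as follows. The helper names `upgrade_to_https` and `minimal_shortcut` are hypothetical, and the line endings the real script writes are not documented; this sketch assumes `\n` with a trailing newline.

```python
import re

# A leading "http://" scheme, matched case-insensitively.
_HTTP_SCHEME = re.compile(r"^http://", re.IGNORECASE)


def upgrade_to_https(url: str) -> str:
    """Rewrite a leading http:// to https://; leave any other URL unchanged."""
    return _HTTP_SCHEME.sub("https://", url.strip(), count=1)


def minimal_shortcut(url: str) -> str:
    """Build the two-line [InternetShortcut] body for a single URL."""
    return f"[InternetShortcut]\nURL={upgrade_to_https(url)}\n"


print(minimal_shortcut("http://vgmdb.net/album/12345"), end="")
```

Anchoring the pattern at the start of the string upgrades only the scheme, so an `http://` that appears later in the URL (inside a query string, say) is left alone.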

### Summary report fields

| Field | Description |
|-------|-------------|
| Scanned | Total number of files visited (all types) |
| URL files | `.url` files found among the scanned files |
| Cleaned | Files that were rewritten |
| Already OK | Files already in minimal format — left untouched |
| Errors | Files in which no URL was found, or that could not be read or written |
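
A minimal sketch of the per-file decision behind the last three fields, including the `URL=`-then-bare-URL fallback described under Notes, is shown below; in this sketch each `.url` file lands in exactly one bucket. The function names and the status strings `cleaned`, `ok` and `error` are hypothetical. Reading and writing as Latin-1 is also an assumption, chosen because it maps every byte to one character, so non-ASCII bytes survive a rewrite unchanged.

```python
import re
from pathlib import Path
from typing import Optional

# A "URL=" key at the start of a line (so BASEURL= in [DEFAULT] is ignored).
URL_LINE = re.compile(r"^[ \t]*URL[ \t]*=[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)
# Fallback: any bare http(s) URL anywhere in the file.
BARE_URL = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_url(text: str) -> Optional[str]:
    """Prefer a URL= line; otherwise fall back to the first bare http(s) URL."""
    match = URL_LINE.search(text)
    if match:
        return match.group(1)
    match = BARE_URL.search(text)
    return match.group(0) if match else None


def clean_file(path: Path) -> str:
    """Classify and, if needed, rewrite one .url file (hypothetical statuses)."""
    try:
        # Latin-1 decodes any byte sequence, so nothing is lost on rewrite.
        text = path.read_text(encoding="latin-1")
    except OSError:
        return "error"
    url = extract_url(text)
    if url is None:
        return "error"
    # Upgrade only a leading http:// scheme.
    url = re.sub(r"^http://", "https://", url, count=1, flags=re.IGNORECASE)
    wanted = f"[InternetShortcut]\nURL={url}\n"
    if text.strip() == wanted.strip():
        return "ok"  # already minimal: leave the file untouched
    try:
        path.write_text(wanted, encoding="latin-1")
    except OSError:
        return "error"
    return "cleaned"
```

Comparing with surrounding whitespace stripped means a file that differs only by a trailing newline is counted as Already OK rather than rewritten.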

## Usage

```bash
# Scan a specific directory
python url-cleaner-v4.py "D:\Music\URL Shortcuts"

# Scan the current working directory
python url-cleaner-v4.py
```
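
Choosing the root directory (the first argument, or else the current working directory, as the Input table describes) could look like the sketch below. The name `resolve_root` and the exit-on-missing-directory behaviour are assumptions, not documented behaviour of the script.

```python
import os
from typing import Sequence


def resolve_root(argv: Sequence[str]) -> str:
    """Return the absolute directory to scan: argv[1] if given, else the CWD."""
    root = argv[1] if len(argv) > 1 else os.getcwd()
    root = os.path.abspath(root)
    if not os.path.isdir(root):
        # Hypothetical behaviour: stop early instead of scanning nothing.
        raise SystemExit(f"Not a directory: {root}")
    return root
```

At startup this would be called as `resolve_root(sys.argv)`.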

## Examples

```bash
# Clean a full URL shortcuts folder
python url-cleaner-v4.py "C:\[Stage2]"

# Console output during scan:
#   Scanned: 1250
#   ✚  Cleaned    Album info.url
#   ✚  Cleaned    Album info.url
#   ...

# Final summary:
# ══════════════════════════════════════════════════════════════
#   SUMMARY
#   ──────────────────────────────────────────────────────────
#      Scanned      3421  total files
#      URL files    3200  found
#   ──────────────────────────────────────────────────────────
#   ✚  Cleaned       187
#   ✔  Already OK   3008
#   ✘  Errors          5
# ══════════════════════════════════════════════════════════════
```
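
A sketch of tallying per-file results with `collections.Counter` and rendering them in the layout shown above follows. The status keys and `format_summary` are hypothetical, and the column widths only approximate the real output. It derives URL files as Cleaned + Already OK + Errors, which matches the example (187 + 3008 + 5 = 3200).

```python
from collections import Counter


def format_summary(scanned: int, statuses: Counter) -> str:
    """Render a plain-text summary; layout approximates the real script's."""
    heavy, light = "═" * 62, "  " + "─" * 58
    url_files = sum(statuses.values())  # every .url file has exactly one status
    return "\n".join([
        heavy,
        "  SUMMARY",
        light,
        f"     Scanned    {scanned:>6}  total files",
        f"     URL files  {url_files:>6}  found",
        light,
        f"  ✚  Cleaned    {statuses['cleaned']:>6}",
        f"  ✔  Already OK {statuses['ok']:>6}",
        f"  ✘  Errors     {statuses['error']:>6}",
        heavy,
    ])
```

For example, `print(format_summary(3421, Counter(cleaned=187, ok=3008, error=5)))` reproduces the figures above; missing keys in a `Counter` read as 0, so an empty run prints zeros rather than raising `KeyError`.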

## Notes

- The script waits for an `Enter` keypress before closing (`input("Press Enter to close… ")`), so it can be launched by double-clicking the `.py` file on Windows without the console window closing before the summary can be read.
- URL extraction has a fallback: if no `URL=` line is found, the script looks for a bare `http://` or `https://` URL anywhere in the file content.
- The live progress counter updates every 50 files (`UPDATE_INTERVAL = 50`) to avoid excessive terminal output.
- There is no dry-run mode: files are rewritten immediately. To gauge the effect before a full run, try the script on a small subfolder (or a copy of one) and check the summary.
- Symlinks are not followed (`os.scandir` entries are checked with `follow_symlinks=False`).
- `PermissionError` on a directory causes that subtree to be silently skipped.
- A copy of this script with a misspelled filename (`url-clearner-v4.py`) is in the `vgmdb-urlcleaner` folder; it is a copy, not a separate version.
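
The traversal rules in these notes (no symlink following, silently skipping unreadable directories, a progress update every `UPDATE_INTERVAL` files) can be combined into one walker. The sketch below is a minimal version under those assumptions; `iter_files` and `find_url_files` are hypothetical names, and overwriting the counter in place with `\r` is a guess at how the live display works.

```python
import os
from typing import Iterator, List, Tuple

UPDATE_INTERVAL = 50  # matches the interval given in the Notes


def iter_files(root: str) -> Iterator[os.DirEntry]:
    """Yield regular files under root, depth-first, without following symlinks."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        yield entry
        except PermissionError:
            # Unreadable directory: skip this subtree without reporting it.
            continue


def find_url_files(root: str) -> Tuple[int, List[str]]:
    """Count every file, collect .url paths, and show a live counter."""
    scanned, url_paths = 0, []
    for entry in iter_files(root):
        scanned += 1
        if entry.name.lower().endswith(".url"):  # case-insensitive extension check
            url_paths.append(entry.path)
        if scanned % UPDATE_INTERVAL == 0:
            # "\r" returns to the start of the line so the counter overwrites itself.
            print(f"\r  Scanned: {scanned}", end="", flush=True)
    print(f"\r  Scanned: {scanned}")  # final count, then a newline
    return scanned, url_paths
```

Using an explicit stack instead of recursion keeps deep folder trees from hitting Python's recursion limit.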

## Related Scripts

- [vgmdb-urlcleaner](vgmdb-urlcleaner.md) — PowerShell equivalent with hardcoded folder paths; `URL CLEAN FULL CLEAN.ps1` performs the same clean + http-upgrade operation
- [vgmdb-compareurl](vgmdb-compareurl.md) — PowerShell http-to-https upgrader (`http to https.ps1`) with multi-drive support
- [vgmdb-url2list](vgmdb-url2list.md) — Extract URLs from cleaned `.url` files into a CSV
