# CSV Encoding Converter (JIS to UTF-8)

> **Status:** Active
> **Category:** Format Conversion
> **Language:** Python 3
> **Script files:** `convertJIStoUTF.py`, `Furikisei - convertJIStoUTF.py`

## Purpose

Two small scripts for converting Japanese CSV files (typically Shift-JIS or EUC-JP encoded) to UTF-8. Both use `chardet` to auto-detect the source encoding rather than assuming a fixed one.

## Requirements

### Dependencies

```bash
pip install chardet
```

## Scripts

There are two variants with different usage patterns:

### `convertJIStoUTF.py` — Batch / hardcoded mode

Reads `input.csv` from the current working directory and writes `output_utf8.csv`. Filenames are hardcoded in the script. Intended for repeated use when the source file always has the same name.
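The detect-then-re-encode flow described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual source: the function names `reencode_to_utf8` and `convert_file` are hypothetical, and the script itself may be organised differently.

```python
from pathlib import Path


def reencode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes from source_encoding and return them as UTF-8 bytes."""
    # errors="ignore" drops undecodable bytes instead of raising,
    # matching the behaviour described in the Notes section.
    text = raw.decode(source_encoding, errors="ignore")
    return text.encode("utf-8")


def convert_file(src: Path, dst: Path) -> str:
    """Detect the encoding of src with chardet and write a UTF-8 copy to dst."""
    # Third-party dependency; imported here so reencode_to_utf8 stays stdlib-only.
    import chardet

    raw = src.read_bytes()
    # chardet.detect returns a dict; its "encoding" value can be None.
    detected = chardet.detect(raw)["encoding"]
    if detected is None:
        raise ValueError(f"could not detect the encoding of {src}")
    # Writing bytes avoids any newline translation on Windows.
    dst.write_bytes(reencode_to_utf8(raw, detected))
    return detected
```

Calling `convert_file(Path("input.csv"), Path("output_utf8.csv"))` would mirror the hardcoded filenames of this variant.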

### `Furikisei - convertJIStoUTF.py` — Drag-and-drop / argument mode

Accepts the source file as a command-line argument (or by dragging a file onto the script in Windows Explorer). Writes a new file with `_utf8` appended to the stem and leaves the original untouched.

A file named `DRAG FURIKISE TO PY TO AUTO CONVERT` is included in the folder as a visual reminder of this workflow.
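The output naming rule can be sketched with `pathlib`. The helper names `utf8_output_path` and `source_from_argv` are hypothetical and only illustrate the idea; a file dropped onto the script in Explorer arrives the same way as a typed argument, as `sys.argv[1]`.

```python
import sys
from pathlib import Path


def utf8_output_path(src: Path) -> Path:
    """Return a sibling path with `_utf8` appended to the stem."""
    # myfile.csv -> myfile_utf8.csv, kept in the same directory as the source.
    return src.with_name(f"{src.stem}_utf8{src.suffix}")


def source_from_argv(argv: list) -> Path:
    """Return the source path from argv, or exit with a usage hint."""
    if len(argv) != 2:
        raise SystemExit(f"usage: {argv[0]} <file.csv>")
    return Path(argv[1])
```

A script built on these helpers might start with `src = source_from_argv(sys.argv)` and write to `utf8_output_path(src)`, so the source file is never overwritten.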

## Input

| Item | Description | Example |
|------|-------------|---------|
| Source CSV (`convertJIStoUTF.py`) | Must be named `input.csv` in the current directory | `input.csv` |
| Source CSV (`Furikisei` variant) | Any CSV file, passed as a command-line argument | `myfile.csv` |

## Output

| Item | Description |
|------|-------------|
| `output_utf8.csv` (`convertJIStoUTF.py`) | UTF-8 re-encoded CSV in the current directory |
| `<stem>_utf8.csv` (`Furikisei` variant) | UTF-8 re-encoded CSV alongside the original, e.g. `myfile_utf8.csv` |

## Usage

```bash
# Batch mode — source must be named input.csv
python convertJIStoUTF.py

# Drag-and-drop / argument mode
python "Furikisei - convertJIStoUTF.py" myfile.csv
# → myfile_utf8.csv
```

## Examples

```bash
# Convert a Sony Bank Furikisei export
python "Furikisei - convertJIStoUTF.py" "Furikisei_202501.csv"
# Converted Furikisei_202501.csv (SHIFT_JIS) → Furikisei_202501_utf8.csv (UTF-8)

# Batch mode with renamed file
cp my_bank_export.csv input.csv
python convertJIStoUTF.py
# Converted input.csv (EUC-JP) → output_utf8.csv (UTF-8)
```

## Notes

- Both scripts read with `errors="ignore"`, so any bytes that cannot be decoded in the detected encoding are silently dropped. That is tolerable for the occasional stray byte, but if the encoding detection is wrong, data can be lost without any warning.
- `chardet` detection is generally reliable for Shift-JIS and EUC-JP but can misidentify short or mostly-ASCII files. If the output looks wrong, try the Sony Bank-specific converter ([actual-sonybank](actual-sonybank.md)), which hardcodes `cp932`.
- The `Furikisei` in the filename refers to "振替履歴" (transfer history) in Japanese banking, which suggests this variant was originally written for that specific export type.
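One way to guard against the failure modes in these notes is to check the confidence that `chardet.detect` reports and to decode strictly, so a wrong guess raises an error instead of dropping bytes. The sketch below is not what the scripts currently do: the helper names, the `0.8` threshold, and the `cp932` fallback are all assumptions chosen for illustration.

```python
def choose_encoding(detection: dict, min_confidence: float = 0.8,
                    fallback: str = "cp932") -> str:
    """Pick an encoding from a chardet.detect() result, or fall back.

    `detection` is the dict returned by chardet.detect(), which carries
    "encoding" and "confidence" keys. The threshold and the fallback are
    illustrative assumptions, not values taken from the scripts.
    """
    encoding = detection.get("encoding")
    confidence = detection.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding


def decode_checked(raw: bytes, encoding: str) -> str:
    """Decode strictly so undecodable bytes raise UnicodeDecodeError."""
    return raw.decode(encoding, errors="strict")
```

With strict decoding, a misdetected file fails loudly with `UnicodeDecodeError` rather than producing a quietly truncated CSV.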

## Related Scripts

- [Sony Bank CSV Converter](actual-sonybank.md) — purpose-built converter for Sony Bank `FutsuRireki.csv` exports; uses hardcoded `cp932` and requires no dependencies
