---
name: spreadsheet-cleanup
description: Clean, normalize, and restructure messy tabular data with a full audit log of changes. Use this skill whenever a user has a spreadsheet or CSV that is inconsistent, duplicated, badly formatted, or hard to work with and wants it cleaned, deduplicated, standardized, or reshaped, or says 'this data is a mess', 'fix this spreadsheet', or 'standardize these columns'. Trigger whenever raw tabular data needs to become analysis-ready without silently losing information.
---

# Spreadsheet Cleanup and Normalization

## What this does and why it matters
Messy data quietly corrupts every decision made from it. This skill turns inconsistent, duplicated, badly formatted tabular data into a clean, analysis-ready version, with a clear log of every change so the result is trustworthy and auditable. The audit log matters as much as the cleaning, because silent changes to data destroy trust.

## Method

### 1. Profile before touching anything
Report row and column counts, detect the header row, and flag the issues: blank rows, merged cells, inconsistent types, duplicate keys, mixed date or number formats, stray characters. The user should see the state before any change.
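As a rough illustration of a read-only profile, the sketch below counts rows and columns and flags blank rows, duplicate keys, and untrimmed cells. The function name `profile_rows`, the sample data, and the choice to compare keys case-insensitively are assumptions made for this example; it also assumes the table is already loaded as rectangular lists of strings.

```python
from collections import Counter


def profile_rows(header, rows, key):
    """Summarize a table without modifying it (hypothetical helper)."""
    key_idx = header.index(key)
    # Indices of rows where every cell is empty or whitespace.
    blank_rows = [
        i for i, row in enumerate(rows)
        if all(cell.strip() == "" for cell in row)
    ]
    # Assumption: keys are compared trimmed and case-insensitively.
    key_counts = Counter(
        row[key_idx].strip().lower()
        for row in rows
        if row[key_idx].strip()
    )
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    # Cells that carry leading or trailing whitespace.
    untrimmed = sum(1 for row in rows for cell in row if cell != cell.strip())
    return {
        "rows": len(rows),
        "columns": len(header),
        "blank_rows": blank_rows,
        "duplicate_keys": duplicate_keys,
        "untrimmed_cells": untrimmed,
    }


header = ["name", "email"]
rows = [
    ["Ada Lovelace", "ada@example.com"],
    ["", ""],
    ["ada lovelace ", "ADA@example.com"],
    ["Grace Hopper", "grace@example.com"],
]
report = profile_rows(header, rows, key="email")
```

Nothing in `rows` is changed here, so the report can be shown to the user before any cleaning rule is agreed.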

### 2. Agree the rules for anything destructive
Confirm how to handle duplicates, blanks, and ambiguous values before removing or overwriting. Destructive changes made on assumption are how good data gets ruined.

### 3. Clean systematically
Trim whitespace, standardize casing and formats, split or merge columns as needed, normalize dates and numbers to one format, and deduplicate on the agreed key. Keep each transformation reversible in principle: work on a copy and record the original value of anything changed or removed.
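One possible shape for a cleaning pass is sketched below. It trims whitespace, drops blank rows, and deduplicates on a key while keeping every removed row together with its reason. The name `clean_rows`, the "keep the first occurrence" rule, and lowercasing the key are assumptions for illustration; the real rules are whatever was agreed with the user.

```python
def clean_rows(header, rows, key):
    """Return (kept, removed) without mutating the input rows (hypothetical helper)."""
    key_idx = header.index(key)
    kept, removed, seen = [], [], set()
    for line_no, row in enumerate(rows, start=1):
        trimmed = [cell.strip() for cell in row]
        if not any(trimmed):
            # Keep the original row and the reason so the removal is auditable.
            removed.append((line_no, row, "blank row"))
            continue
        # Assumption: the key (for example an email) is case-insensitive.
        trimmed[key_idx] = trimmed[key_idx].lower()
        key_value = trimmed[key_idx]
        if key_value and key_value in seen:
            # Assumption: the agreed rule is to keep the first occurrence.
            removed.append((line_no, row, f"duplicate {key}"))
            continue
        seen.add(key_value)
        kept.append(trimmed)
    return kept, removed


header = ["name", "email"]
rows = [
    ["Ada Lovelace", "ada@example.com"],
    ["", ""],
    ["ada lovelace ", "ADA@example.com"],
    ["Grace Hopper", "grace@example.com"],
]
kept, removed = clean_rows(header, rows, key="email")
```

Because the input list is left untouched and each removed row is returned with its line number, the pass can be undone or re-run with different rules.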

### 4. Validate
Re-profile and confirm the issues are resolved without dropping legitimate data. Spot-check that totals and counts still reconcile with the original once the rows removed under the agreed rules are accounted for.
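A reconciliation check could look something like the following sketch: the starting rows should equal kept plus removed rows, and a numeric column should sum to the same total across both. The function `reconcile`, the `amount_idx` parameter, and the sample rows are hypothetical, and the amounts are assumed to be already normalized to plain decimal strings.

```python
from decimal import Decimal


def reconcile(before, kept, removed, amount_idx):
    """Return a list of problems; an empty list means counts and totals reconcile."""
    problems = []
    if len(before) != len(kept) + len(removed):
        problems.append(
            f"row count: {len(before)} before, "
            f"{len(kept)} kept + {len(removed)} removed"
        )

    def total(rows):
        # Decimal avoids float rounding noise when comparing totals.
        return sum((Decimal(r[amount_idx]) for r in rows), Decimal("0"))

    if total(before) != total(kept) + total(removed):
        problems.append(
            f"total: {total(before)} before, "
            f"{total(kept) + total(removed)} after"
        )
    return problems


before = [["a", "10.50"], ["b", "4.25"], ["b", "4.25"]]
kept = [["a", "10.50"], ["b", "4.25"]]
removed = [["b", "4.25"]]

ok = reconcile(before, kept, removed, amount_idx=1)
# Simulate a row that vanished without being recorded as removed.
silent_drop = reconcile(before, kept, [], amount_idx=1)
```

An empty result means nothing went missing; any entry in the list points at data that disappeared without being recorded.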

### 5. Log everything
Produce a change log so every transformation is auditable and nothing changed silently.
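A change log can be as simple as one record per modified cell. The sketch below is one way to do it; the field names in `LOG_FIELDS` and the helpers `log_change` and `log_to_csv` are assumptions for this example, not a prescribed schema.

```python
import csv
import io

# Hypothetical log schema: one entry per changed cell.
LOG_FIELDS = ["row", "column", "action", "old_value", "new_value"]


def log_change(log, row, column, action, old_value, new_value):
    """Append an entry only when the value actually changed."""
    if old_value != new_value:
        log.append({
            "row": row,
            "column": column,
            "action": action,
            "old_value": old_value,
            "new_value": new_value,
        })


def log_to_csv(log):
    """Serialize the log to CSV text so it can be delivered with the cleaned file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=LOG_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()


log = []
log_change(log, 3, "name", "trim_whitespace", "ada lovelace ", "ada lovelace")
# A no-op is not logged, so counts in the report reflect real changes.
log_change(log, 4, "name", "trim_whitespace", "Grace Hopper", "Grace Hopper")
log_text = log_to_csv(log)
```

Storing the old value next to the new one is what makes each entry reversible, and counting entries per `action` gives the "changes applied" figures for the report.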

## Output format
Deliver the cleaned file plus a cleaning report in this form:

## Cleaning report
- Starting shape: [rows x columns]
- Ending shape: [rows x columns]
- Changes applied (each transformation and how many cells or rows it touched)
- Rows removed and why
- Items needing human review (ambiguous values, possible duplicates)

## Anti-patterns to avoid
- Deleting data silently.
- Guessing on ambiguous values (a date that could be US or international, a possible duplicate) instead of flagging.
- Overwriting the original file instead of cleaning into a copy.
- Omitting the change log, which leaves the result impossible to verify or trust.
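To make the "flag, don't guess" rule concrete for dates, the sketch below classifies a slash-separated date by whether its first two fields can only be read one way. The function name `classify_slash_date` and its return labels are hypothetical, and the check is deliberately shallow: it does not validate the day against the month or the year.

```python
def classify_slash_date(text):
    """Classify 'a/b/yyyy' as 'day_first', 'month_first', 'either',
    'ambiguous', or 'invalid' (hypothetical labels)."""
    parts = text.strip().split("/")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        return "invalid"
    first, second = int(parts[0]), int(parts[1])
    if not (1 <= first <= 31 and 1 <= second <= 31):
        return "invalid"
    if first > 12 and second > 12:
        # Neither field can be a month.
        return "invalid"
    if first > 12:
        return "day_first"
    if second > 12:
        return "month_first"
    if first == second:
        # Both readings give the same calendar date.
        return "either"
    # Both fields could be a month: flag for human review instead of guessing.
    return "ambiguous"


results = {
    value: classify_slash_date(value)
    for value in ["03/04/2021", "13/04/2021", "04/13/2021", "05/05/2021", "2021-04-13"]
}
```

Rows classified as `ambiguous` belong in the "items needing human review" list. If every unambiguous date in a column points the same way, that is evidence worth showing the user, but the decision should still be theirs.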

## Example
A contact export has mixed casing, three date formats, and duplicate emails. Dates are normalized to one format, names are title-cased, and rows are deduplicated on email. The run applies 42 changes and flags 5 ambiguous rows for review, all recorded in the change log.
