CSVCLEAN(1) | csvkit | CSVCLEAN(1) |
NAME
csvclean - csvclean Documentation
DESCRIPTION
Reports and fixes common errors in a CSV file.
Checks
- Reports rows that have a different number of columns than the header row, if the --length-mismatch option is set.
- Reports columns that are empty, if the --empty-columns option is set.
TIP:
Fixes
- •
- If a CSV has unquoted cells that contain line breaks, like:
id,address,country 1,1 Main St Springfield,US 2,123 Acadia Avenue London,GB
Use --join-short-rows to attempt to correct the errors by merging short rows into a single row:
id,address,country 1,"1 Main St Springfield",US 2,"123 Acadia Avenue London",GB
To change the string used to join the lines, use --separator. For example, with --separator ", ":
id,address,country 1,"1 Main St, Springfield",US 2,"123 Acadia Avenue, London",GB
- •
- If a CSV has missing delimiters, like:
id,name,country 1,Alice 2,Bob,CA
You can add the missing delimiters with --fill-short-rows:
id,name,country 1,Alice, 2,Bob,CA
TIP:
To change the value used to fill short rows, use --fillvalue. For example, with --fillvalue "US":
id,name,country 1,Alice,US 2,Bob,CA
SEE ALSO:
NOTE:
- Removes optional quote characters, unless the --quoting (-u) option is set to change this behavior
- Changes the field delimiter to a comma, if the input delimiter is set with the --delimiter (-d) or --tabs (-t) options
- Changes the record delimiter to a line feed (LF or \n)
- Changes the quote character to a double-quotation mark, if the character is set with the --quotechar (-q) option
- Changes the character encoding to UTF-8, if the input encoding is set with the --encoding (-e) option
Output
csvclean attempts to make the selected fixes. Then:
- If the --omit-error-rows option is set, only rows that pass the selected checks are written to standard output. If not, all rows are written to standard output.
- If any checks are enabled, error rows along with line numbers and descriptions are written to standard error. If there are error rows, the exit code is 1.
Usage
usage: csvclean [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b] [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H] [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [FILE] Fix common errors in a CSV file. positional arguments: FILE The CSV file to operate on. If omitted, will accept input as piped data via STDIN. optional arguments: -h, --help show this help message and exit --length-mismatch Report data rows that are shorter or longer than the header row. --empty-columns Report empty columns as errors. -a, --enable-all-checks Enable all error reporting. --omit-error-rows Omit data rows that contain errors, from standard output. --label LABEL Add a "label" column to standard error. Useful in automated workflows. --header-normalize-space Strip leading and trailing whitespace and replace sequences of whitespace characters by a single space in the header. --join-short-rows Merges short rows into a single row. --separator SEPARATOR The string with which to join short rows. Defaults to a newline. --fill-short-rows Fill short rows with the missing cells. --fillvalue FILLVALUE The value with which to fill short rows. Defaults to none.
See also: Arguments common to all tools.
EXAMPLES
Test a file with data rows that are shorter and longer than the header row:
$ csvclean examples/bad.csv 2> errors.csv column_a,column_b,column_c 0,mixed types.... uh oh,17 $ cat errors.csv line_number,msg,column_a,column_b,column_c 1,"Expected 3 columns, found 4 columns",1,27,,I'm too long! 2,"Expected 3 columns, found 2 columns",,I'm too short!
NOTE:
Test a file with empty columns:
$ csvclean --empty-columns examples/test_empty_columns.csv 2> errors.csv a,b,c,, a,,,, ,,c,, ,,,, $ cat errors.csv line_number,msg,a,b,c,, 1,"Empty columns named 'b', '', ''! Try: csvcut -C 2,4,5",,,,,
Use csvcut to exclude the empty columns:
$ csvcut -C 2,4,5 examples/test_empty_columns.csv a,c a, ,c ,
To change the line ending from line feed (LF or \n) to carriage return and line feed (CRLF or \r\n) use:
csvformat -M $'\r\n' examples/dummy.csv
AUTHOR
Christopher Groskopf and contributors
COPYRIGHT
2016, Christopher Groskopf and James McKinney
July 12, 2024 | 2.0.1 |