Stacking data and chaining commands with csvkit

Data Processing in Shell

Susan Sun

Data Person

csvstack: stacking multiple CSV files

csvstack: stacks up the rows from two or more CSV files

Documentation:

csvstack -h
usage: csvstack [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
                [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H]
                [-n GROUP_NAME] [--filenames]
Data Processing in Shell

csvstack: stacking multiple CSV files

Stack two similar files Spotify_Rank6.csv and Spotify_Rank7.csv into one file.

Preview the data to check schema:

csvlook Spotify_Rank6.csv
| track_id               | popularity |
| ---------------------- | ---------- |
| 7JYCpIzpoidDOnnmxmHwtj |          6 |
| 0mmFibEg5NuULMwTVN2tRU |          6 |
Data Processing in Shell

csvstack: stacking multiple CSV files

csvlook Spotify_Rank7.csv
| track_id               | popularity |
| ---------------------- | ---------- |
| 118GQ70Sp6pMqn6w1oKuki |          7 |
| 6S7cr72a7a8RVAXzDCRj6m |          7 |
Data Processing in Shell

csvstack: stacking multiple CSV files

Syntax:

csvstack Spotify_Rank6.csv Spotify_Rank7.csv > Spotify_AllRanks.csv
csvlook Spotify_AllRanks.csv
| track_id               | popularity |
| ---------------------- | ---------- |
| 7JYCpIzpoidDOnnmxmHwtj |          6 |
| 0mmFibEg5NuULMwTVN2tRU |          6 |
| 118GQ70Sp6pMqn6w1oKuki |          7 |
| 6S7cr72a7a8RVAXzDCRj6m |          7 |
Data Processing in Shell

csvstack: stacking multiple CSV files

csvstack -g "Rank6","Rank7" \ 
Spotify_Rank6.csv Spotify_Rank7.csv > Spotify_AllRanks.csv
csvlook Spotify_AllRanks.csv
| group | track_id               | popularity |
| ----- | ---------------------- | ---------- |
| Rank6 | 7JYCpIzpoidDOnnmxmHwtj |          6 |
| Rank6 | 0mmFibEg5NuULMwTVN2tRU |          6 |
| Rank7 | 118GQ70Sp6pMqn6w1oKuki |          7 |
| Rank7 | 6S7cr72a7a8RVAXzDCRj6m |          7 |
Data Processing in Shell

csvstack: stacking multiple CSV files

csvstack -g "Rank6","Rank7" -n "source" \
Spotify_Rank6.csv Spotify_Rank7.csv > Spotify_AllRanks.csv
csvlook Spotify_AllRanks.csv
| source| track_id               | popularity |
| ----- | ---------------------- | ---------- |
| Rank6 | 7JYCpIzpoidDOnnmxmHwtj |          6 |
| Rank6 | 0mmFibEg5NuULMwTVN2tRU |          6 |
| Rank7 | 118GQ70Sp6pMqn6w1oKuki |          7 |
| Rank7 | 6S7cr72a7a8RVAXzDCRj6m |          7 |
Data Processing in Shell

chaining command-line commands

; links commands together and runs sequentially

csvlook SpotifyData_All.csv; csvstat SpotifyData_All.csv

&& links commands together, but only runs the 2nd command if the 1st succeeds

csvlook SpotifyData_All.csv && csvstat SpotifyData_All data.csv
Data Processing in Shell

chaining command-line commands

> re-directs the output from the 1st command to the location indicated as the 2nd

in2csv SpotifyData.xlsx > SpotifyData.csv
Data Processing in Shell

chaining command-line commands

| uses the output of the 1st command as input to the 2nd

Example:

Output of csvcut is not well formatted:

csvcut -c "track_id","danceability" Spotify_MusicAttributes.csv
track_id,danceability
118GQ70Sp6pMqn6w1oKuki,0.787
6S7cr72a7a8RVAXzDCRj6m,0.777
7h2qWpMJzIVtiP30E8VDW4,0.795
3KVQFxJ5CWOcbxdpPYdi4o,0.815
Data Processing in Shell

chaining command-line commands

Example (continued):

Re-format csvcut's output by piping the output as input to csvlook:

csvcut -c "track_id","danceability"  Spotify_Popularity.csv | csvlook
| track_id               | danceability |
| ---------------------- | ------------ |
| 118GQ70Sp6pMqn6w1oKuki |        0.787 |
| 6S7cr72a7a8RVAXzDCRj6m |        0.777 |
| 7h2qWpMJzIVtiP30E8VDW4 |        0.796 |
| 3KVQFxJ5CWOcbxdpPYdi4o |        0.815 |
Data Processing in Shell

Let's put everything together!

Data Processing in Shell

Preparing Video For Download...