Manipulating data using SQL syntax

Data Processing in Shell

Susan Sun

Data Person

csvsql: documentation

csvsql:

  • applies SQL statements to one or more CSV files
  • creates an in-memory SQL database that temporarily hosts the file being processed
  • suitable for small to medium files only

Documentation:

csvsql -h
usage: csvsql [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
              [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-L LOCALE]
Data Processing in Shell

csvsql: applying SQL to a local CSV file

Sample syntax:

ls
Spotify_MusicAttributes.csv
Data Processing in Shell

csvsql: applying SQL to a local CSV file

Sample syntax:

csvsql --query "SELECT * FROM Spotify_MusicAttributes LIMIT 1" \
    Spotify_MusicAttributes.csv
track_id,danceability,duration_ms,instrumentalness,loudness,tempo,time_signature
118GQ70Sp6pMqn6w1oKuki,0.787,124016.0,0.784,-10.457,119.988,4.0
Data Processing in Shell

csvsql: applying SQL to a local CSV file

Sample syntax:

csvsql --query "SELECT * FROM Spotify_MusicAttributes LIMIT 1" \
    data/Spotify_MusicAttributes.csv | csvlook
| track_id               | danceability | duration_ms | instrumentalness ...
| ---------------------- | ------------ | ----------- | ---------------- ...
| 118GQ70Sp6pMqn6w1oKuki |        0.787 |     124,016 |            0.784 ...
Data Processing in Shell

csvsql: applying SQL to a local CSV file

Sample syntax:

csvsql --query "SELECT * FROM Spotify_MusicAttributes LIMIT 1" \
    data/Spotify_MusicAttributes.csv > OneSongFile.csv
ls
OneSongFile.csv
Data Processing in Shell

csvsql: joining CSVs using SQL syntax

Sample syntax:

csvsql --query "SELECT * FROM file_a INNER JOIN file_b..." file_a.csv file_b.csv

Note:

  • SQL Query must be written in one line, no breaks
  • Indicate CSV files in order of appearance in SQL
Data Processing in Shell

Let's practice!

Data Processing in Shell

Preparing Video For Download...