Downloading data using curl

Data Processing in Shell

Susan Sun

Data Person

What is curl?

curl:

  • is short for Client for URLs
  • is a Unix command line tool
  • transfers data to and from a server
  • is used to download data from HTTP(S) sites and FTP servers

Checking curl installation

Check curl installation:

man curl

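If curl has not been installed, you will see a message such as:

No manual entry for curl

(Running curl itself would instead report something like "curl: command not found"; the exact wording depends on your shell.)

For installation instructions, see https://curl.haxx.se/download.html.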
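
A scriptable alternative to reading the manual, sketched here under the assumption of a POSIX-style shell: command -v reports whether a curl executable is on the PATH. The variable name curl_status is only illustrative.

```shell
# Check whether a curl executable can be found on the PATH.
if command -v curl >/dev/null 2>&1; then
  curl_status="installed"
  # The first line of --version names the installed release.
  curl --version | head -n 1
else
  curl_status="missing"
fi
echo "curl is ${curl_status}"
```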

Browsing the curl Manual

If curl is installed, your console will look like this:

Screenshot of the beginning of the curl manual as though printed in a dark Terminal window

Press Enter to scroll.

Screenshot of a partially scrolled curl manual as though printed in a dark Terminal window

Press q to exit.

Learning curl Syntax

Basic curl syntax:

curl [option flags] [URL]

The URL is required; the option flags are optional.

curl supports several protocols, including HTTP, HTTPS, FTP, and SFTP.

For a list of the options available:

curl --help

Recent curl versions print only the most common options here; curl --help all lists every option.
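
Which protocols are actually available depends on how your copy of curl was built (SFTP in particular is not always compiled in). As a quick check, assuming curl is installed, the Protocols line of curl --version lists them; the variable name protocols is just for illustration.

```shell
# Extract the line of `curl --version` that lists the supported protocols.
protocols=$(curl --version | grep -i '^Protocols:')
echo "$protocols"
```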

Downloading a Single File

Example:

A single file is stored at:

https://websitename.com/datafilename.txt

Use the optional flag -O to save the file with its original name:

curl -O https://websitename.com/datafilename.txt

To save the file under a different name, use the lowercase -o followed by the new file name:

curl -o renameddatafilename.txt https://websitename.com/datafilename.txt
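
To try both flags without depending on a real server, the sketch below serves a file from a temporary directory through a file:// URL, which curl handles like any other URL. The directory layout (server/, downloads/) and the sample contents are made up for the demonstration; -s merely silences the progress meter.

```shell
# Create a scratch area with a pretend "server" folder and a download folder.
work=$(mktemp -d)
mkdir "$work/server" "$work/downloads"
printf 'id,value\n1,42\n' > "$work/server/datafilename.txt"

url="file://$work/server/datafilename.txt"

# Run in a subshell so the caller's working directory is left unchanged.
(
  cd "$work/downloads" || exit 1
  curl -s -O "$url"                          # keeps the name datafilename.txt
  curl -s -o renameddatafilename.txt "$url"  # saves under a new name
)

ls "$work/downloads"
```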

Downloading Multiple Files using Wildcards

Oftentimes, a server will host multiple data files, with similar filenames:

https://websitename.com/datafilename001.txt
https://websitename.com/datafilename002.txt
...
https://websitename.com/datafilename100.txt

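Using Wildcards (*)

It is tempting to request every file hosted on https://websitename.com/ that starts with datafilename and ends in .txt with a shell-style wildcard:

curl -O https://websitename.com/datafilename*.txt

This does not work as hoped. curl does not expand * in URLs (and an unquoted * may be expanded or rejected by your shell before curl even runs), so at best a single request is made for a file literally named datafilename*.txt. To fetch a numbered series of files, use curl's URL globbing with [ ] ranges instead.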

Downloading Multiple Files using Globbing Parser

Continuing with the previous example:

https://websitename.com/datafilename001.txt
https://websitename.com/datafilename002.txt
...
https://websitename.com/datafilename100.txt

Using Globbing Parser

The following downloads every file in sequence, from datafilename001.txt through datafilename100.txt. Quoting the URL keeps the shell from interpreting the square brackets itself:

curl -O "https://websitename.com/datafilename[001-100].txt"
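
The same range syntax can be tried offline. In this sketch, five small files stand in for the hosted series and a file:// URL stands in for the website; all paths and contents are invented for the demonstration. Note the quotes around the URL, which stop the shell from treating the square brackets as a pattern of its own.

```shell
# Build a pretend "server" folder holding datafilename001.txt ... 005.txt.
work=$(mktemp -d)
mkdir "$work/server" "$work/downloads"
for i in 1 2 3 4 5; do
  name=$(printf 'datafilename%03d.txt' "$i")
  printf 'contents of file %d\n' "$i" > "$work/server/$name"
done

# Fetch the whole series with one range; -O keeps each original name.
(
  cd "$work/downloads" || exit 1
  curl -s -O "file://$work/server/datafilename[001-005].txt"
)

ls "$work/downloads"
```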

Increment through the files and download every Nth file. The step is counted from the first number in the range, so start the range at 010 to get datafilename010.txt, datafilename020.txt, ... datafilename100.txt:

curl -O "https://websitename.com/datafilename[010-100:10].txt"
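
To see which file names a stepped range covers, the counting can be mimicked with a plain shell loop: begin at the first number and keep adding the step until the last number is passed. This is only an illustration of the arithmetic, not how curl is implemented, and the helper name list_range is hypothetical.

```shell
# Print the names selected by a range of the form [start-end:step],
# zero-padded to three digits like the files in the example.
list_range() {
  start=$1; end=$2; step=$3
  i=$start
  while [ "$i" -le "$end" ]; do
    printf 'datafilename%03d.txt\n' "$i"
    i=$((i + step))
  done
}

list_range 10 100 10   # datafilename010.txt, datafilename020.txt, ... datafilename100.txt
list_range 1 100 10    # datafilename001.txt, datafilename011.txt, ... datafilename091.txt
```

Counting from 1 in steps of 10 never lands on 010 or 100, which is why the start of the range matters.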

Preemptive Troubleshooting

curl has two particularly useful option flags for downloads that are redirected or interrupted:

  • -L Follows the redirect if the server responds with a 3xx status code.

  • -C Resumes a previous file transfer if it was interrupted before completion. It takes an offset argument; -C - lets curl work out where to resume.

Putting everything together:

curl -L -O -C - "https://websitename.com/datafilename[001-100].txt"
  • By convention, all option flags come before the URL
  • Order of the flags does not matter (e.g. -L -C - -O is fine), as long as each flag keeps its argument
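
For repeated use, the combined command can be wrapped in a small shell function. This is a sketch: the function name fetch_series is hypothetical, and it assumes the offset form -C - so that curl decides for itself where to resume.

```shell
# Download one URL (or a quoted [start-end] range of URLs), following
# redirects and resuming any partially downloaded file.
fetch_series() {
  curl -L -O -C - "$1"
}

# Example call (placeholder host from the slides, so left commented out):
# fetch_series "https://websitename.com/datafilename[001-100].txt"
```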

Happy curl-ing!
