Data Processing in Shell
Susan Sun
Data Person
curl:
Check curl installation:
man curl
If curl has not been installed, you will see:
curl command not found.
For full instructions, see https://curl.haxx.se/download.html.
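The same check can be scripted instead of read off the console. The following is a minimal sketch, assuming a POSIX shell; has_command is a hypothetical helper name, not part of curl.

```shell
# Return success if the named command can be found on PATH.
# has_command is a hypothetical helper used only for this illustration.
has_command() {
  command -v "$1" >/dev/null 2>&1
}

if has_command curl; then
  # Print only the first line of the version banner.
  curl --version | head -n 1
else
  echo "curl is not installed; see https://curl.haxx.se/download.html" >&2
fi
```

Unlike opening the manual page, this gives an exit status that other scripts can branch on.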
If curl is installed, your console will display the curl manual page.
Press Enter to scroll.
Press q to exit.
Basic curl syntax:
curl [option flags] [URL]
URL is required.
curl supports the HTTP, HTTPS, FTP, and SFTP protocols.
For a full list of the options available:
curl --help
Example:
A single file is stored at:
https://websitename.com/datafilename.txt
Use the optional flag -O to save the file with its original name:
curl -O https://websitename.com/datafilename.txt
To rename the file, use the lowercase -o followed by the new file name:
curl -o renameddatafilename.txt https://websitename.com/datafilename.txt
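To make the difference between the two flags concrete, the name that -O saves to can be previewed in plain shell. This is a rough sketch for a simple URL without a query string; remote_name is a hypothetical helper, and it only approximates how curl picks the name.

```shell
# Print the last path segment of a URL, which is the name `curl -O` saves to
# for a simple URL. remote_name is a hypothetical helper for illustration.
remote_name() {
  # ${1##*/} strips everything up to and including the final slash.
  printf '%s\n' "${1##*/}"
}

remote_name "https://websitename.com/datafilename.txt"
```

With -o, the name comes from the argument you supply instead, so no such derivation takes place.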
Oftentimes, a server will host multiple data files, with similar filenames:
https://websitename.com/datafilename001.txt
https://websitename.com/datafilename002.txt
...
https://websitename.com/datafilename100.txt
Using Wildcards (*)
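The goal is to download every file hosted on https://websitename.com/ that starts with datafilename and ends in .txt.
Note that curl's URL globbing expands only [] ranges and {} lists. A * in an HTTP(S) URL is sent to the server literally rather than matched against the hosted files, so the following command is unlikely to work as written:
curl -O https://websitename.com/datafilename*.txt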
Using Globbing Parser
The following will download every file sequentially, starting with datafilename001.txt and ending with datafilename100.txt.
curl -O https://websitename.com/datafilename[001-100].txt
Using Globbing Parser
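Increment through the files and download every Nth file (e.g. datafilename010.txt, datafilename020.txt, ... datafilename100.txt).
The step is counted from the first number in the range, so the range must start at 010 to land on these files:
curl -O https://websitename.com/datafilename[010-100:10].txt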
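Before starting a large download, it can help to preview which URLs a numeric range stands for. The sketch below approximates the expansion in plain shell and is not curl's own implementation; expand_range is a hypothetical helper, and the three-digit padding is an assumption taken from the example file names. Pass the numbers without leading zeros.

```shell
# Print the URLs covered by a numeric range FIRST..LAST with an optional STEP,
# zero-padded to three digits as in the example file names.
# expand_range is a hypothetical helper; the subshell body keeps its variables local.
expand_range() (
  first=$1 last=$2 step=${3:-1}
  n=$first
  while [ "$n" -le "$last" ]; do
    printf 'https://websitename.com/datafilename%03d.txt\n' "$n"
    n=$((n + step))
  done
)

# A step of 10 counted from 1 visits 1, 11, 21, and so on.
expand_range 1 100 10
```

Calling expand_range 10 100 10 instead lists the files numbered 010, 020, and so on up to 100, which shows why the first number of the range matters when a step is used.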
curl has two particularly useful option flags in case of timeouts during download:
-L
Follows the redirect if the server responds with a 3xx status code.
-C
Resumes a previous file transfer if it timed out before completion. It takes an offset argument; -C - tells curl to work out the offset itself.
Putting everything together:
curl -L -O -C - https://websitename.com/datafilename[001-100].txt
The order of the option flags does not matter (e.g. -L -C - -O is fine).
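Because -C - resumes from wherever the previous attempt stopped, the command can simply be run again after a timeout. One way to automate that is a small retry loop. This is a sketch under stated assumptions: retry and the RETRY_DELAY variable are hypothetical names, not curl features, and the curl call is left commented out so the block runs without network access.

```shell
# Run a command until it succeeds, trying at most MAX times.
# retry and RETRY_DELAY are hypothetical names used for this illustration.
retry() (
  max=$1
  shift
  delay=${RETRY_DELAY:-1}   # seconds to wait between attempts
  attempt=1
  while ! "$@"; do
    # Give up once the allowed number of attempts has been used.
    [ "$attempt" -ge "$max" ] && exit 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
)

# Intended usage (requires network access, so it is not executed here):
# retry 3 curl -L -O -C - "https://websitename.com/datafilename.txt"
retry 3 true && echo "command succeeded"
```

Quoting the URL keeps the shell from interpreting any special characters before curl sees them.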