Importing flat files from the web

Intermediate Importing Data in Python

Hugo Bowne-Anderson

Data Scientist at DataCamp

You’re already great at importing!

  • Flat files such as .txt and .csv

  • Pickled files, Excel spreadsheets, and many others!

  • Data from relational databases

  • You can do all these locally

  • What if your data is online?

Can you import web data?

  • You can: go to the URL and click to download files
  • BUT: not reproducible, not scalable

You’ll learn how to…

  • Import and locally save datasets from the web

  • Load datasets into pandas DataFrames

  • Make HTTP requests (GET requests)

  • Scrape web data such as HTML

  • Parse HTML into useful data (BeautifulSoup)

  • Use the urllib and requests packages

The urllib package

  • Provides an interface for fetching data across the web
  • urlopen() - accepts URLs instead of file names
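As a minimal sketch of the second bullet: urlopen() can be used much like open(), except that it takes a URL and returns a response object whose read() method yields bytes. The helper name read_first_lines and its parameter n are hypothetical, chosen only for illustration, and the sketch assumes the resource is UTF-8 text.

```python
from urllib.request import urlopen


def read_first_lines(url, n=3):
    """Return the first n lines of the text resource at url.

    Hypothetical helper for illustration; assumes UTF-8 encoded text.
    """
    # urlopen() takes a URL where open() would take a file name.
    with urlopen(url) as response:
        raw_bytes = response.read()  # the whole body, as bytes
    # Decode the bytes to str, split into lines, and keep the first n.
    return raw_bytes.decode('utf-8').splitlines()[:n]
```

Calling read_first_lines() with the URL of any reachable text file should return its opening lines as a list of strings. Reading the whole body at once keeps the sketch short, but for large files it may be better to read in chunks.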

How to automate file download in Python

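from urllib.request import urlretrieve

# URL of the white wine quality dataset
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv'

# Download the file and save it locally; returns (filename, headers)
urlretrieve(url, 'winequality-white.csv')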
('winequality-white.csv', <http.client.HTTPMessage at 0x103cf1128>)
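To check that a download like this worked, it can help to read a few rows back from the saved file. The sketch below is one possible way to do that, not the course's own code: the function name download_and_preview and its parameters are hypothetical, it uses the standard library's csv module to stay dependency-free, and it assumes a semicolon-delimited file (as winequality-white.csv is).

```python
import csv
from itertools import islice
from urllib.request import urlretrieve


def download_and_preview(url, filename, delimiter=';', n_rows=5):
    """Save the file at url as filename, then return (header, first rows).

    Hypothetical helper for illustration; assumes a delimited text file
    whose first line is a header.
    """
    # urlretrieve() returns a (local filename, headers) tuple.
    local_path, _headers = urlretrieve(url, filename)
    with open(local_path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)                # first line: column names
        rows = list(islice(reader, n_rows))  # at most n_rows data rows
    return header, rows
```

Passing the url and file name from the slide above would return the column names and the first five rows as lists of strings, provided the URL is still reachable.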

Let's practice!
