HTTP requests to import files from the web

Intermediate Importing Data in Python

Hugo Bowne-Anderson

Data Scientist at DataCamp

URL

  • Uniform/Universal Resource Locator
  • References to web resources
  • Focus: web addresses
  • Ingredients:
    • Protocol identifier - http:
    • Resource name - datacamp.com
  • These specify web addresses uniquely
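
As a rough illustration of the two ingredients, the standard-library function urllib.parse.urlparse can split a URL into its parts. The URL below is assembled from the example values on this slide (http: and datacamp.com); it is a sketch for orientation only, not something taken from the course.

```python
from urllib.parse import urlparse

# Example URL assembled from the two ingredients named above
url = "http://datacamp.com"

parts = urlparse(url)

# Protocol identifier (urlparse calls it the scheme, without the colon)
protocol = parts.scheme
# Resource name (urlparse calls it the network location)
resource_name = parts.netloc

print(protocol, resource_name)
```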

HTTP

  • HyperText Transfer Protocol
  • Foundation of data communication for the web
  • HTTPS - more secure form of HTTP
  • Going to a website = sending HTTP request
    • GET request
  • urlretrieve() performs a GET request
  • HTML - HyperText Markup Language
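
A minimal sketch of the GET-request idea using only the standard library. Request.get_method() reports the HTTP method urllib would use for a request, and save_from_web is a hypothetical helper wrapped around urlretrieve(), not a function from the course. The helper is defined but not called here, so nothing is downloaded.

```python
from urllib.request import Request, urlretrieve

# A Request built without a data payload is sent as a GET request
request = Request("https://www.wikipedia.org/")
method = request.get_method()
print(method)


def save_from_web(url, filename):
    """Hypothetical helper: fetch `url` and save the body to `filename`."""
    # urlretrieve returns the local file path and the response headers
    local_path, headers = urlretrieve(url, filename)
    return local_path
```

Calling save_from_web(url, "page.html") with a reachable URL would write the response body to page.html in the working directory.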

GET requests using urllib

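from urllib.request import urlopen, Request
url = "https://www.wikipedia.org/"
# Package the GET request
request = Request(url)
# Send the request and catch the response
response = urlopen(request)
# Read the response body (returned as bytes)
html = response.read()
# Close the response when done
response.close()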
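
The same request can also be sketched with a with block, which closes the response automatically even if reading fails. The helper name fetch_html and the UTF-8 fallback are assumptions made for this illustration. Because response.read() returns bytes, the sketch decodes them to a string.

```python
from urllib.request import urlopen, Request


def fetch_html(url):
    """Hypothetical helper: send a GET request and return the body as text."""
    request = Request(url)
    # The with block closes the response when the block exits
    with urlopen(request) as response:
        raw = response.read()  # bytes
        # Use the charset declared in the headers; assume UTF-8 if none
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


# Example call (requires network access):
# html = fetch_html("https://www.wikipedia.org/")
```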

GET requests using requests

  • Used by “Her Majesty's Government, Amazon, Google, Twilio, NPR, Obama for America, Twitter, Sony, and Federal U.S. Institutions that prefer to be unnamed”
  • One of the most downloaded Python packages
import requests
url = "https://www.wikipedia.org/"
# Package, send, and catch the GET request in a single call
r = requests.get(url)
# Response body decoded to a string
text = r.text
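
A slightly more defensive sketch of the same call. The helper name fetch_text and the 10-second timeout are assumptions chosen for illustration. Both the timeout argument and raise_for_status() are part of the requests API. The latter raises an exception for 4xx and 5xx responses rather than quietly returning an error page.

```python
import requests


def fetch_text(url, timeout=10):
    """Hypothetical helper: GET `url` and return the decoded body."""
    # timeout (in seconds) stops the call from waiting indefinitely
    r = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError if the server answered with 4xx or 5xx
    r.raise_for_status()
    return r.text


# Example call (requires network access):
# text = fetch_text("https://www.wikipedia.org/")
```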

Let's practice!
