Getting Ready to Crawl

Web Scraping in Python

Thomas Laetsch

Data Scientist, NYU

Let's Respond

Selector vs Response:

- The Response has all the tools we learned with Selectors:
  - the xpath and css methods, followed by the extract and extract_first methods.
- The Response also keeps track of the URL from which the HTML code was loaded.
- The Response helps us move from one page to another, so that we can "crawl" the web while scraping.

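# xpath and css work on a Response just as they do on a Selector;
# both lines select span elements with class "bio" whose parent is a div
response.xpath('//div/span[@class="bio"]')

response.css('div > span.bio')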

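# the methods can be chained: css searches inside each div that xpath found
response.xpath('//div').css('span.bio')

# extract returns every match as a list of strings; extract_first returns only the first (or None)
response.xpath('//div').css('span.bio').extract()
response.xpath('//div').css('span.bio').extract_first()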

# the URL the HTML code was loaded from
response.url
>>> 'http://www.DataCamp.com/courses/all'
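Keeping the page URL matters because links on a page are often relative, like `/courses/...`, and only make sense next to the URL they came from. Scrapy's `Response.urljoin()` combines the two using the standard library's `urllib.parse.urljoin` (an HTML response also honors a `<base>` tag if the page has one). The sketch below calls the standard-library function directly so it runs without Scrapy; the href values are made-up examples.

```python
from urllib.parse import urljoin

# The page URL, as returned by response.url above
page_url = "http://www.DataCamp.com/courses/all"

# Hypothetical href values scraped from that page
hrefs = ["/courses/intro-to-python", "sql-basics", "https://www.example.com/about"]

# Resolve each href against the page URL (what Response.urljoin does
# for a page without a <base> tag)
absolute = [urljoin(page_url, href) for href in hrefs]

for url in absolute:
    print(url)
# http://www.DataCamp.com/courses/intro-to-python
# http://www.DataCamp.com/courses/sql-basics
# https://www.example.com/about
```

A path starting with `/` replaces the whole path, a bare name like `sql-basics` replaces only the last segment (`all`), and an absolute URL is kept unchanged.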

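# next_url is the string path of the next URL we want to scrape;
# follow returns a Request for it, resolving a relative path against response.url
# (inside a spider callback, yield that Request so Scrapy schedules it)
response.follow(next_url)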
