Web Scraping in Python
Thomas Laetsch
Data Scientist, NYU
Selector vs Response:
A Response has the same xpath and css methods as a Selector, followed by the extract and extract_first methods.
The xpath method works as it does on a Selector:
response.xpath( '//div/span[@class="bio"]' )
The css method works as it does on a Selector:
response.css( 'div > span.bio' )
response.xpath('//div').css('span.bio')
response.xpath('//div').css('span.bio').extract()
response.xpath('//div').css('span.bio').extract_first()
The response keeps track of the URL it was loaded from in its url attribute:
>>> response.url
'http://www.DataCamp.com/courses/all'
The response lets us "follow" a new link with the follow() method:
# next_url is the string path of the next url we want to scrape
response.follow( next_url )
More on follow later.