Off the Beaten XPath

Web Scraping in Python

Thomas Laetsch

Data Scientist, NYU

(At)tribute

  • @ represents "attribute"
    • @class
    • @id
    • @href

Brackets and Attributes

xpathattr.png

Brackets and Attributes

xpathattr_div_p1.png

xpath = '//p[@class="class-1"]'
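
A minimal sketch of how this expression behaves, using only Python's standard library. The page below is hypothetical (it is not the one in the slide image), and `xml.etree.ElementTree` is an assumption chosen for portability; it supports only a subset of XPath and expects paths relative to an element, so `//p` is written `.//p`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page invented for illustration.
page = """
<html>
  <body>
    <div id="uid">
      <p class="class-1">first</p>
      <p class="class-2">second</p>
    </div>
    <p class="class-1">third</p>
  </body>
</html>
""".strip()

root = ET.fromstring(page)

# The bracket filters p elements by the value of their class attribute.
matches = root.findall('.//p[@class="class-1"]')
texts = [p.text for p in matches]
print(texts)
```

Both matching paragraphs are returned, regardless of how deep they sit in the tree, because `//` looks at all descendants.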

Brackets and Attributes

xpathattr_div.png

xpath = '//*[@id="uid"]'
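
As a rough illustration, the wildcard `*` lets the attribute test decide the match, whatever the tag name is. The page is again hypothetical, and the standard-library `xml.etree.ElementTree` (with its relative `.//` form) is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: only one element carries id="uid".
page = """
<html>
  <body>
    <div id="uid">
      <p>inside</p>
    </div>
    <p id="other">outside</p>
  </body>
</html>
""".strip()

root = ET.fromstring(page)

# '*' accepts any tag; the bracket keeps only elements whose id is "uid".
matches = root.findall('.//*[@id="uid"]')
tags = [el.tag for el in matches]
print(tags)
```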

Brackets and Attributes

xpathattr_div_astc2.png

xpath = '//div[@id="uid"]/p[2]'
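
A small sketch of combining an attribute bracket with a positional bracket. The page is hypothetical, and `xml.etree.ElementTree` (standard library, relative `.//` paths) is an assumption. Note that XPath positions start at 1, and `p[2]` counts only `p` siblings.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the span between the paragraphs is not counted by p[2].
page = """
<html>
  <body>
    <div id="uid">
      <p>one</p>
      <span>skip</span>
      <p>two</p>
    </div>
    <div id="other">
      <p>three</p>
      <p>four</p>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(page)

# First narrow to the div with id="uid", then take its second p child.
matches = root.findall('.//div[@id="uid"]/p[2]')
texts = [p.text for p in matches]
print(texts)
```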

Content with Contains

XPath contains notation:

contains( @attr-name, "string-expr" )

  • Matches when the value of @attr-name includes "string-expr" as a substring

Contain This

xpath = '//*[contains(@class,"class-1")]'

ClassSelection-Xpath-contains.png

Contain This

xpath = '//*[@class="class-1"]'

ClassSelection-Xpath-eq.png
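
The difference between the two expressions can be sketched side by side. This example assumes the third-party `lxml` package is installed (the standard library's ElementTree does not implement `contains`), and the page is hypothetical.

```python
from lxml import html as lxml_html

# Hypothetical page with class values chosen to show the difference.
page = """
<html><body>
  <p class="class-1">a</p>
  <p class="class-1 class-2">b</p>
  <p class="class-12">c</p>
  <p class="class-2">d</p>
</body></html>
"""

tree = lxml_html.fromstring(page)

# contains() is a substring test on the whole attribute value.
contains_hits = [el.text for el in tree.xpath('//*[contains(@class,"class-1")]')]

# '=' requires the attribute value to be exactly "class-1".
equals_hits = [el.text for el in tree.xpath('//*[@class="class-1"]')]

print(contains_hits)
print(equals_hits)
```

Here `contains` also picks up `class-12`, since "class-1" is a substring of it, while the equality test skips the element that has two classes.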

Get Classy

xpathattr_div_astc2.png

xpath = '/html/body/div/p[2]'

Get Classy

xpathattr_div_p2-class.png

xpath = '/html/body/div/p[2]/@class'
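
A brief sketch of what ending the path in `/@class` returns: the attribute value itself rather than the element. It assumes the third-party `lxml` package (ElementTree cannot select attributes this way), and the page is hypothetical.

```python
from lxml import html as lxml_html

# Hypothetical page matching the shape of the path /html/body/div/p[2].
page = """
<html><body>
  <div>
    <p class="class-1">one</p>
    <p class="class-2">two</p>
  </div>
</body></html>
"""

tree = lxml_html.fromstring(page)

# Without /@class the result is a list of elements.
elements = tree.xpath('/html/body/div/p[2]')
element_texts = [el.text for el in elements]

# With /@class the result is a list of attribute values (strings).
class_values = tree.xpath('/html/body/div/p[2]/@class')

print(element_texts)
print(class_values)
```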

End of the Path
