XPath Navigation

Web Scraping in Python

Thomas Laetsch

Data Scientist, NYU

Slashes and Brackets

  • Single forward slash / looks forward one generation
  • Double forward slash // looks forward through all future generations
  • Square brackets [] help narrow in on specific elements
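Here is a minimal sketch of the three rules above. It assumes Python's standard-library `xml.etree.ElementTree` as a stand-in for a full XPath engine such as lxml or scrapy's `Selector`. ElementTree supports only a subset of XPath and rejects absolute paths, so `/html/body/...` is written as `./body/...`, relative to the `<html>` root element. The toy page is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy page: one <p> directly under <body>, one nested in a <div>.
# ElementTree needs well-formed markup and does not accept absolute paths,
# so "/html/body/..." is written "./body/..." relative to the <html> root.
root = ET.fromstring(
    "<html><body><p>top</p><div><p>nested</p></div></body></html>"
)

# Single slash: look one generation down (direct children of <body> only).
one_generation = [p.text for p in root.findall("./body/p")]

# Double slash: look through all future generations below <body>.
all_generations = [p.text for p in root.findall("./body//p")]

# Square brackets: narrow in on a specific element, here the first <div> child.
first_div = root.findall("./body/div[1]")

print(one_generation)   # ['top']
print(all_generations)  # ['top', 'nested']
print(len(first_div))   # 1
```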

To Bracket or not to Bracket

xpath_body_sel.png

xpath = '/html/body'
xpath = '/html[1]/body[1]'
  • Both give the same selection
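A quick check of why the brackets change nothing here, sketched with the standard-library `xml.etree.ElementTree` (an assumption; it stands in for a full XPath engine). ElementTree cannot apply a position to the root element itself, so only the `body` versus `body[1]` step is compared. The one-line page is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal page; fromstring() returns the <html> root element.
root = ET.fromstring("<html><body><p>Hello</p></body></html>")

# "/html/body" versus "/html[1]/body[1]", written relative to the <html> root.
without_brackets = root.findall("./body")
with_brackets = root.findall("./body[1]")

# A page normally has a single <body>, so the [1] predicate filters nothing
# out and both paths pick out the very same element object.
print(without_brackets == with_brackets)        # True
print(without_brackets[0] is with_brackets[0])  # True
```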

A Body of P

xpath = '/html/body/p'

xpath_p_sel.png
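A sketch of `/html/body/p`, assuming `xml.etree.ElementTree` as a stand-in XPath engine and a hypothetical page with `<p>` elements both directly under `<body>` and inside `<div>` elements. Because every step is a single slash, only the `<p>` children of `<body>` are selected; the paragraphs inside the `<div>` elements are skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two <p> directly under <body>, three more inside <div>s.
PAGE = """<html>
  <body>
    <p>Intro</p>
    <div>
      <p>Div A, first</p>
      <p>Div A, second</p>
    </div>
    <div>
      <p>Div B, first</p>
    </div>
    <p>Outro</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # the <html> element

# "/html/body/p": <p> elements that are direct children of <body>.
body_p_texts = [p.text for p in root.findall("./body/p")]
print(body_p_texts)  # ['Intro', 'Outro']
```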


The Birds and the Ps

xpath = '/html/body/div/p'

xpath_div_p_sel.png

xpath = '/html/body/div/p[2]'

xpath_div_p2_sel.png
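A sketch of the two paths above, again assuming `xml.etree.ElementTree` as a stand-in engine and a hypothetical page. Note that `p[2]` means "the second `<p>` child of its parent", so it is applied within each `<div>` separately rather than counting across the whole page. A `<div>` with only one `<p>` contributes nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the first <div> holds two <p>, the second holds one.
PAGE = """<html>
  <body>
    <p>Intro</p>
    <div>
      <p>Div A, first</p>
      <p>Div A, second</p>
    </div>
    <div>
      <p>Div B, first</p>
    </div>
    <p>Outro</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # the <html> element

# "/html/body/div/p": every <p> that is a child of a <div> child of <body>.
div_p_texts = [p.text for p in root.findall("./body/div/p")]

# "/html/body/div/p[2]": the second <p> inside each such <div>, if it has one.
second_p_texts = [p.text for p in root.findall("./body/div/p[2]")]

print(div_p_texts)     # ['Div A, first', 'Div A, second', 'Div B, first']
print(second_p_texts)  # ['Div A, second']
```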


Double Slashing the Brackets

xpath = '//p'

xpath_p_sel.png

xpath = '//p[1]'

xpath_body_ssp1_sel.png
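A sketch contrasting `//p` with `//p[1]`, assuming `xml.etree.ElementTree` as a stand-in engine (it spells a leading `//` as `.//`) and a hypothetical page. `//p` collects every `<p>` at any depth, while `//p[1]` keeps each `<p>` that is the first `<p>` child of its own parent, which can be several elements rather than just the first paragraph on the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with <p> elements at two different depths.
PAGE = """<html>
  <body>
    <p>Intro</p>
    <div>
      <p>Div A, first</p>
      <p>Div A, second</p>
    </div>
    <div>
      <p>Div B, first</p>
    </div>
    <p>Outro</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # the <html> element

# "//p": every <p> anywhere in the document, in document order.
all_p_texts = [p.text for p in root.findall(".//p")]

# "//p[1]": each <p> that is the first <p> child of its parent
# (one from <body>, one from each <div>).
first_p_texts = [p.text for p in root.findall(".//p[1]")]

print(all_p_texts)
print(first_p_texts)  # ['Intro', 'Div A, first', 'Div B, first']
```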


The Wildcard

xpath = '/html/body/*'

xpath_body_ast_sel.png

  • The asterisk * is the "wildcard": it matches any element, whatever its tag
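A sketch of `/html/body/*`, assuming `xml.etree.ElementTree` as a stand-in engine and a hypothetical page. The wildcard step returns every direct child of `<body>`, whatever its tag, in document order; it does not descend into the children of those elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: <body> has <p>, <div>, <div>, <p> as direct children.
PAGE = """<html>
  <body>
    <p>Intro</p>
    <div>
      <p>Div A, first</p>
      <p>Div A, second</p>
    </div>
    <div>
      <p>Div B, first</p>
    </div>
    <p>Outro</p>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # the <html> element

# "/html/body/*": all children of <body>, regardless of tag name.
child_tags = [child.tag for child in root.findall("./body/*")]
print(child_tags)  # ['p', 'div', 'div', 'p']
```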

Xposé

