XPath functions and advanced predicates

Web scraping in R

Timo Grossenbacher

Instructor

The position() function

...
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>
...
html %>% 
  html_elements(xpath = 
             '//ol/li[position() = 2]')
# Equivalent CSS selector (when <ol> has only <li> children):
# ol > li:nth-child(2)
{xml_nodeset (1)}
[1] <li>Second element.</li>
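
The slide elides the surrounding page ("..."), so the query cannot be run as shown. A minimal, self-contained sketch follows; it assumes rvest 1.0 or later and rebuilds the list with `minimal_html()`. The variable names and the stand-in document are illustrative, not part of the course material. It also compares the query with the XPath shorthand `li[2]` and with the CSS selector from the comment above.

```r
library(rvest)

# Hypothetical stand-in for the page elided ("...") on the slide
html <- minimal_html("
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>")

# Explicit position() test, as on the slide
second_xpath <- html %>%
  html_elements(xpath = '//ol/li[position() = 2]') %>%
  html_text()

# XPath shorthand: a bare number in a predicate means position() = number
second_short <- html %>%
  html_elements(xpath = '//ol/li[2]') %>%
  html_text()

# CSS counterpart mentioned in the slide comment
second_css <- html %>%
  html_elements(css = 'ol > li:nth-child(2)') %>%
  html_text()

print(second_xpath)
```

For this markup all three queries select the same node. The CSS version counts every child of `<ol>`, while the XPath versions count only `<li>` children, so the two could diverge on markup that mixes element types.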

More operators for position()

...
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>
...
html %>% 
  html_elements(xpath = 
             '//ol/li[position() < 3]')
{xml_nodeset (2)}
[1] <li>First element.</li>
[2] <li>Second element.</li>
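
Other comparison operators work the same way inside the predicate. The sketch below assumes rvest 1.0 or later and the five-item list from the slide; `li_where()` is a hypothetical helper written for this example, not an rvest function.

```r
library(rvest)

# Hypothetical stand-in for the page elided ("...") on the slide
html <- minimal_html("
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>")

# Hypothetical helper: text of the <li> nodes whose position passes `test`
li_where <- function(doc, test) {
  doc %>%
    html_elements(xpath = paste0("//ol/li[position() ", test, "]")) %>%
    html_text()
}

first_two <- li_where(html, "< 3")   # positions 1 and 2
up_to_two <- li_where(html, "<= 2")  # same nodes, inclusive bound
last_two  <- li_where(html, "> 3")   # positions 4 and 5
from_four <- li_where(html, ">= 4")  # same nodes, inclusive bound

print(first_two)
print(last_two)
```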

More operators for position()

...
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>
...
html %>% 
  html_elements(xpath = 
             '//ol/li[position() != 3]')
{xml_nodeset (4)}
[1] <li>First element.</li>
[2] <li>Second element.</li>
[3] <li>Fourth element.</li>
[4] <li>Fifth element.</li>
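
In a path like `//ol/li[...]`, position() is counted within each parent `<ol>`, not across the whole document. The following sketch illustrates this with a hypothetical two-list page (the item labels A1, B1, and so on are invented for the example) and assumes rvest 1.0 or later.

```r
library(rvest)

# Hypothetical page with two lists of different lengths
html <- minimal_html("
<ol>
  <li>A1</li>
  <li>A2</li>
  <li>A3</li>
</ol>
<ol>
  <li>B1</li>
  <li>B2</li>
</ol>")

# Drops only A3: the second list has no third item to exclude
not_third <- html %>%
  html_elements(xpath = '//ol/li[position() != 3]') %>%
  html_text()

# Returns the first item of EACH list, not just the first in the document
firsts <- html %>%
  html_elements(xpath = '//ol/li[position() = 1]') %>%
  html_text()

print(not_third)
print(firsts)
```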

Combining predicates

...
<ol>
  <li class = 'blue'>First element.</li>
  <li>Second element.</li>
  <li class = 'blue'>Third element.</li>
  <li>Fourth element.</li>
  <li class = 'blue'>Fifth element.</li>
</ol>
...
html %>% 
  html_elements(xpath = 
  '//ol/li[position() != 3 and @class = "blue"]')
{xml_nodeset (2)}
[1] <li class="blue">First element.</li>
[2] <li class="blue">Fifth element.</li>
html %>% 
  html_elements(xpath = 
  '//ol/li[position() != 3 or @class = "blue"]')
{xml_nodeset (5)}
...
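
To see which nodes each combination keeps, the two queries can be run against a rebuilt copy of the list. This is a minimal sketch assuming rvest 1.0 or later; the stand-in document and variable names are illustrative.

```r
library(rvest)

# Hypothetical stand-in for the page elided ("...") on the slide
html <- minimal_html("
<ol>
  <li class='blue'>First element.</li>
  <li>Second element.</li>
  <li class='blue'>Third element.</li>
  <li>Fourth element.</li>
  <li class='blue'>Fifth element.</li>
</ol>")

# and: a node must pass BOTH tests (not third, and blue)
both <- html %>%
  html_elements(xpath = '//ol/li[position() != 3 and @class = "blue"]') %>%
  html_text()

# or: a node passes if EITHER test holds
either <- html %>%
  html_elements(xpath = '//ol/li[position() != 3 or @class = "blue"]') %>%
  html_text()

print(both)
print(length(either))
```

With `or`, the third item is excluded by the position test but readmitted by its class, which is why all five nodes come back.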

The count() function

...
<ol>
  <li class = 'blue'>First element.</li>
  <li>Second element.</li>
  <li class = 'blue'>Third element.</li>
</ol>
<ol>
  <li class = 'red'>First element.</li>
  <li>Second element.</li>
</ol>
...
html %>% 
  html_elements(xpath = '//ol[count(li) = 2]')
{xml_nodeset (1)}
[1] <ol>\n<li class="red">...
html %>% 
  html_elements(xpath = '//ol[count(li) > 2]')
{xml_nodeset (1)}
[1] <ol>\n<li class="blue">...
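
count() takes a node set, so the argument can carry its own predicate. The sketch below assumes rvest 1.0 or later and rebuilds the two lists from the slide; the second query, which counts only blue items, is an illustrative extension rather than something shown in the course.

```r
library(rvest)

# Hypothetical stand-in for the page elided ("...") on the slide
html <- minimal_html("
<ol>
  <li class='blue'>First element.</li>
  <li>Second element.</li>
  <li class='blue'>Third element.</li>
</ol>
<ol>
  <li class='red'>First element.</li>
  <li>Second element.</li>
</ol>")

# Lists with exactly two <li> children, as on the slide
two_items <- html %>%
  html_elements(xpath = '//ol[count(li) = 2]')
n_children <- two_items %>%
  html_elements(xpath = './li') %>%
  length()

# Lists with exactly two <li> children of class "blue"
two_blue <- html %>%
  html_elements(xpath = '//ol[count(li[@class = "blue"]) = 2]')
n_blue_list_items <- two_blue %>%
  html_elements(xpath = './li') %>%
  length()

print(n_children)
print(n_blue_list_items)
```

Note that count() filters the parent `<ol>` nodes; the result is the list itself, not its items.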

Let's try some functions!
