Web Scraping in R
Timo Grossenbacher
Instructor
...
<ol>
<li>First element.</li>
<li>Second element.</li>
<li>Third element.</li>
<li>Fourth element.</li>
<li>Fifth element.</li>
</ol>
...
html %>%
  html_elements(xpath = '//ol/li[position() = 2]')
# Equivalent CSS selector:
# ol > li:nth-child(2)
{xml_nodeset (1)}
[1] <li>Second element.</li>
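A minimal, self-contained sketch of the slide above. It assumes rvest 1.0 or later (for `html_elements()`) and parses the slide's list from a string with `read_html()`. The variable names `by_xpath` and `by_css` are illustrative only. It runs both the XPath predicate and the CSS selector from the comment and extracts the item text, so the two results can be compared directly.

```r
library(rvest)

# Illustrative sample page: the five-item list from the slide, given as a string
html <- read_html("
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>")

# XPath: keep only the <li> whose position among its <li> siblings is 2
by_xpath <- html %>%
  html_elements(xpath = '//ol/li[position() = 2]') %>%
  html_text()

# CSS: the equivalent selector mentioned on the slide
by_css <- html %>%
  html_elements(css = 'ol > li:nth-child(2)') %>%
  html_text()

by_xpath
by_css
```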
...
<ol>
<li>First element.</li>
<li>Second element.</li>
<li>Third element.</li>
<li>Fourth element.</li>
<li>Fifth element.</li>
</ol>
...
html %>%
  html_elements(xpath = '//ol/li[position() < 3]')
{xml_nodeset (2)}
[1] <li>First element.</li>
[2] <li>Second element.</li>
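A short sketch, assuming rvest 1.0 or later and the same five-item list as above. It compares filtering inside the XPath predicate with selecting every item and subsetting the result in R. The names `first_two` and `all_items` are illustrative.

```r
library(rvest)

# Illustrative sample page: the five-item list from the slide
html <- read_html("<ol><li>First element.</li><li>Second element.</li><li>Third element.</li><li>Fourth element.</li><li>Fifth element.</li></ol>")

# Filter in XPath: positions 1 and 2 only
first_two <- html %>%
  html_elements(xpath = '//ol/li[position() < 3]') %>%
  html_text()

# Select all items, then subset in R
all_items <- html %>%
  html_elements(xpath = '//ol/li') %>%
  html_text()

first_two
identical(first_two, all_items[1:2])
```

With a single list the two approaches agree. The predicate, however, is evaluated per parent: on a page with several `<ol>` lists, `//ol/li[position() < 3]` returns the first two items of each list, whereas `[1:2]` in R keeps only the first two matches overall.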
...
<ol>
<li>First element.</li>
<li>Second element.</li>
<li>Third element.</li>
<li>Fourth element.</li>
<li>Fifth element.</li>
</ol>
...
html %>%
  html_elements(xpath = '//ol/li[position() != 3]')
{xml_nodeset (4)}
[1] <li>First element.</li>
[2] <li>Second element.</li>
[3] <li>Fourth element.</li>
[4] <li>Fifth element.</li>
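The same exclusion can also be written with the XPath 1.0 `not()` function. The sketch below assumes rvest 1.0 or later and uses illustrative variable names. It checks that both spellings select the same four items. For plain numbers such as `position()`, `!=` and `not(... = ...)` are interchangeable.

```r
library(rvest)

# Illustrative sample page: the five-item list from the slide
html <- read_html("<ol><li>First element.</li><li>Second element.</li><li>Third element.</li><li>Fourth element.</li><li>Fifth element.</li></ol>")

# Comparison operator, as on the slide
skip_third <- html %>%
  html_elements(xpath = '//ol/li[position() != 3]') %>%
  html_text()

# Same filter expressed with not()
skip_third_not <- html %>%
  html_elements(xpath = '//ol/li[not(position() = 3)]') %>%
  html_text()

skip_third
identical(skip_third, skip_third_not)
```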
...
<ol>
<li class = 'blue'>First element.</li>
<li>Second element.</li>
<li class = 'blue'>Third element.</li>
<li>Fourth element.</li>
<li class = 'blue'>Fifth element.</li>
</ol>
...
html %>%
  html_elements(xpath = '//ol/li[position() != 3 and @class = "blue"]')
{xml_nodeset (2)}
[1] <li class="blue">First element.</li>
[2] <li class="blue">Fifth element.</li>
html %>%
  html_elements(xpath = '//ol/li[position() != 3 or @class = "blue"]')
{xml_nodeset (5)}
...
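A self-contained sketch of the `and` / `or` examples, assuming rvest 1.0 or later, with illustrative variable names. It also shows one pitfall when combining attributes with `!=`. The items without a `class` attribute have an empty `@class` node-set, and in XPath 1.0 any comparison against an empty node-set is false. As a result, `@class != "blue"` matches nothing here, while `not(@class = "blue")` matches the unclassed items.

```r
library(rvest)

# Illustrative sample page: the list with three "blue" items from the slide
html <- read_html("
<ol>
  <li class='blue'>First element.</li>
  <li>Second element.</li>
  <li class='blue'>Third element.</li>
  <li>Fourth element.</li>
  <li class='blue'>Fifth element.</li>
</ol>")

# and: both conditions must hold -> blue items other than the third
with_and <- html %>%
  html_elements(xpath = '//ol/li[position() != 3 and @class = "blue"]') %>%
  html_text()

# or: either condition suffices -> every item, since the third one is blue
with_or <- html %>%
  html_elements(xpath = '//ol/li[position() != 3 or @class = "blue"]') %>%
  html_text()

# Pitfall: != against a missing attribute is false, so this selects nothing
ne_blue <- html %>%
  html_elements(xpath = '//ol/li[@class != "blue"]')

# not(... = ...) is true when the attribute is missing -> items 2 and 4
not_blue <- html %>%
  html_elements(xpath = '//ol/li[not(@class = "blue")]') %>%
  html_text()
```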
...
<ol>
<li class = 'blue'>First element.</li>
<li>Second element.</li>
<li class = 'blue'>Third element.</li>
</ol>
<ol>
<li class = 'red'>First element.</li>
<li>Second element.</li>
</ol>
...
html %>%
  html_elements(xpath = '//ol[count(li) = 2]')
{xml_nodeset (1)}
[1] <ol>\n<li class="red">...
html %>%
  html_elements(xpath = '//ol[count(li) > 2]')
{xml_nodeset (1)}
[1] <ol>\n<li class="blue">...
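A predicate on a parent element can be followed by further steps. The sketch below assumes rvest 1.0 or later and uses illustrative variable names. It first selects the lists by their number of items with `count()`, then steps down to their `<li>` children to read an attribute or the text. `html_attr()` returns `NA` for items that have no `class` attribute.

```r
library(rvest)

# Illustrative sample page: the two lists from the slide
html <- read_html("
<ol>
  <li class='blue'>First element.</li>
  <li>Second element.</li>
  <li class='blue'>Third element.</li>
</ol>
<ol>
  <li class='red'>First element.</li>
  <li>Second element.</li>
</ol>")

# Classes of the items in lists that have exactly two items
two_item_classes <- html %>%
  html_elements(xpath = '//ol[count(li) = 2]/li') %>%
  html_attr('class')

# Text of the items in lists that have more than two items
more_than_two_text <- html %>%
  html_elements(xpath = '//ol[count(li) > 2]/li') %>%
  html_text()

two_item_classes
more_than_two_text
```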