XPath functions and advanced predicates

Web Scraping in R

Timo Grossenbacher

Instructor

The position() function

...
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>
...
# Select the <li> that sits at position 2 within its <ol>
html %>% 
  html_elements(xpath = 
             '//ol/li[position() = 2]')
# Equivalent CSS selector: 
# ol > li:nth-child(2)
{xml_nodeset (1)}
[1] <li>Second element.</li>
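
To try this outside the course environment, here is a minimal sketch that builds the list from a string with read_html() and checks that the XPath predicate and the CSS selector pick the same node. The sample markup and the names second_xpath and second_css are assumptions for illustration, not part of the course material.

```r
library(rvest)

# Assumed sample document mirroring the list on the slide
html <- read_html("<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>")

# XPath: the <li> at position 2 among the <li> children of each <ol>
second_xpath <- html %>%
  html_elements(xpath = "//ol/li[position() = 2]") %>%
  html_text()

# CSS: the <li> that is the second child of its <ol>
second_css <- html %>%
  html_elements("ol > li:nth-child(2)") %>%
  html_text()

# Both approaches should return the same text
identical(second_xpath, second_css)
```

The two agree here because every child of the <ol> is an <li>. nth-child counts all child elements, while position() in this path counts only the <li> nodes.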

Other operators for the position() function

...
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>
...
# Select every <li> whose position is less than 3
html %>% 
  html_elements(xpath = 
             '//ol/li[position() < 3]')
{xml_nodeset (2)}
[1] <li>First element.</li>
[2] <li>Second element.</li>
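
The other comparison operators work the same way. The sketch below is one way to compare them side by side; pick() is a hypothetical helper defined here for illustration only, and the sample markup is an assumption that mirrors the slide.

```r
library(rvest)

# Assumed sample document mirroring the list on the slide
html <- read_html("<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>")

# Hypothetical helper: text of the <li> nodes that satisfy an XPath predicate
pick <- function(predicate) {
  html %>%
    html_elements(xpath = paste0("//ol/li[", predicate, "]")) %>%
    html_text()
}

pick("position() < 3")   # items before position 3
pick("position() <= 2")  # same selection, written with <=
pick("position() > 3")   # items after position 3
pick("position() >= 4")  # same selection, written with >=
```

Which spelling to use is mostly a matter of readability, since < 3 and <= 2 describe the same integer positions.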

Other operators for the position() function

...
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
  <li>Fourth element.</li>
  <li>Fifth element.</li>
</ol>
...
# Select every <li> except the one at position 3
html %>% 
  html_elements(xpath = 
             '//ol/li[position() != 3]')
{xml_nodeset (4)}
[1] <li>First element.</li>
[2] <li>Second element.</li>
[3] <li>Fourth element.</li>
[4] <li>Fifth element.</li>
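
One detail worth checking by hand: in a path such as //ol/li[...], position() is evaluated separately within each parent <ol>, not across the whole page. The following sketch illustrates this with two lists; the markup, the item labels, and the name kept are made up for this example.

```r
library(rvest)

# Assumed document with two separate lists (labels are illustrative)
html <- read_html("
<ol><li>A1</li><li>A2</li><li>A3</li></ol>
<ol><li>B1</li><li>B2</li><li>B3</li><li>B4</li></ol>")

# The third item is dropped from each list independently
kept <- html %>%
  html_elements(xpath = "//ol/li[position() != 3]") %>%
  html_text()

kept
# "A1" "A2" "B1" "B2" "B4"
```

So the same predicate removes A3 from the first list and B3 from the second, and the results come back in document order.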

Combining predicates

...
<ol>
  <li class = 'blue'>First element.</li>
  <li>Second element.</li>
  <li class = 'blue'>Third element.</li>
  <li>Fourth element.</li>
  <li class = 'blue'>Fifth element.</li>
</ol>
...
# and: both conditions must hold
html %>% 
  html_elements(xpath = 
  '//ol/li[position() != 3 and @class = "blue"]')
{xml_nodeset (2)}
[1] <li class="blue">First element.</li>
[2] <li class="blue">Fifth element.</li>
# or: at least one condition must hold
html %>% 
  html_elements(xpath = 
  '//ol/li[position() != 3 or @class = "blue"]')
{xml_nodeset (5)}
...
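
A self-contained sketch of both combinations, for experimenting locally. The sample markup is an assumption that mirrors the slide, and with_and and with_or are illustrative names.

```r
library(rvest)

# Assumed sample document: items 1, 3 and 5 carry class "blue"
html <- read_html('<ol>
  <li class="blue">First element.</li>
  <li>Second element.</li>
  <li class="blue">Third element.</li>
  <li>Fourth element.</li>
  <li class="blue">Fifth element.</li>
</ol>')

# and: not at position 3 AND class is "blue"
with_and <- html %>%
  html_elements(xpath = '//ol/li[position() != 3 and @class = "blue"]') %>%
  html_text()

# or: not at position 3 OR class is "blue"
with_or <- html %>%
  html_elements(xpath = '//ol/li[position() != 3 or @class = "blue"]') %>%
  html_text()

length(with_and)  # number of items matching both conditions
length(with_or)   # number of items matching at least one condition
```

With or, the third item is selected as well: it fails the position test but passes the class test.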

The count() function

...
<ol>
  <li class = 'blue'>First element.</li>
  <li>Second element.</li>
  <li class = 'blue'>Third element.</li>
</ol>
<ol>
  <li class = 'red'>First element.</li>
  <li>Second element.</li>
</ol>
...
# Select each <ol> that has exactly two <li> children
html %>% 
  html_elements(xpath = '//ol[count(li) = 2]')
{xml_nodeset (1)}
[1] <ol>\n<li class="red">...
# Select each <ol> that has more than two <li> children
html %>% 
  html_elements(xpath = '//ol[count(li) > 2]')
{xml_nodeset (1)}
[1] <ol>\n<li class="blue">...
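
The sketch below reproduces the count() selection on a standalone document and then inspects the <li> children of the matching list. It also shows that the argument of count() can carry its own predicate, which goes slightly beyond the slide. The markup and the names two_items, item_classes and two_blue are assumptions for illustration.

```r
library(rvest)

# Assumed sample document: a three-item list and a two-item list
html <- read_html('
<ol>
  <li class="blue">First element.</li>
  <li>Second element.</li>
  <li class="blue">Third element.</li>
</ol>
<ol>
  <li class="red">First element.</li>
  <li>Second element.</li>
</ol>')

# Lists with exactly two <li> children
two_items <- html %>%
  html_elements(xpath = "//ol[count(li) = 2]")

# Class attributes of the items in the matching list (NA where none is set)
item_classes <- two_items %>%
  html_elements("li") %>%
  html_attr("class")

# count() over a filtered node set: lists with exactly two blue items
two_blue <- html %>%
  html_elements(xpath = '//ol[count(li[@class = "blue"]) = 2]')

length(two_items)
item_classes
length(two_blue)
```

Note that the predicate sits on the <ol>, so the result is the list element itself; the <li> nodes have to be selected in a second step.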

Let's try out some functions!
