Web Scraping in R
Timo Grossenbacher
Instructor
.alert {
  color: red;
  font-weight: 800;
}
...
<div>Some text.</div>
<div class = 'alert'>Important text.</div>
<div>
  Some text with
  <a href = '#' class = 'alert'>important link</a>.
</div>
...

html %>% html_elements('.alert')
{xml_nodeset (2)}
[1] <div class="alert">Important text...
[2] <a href="#" class="alert">important ...
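
The slide uses an `html` object without showing where it comes from. A minimal sketch of the full round trip follows, assuming rvest 1.0 or later and using `minimal_html()` to parse an inline fragment that mirrors the slide's markup (on a real page, `read_html()` with a URL or file path would take its place):

```r
library(rvest)

# Assumption: an inline fragment standing in for the page on the slide.
html <- minimal_html("
  <div>Some text.</div>
  <div class='alert'>Important text.</div>
  <div>
    Some text with
    <a href='#' class='alert'>important link</a>.
  </div>
")

# A class selector matches every element carrying that class,
# regardless of tag name.
alerts <- html %>% html_elements(".alert")

length(alerts)      # number of matched nodes
html_name(alerts)   # tag name of each match
html_text(alerts)   # text content of each match
```

Both the `div` and the `a` are returned, because the selector constrains only the class and not the element type.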
.alert {
  color: red;
  font-weight: 800;
}
.emph {
  font-style: italic;
}
...
<div>Some text.</div>
<div class = 'alert emph'>Important text.</div>
<div>
  Some text with
  <a href = '#' class = 'alert'>important link</a>.
</div>
...

html %>%
  html_elements('.alert.emph') # not: .alert, .emph (that matches either class)
{xml_nodeset (1)}
[1] <div class="alert emph">Important text...
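
The difference between the chained selector and the comma-separated one can be checked directly. This is a sketch on an assumed inline fragment (built with `minimal_html()` from rvest 1.0 or later), not the instructor's own code:

```r
library(rvest)

# Assumption: an inline fragment mirroring the slide's markup.
html <- minimal_html("
  <div>Some text.</div>
  <div class='alert emph'>Important text.</div>
  <div>
    Some text with
    <a href='#' class='alert'>important link</a>.
  </div>
")

# Chained classes: the element must carry BOTH classes.
both <- html %>% html_elements(".alert.emph")

# Comma-separated: the element may carry EITHER class.
either <- html %>% html_elements(".alert, .emph")

length(both)
length(either)
html_name(either)
```

Only the `div` has both classes, so the chained form returns one node, while the comma form also picks up the link.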
#special {
  color: green;
}
.alert {
  color: red;
  font-weight: 800;
}
...
<div id = 'special'>Some text.</div>
<div class = 'alert'>Important text.</div>
<div>
  Some text with
  <a href = '#' class = 'alert'>important link</a>.
</div>
...

html %>%
  html_elements('#special')
{xml_nodeset (1)}
[1] <div id="special">Some text.</div>
#special {
  color: green;
}
.alert {
  color: red;
  font-weight: 800;
}
...
<div id = 'special'>Some text.</div>
<div class = 'alert'>Important text.</div>
<div>
  Some text with
  <a href = '#' class = 'alert'>important link</a>.
</div>
...

html %>%
  html_elements('a.alert')
{xml_nodeset (1)}
[1] <a href="#" class="alert">important ...
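
Once a type + class selector has narrowed the match to links, the usual next step is to pull out the link text and target. A short sketch, again on an assumed inline fragment parsed with `minimal_html()`:

```r
library(rvest)

# Assumption: an inline fragment mirroring the slide's markup.
html <- minimal_html("
  <div id='special'>Some text.</div>
  <div class='alert'>Important text.</div>
  <div>
    Some text with
    <a href='#' class='alert'>important link</a>.
  </div>
")

# 'a.alert' keeps only <a> elements that carry the class 'alert';
# the <div class='alert'> is excluded.
links <- html %>% html_elements("a.alert")

html_text(links)          # link text
html_attr(links, "href")  # value of the href attribute
```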
html %>%
  html_elements('#special')
is equivalent (here, because the matched element is a div) to...
html %>%
  html_elements('div#special')
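
The equivalence holds only because the element with that ID happens to be a `div`. A small check, assuming an inline fragment parsed with `minimal_html()`; the `p#special` selector is added purely for contrast:

```r
library(rvest)

# Assumption: an inline fragment mirroring the slide's markup.
html <- minimal_html("
  <div id='special'>Some text.</div>
  <div class='alert'>Important text.</div>
")

by_id      <- html %>% html_elements("#special")
by_type_id <- html %>% html_elements("div#special")
wrong_type <- html %>% html_elements("p#special")  # contrast case

# Same single node from the first two selectors ...
identical(html_text(by_id), html_text(by_type_id))

# ... but nothing when the type does not match the element.
length(wrong_type)
```

Since IDs are meant to be unique within a page, adding the type rarely changes the result; it mainly documents what kind of element is expected.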
li:first-child { color: blue; }
li:nth-child(2) { color: green; }
li:last-child { color: red; }
...
<ol>
  <li>First element.</li>
  <li>Second element.</li>
  <li>Third element.</li>
</ol>
...

html %>% html_elements('li:last-child')
# or html_elements('li:nth-child(3)')
{xml_nodeset (1)}
[1] <li>Third element.</li>
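
All three pseudo-classes from the stylesheet can be tried against the same list. A sketch on an assumed inline fragment (parsed with `minimal_html()` from rvest 1.0 or later):

```r
library(rvest)

# Assumption: an inline fragment mirroring the slide's list.
html <- minimal_html("
  <ol>
    <li>First element.</li>
    <li>Second element.</li>
    <li>Third element.</li>
  </ol>
")

# Position is counted among the element's siblings inside its parent.
first  <- html %>% html_elements("li:first-child")  %>% html_text()
second <- html %>% html_elements("li:nth-child(2)") %>% html_text()
last   <- html %>% html_elements("li:last-child")   %>% html_text()
third  <- html %>% html_elements("li:nth-child(3)") %>% html_text()

first
second
last
identical(last, third)  # in a three-item list, both select the same node
```

`li:last-child` keeps working if the list grows, whereas `li:nth-child(3)` is tied to a fixed position.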
| Selector type | HTML | CSS selector |
|---|---|---|
| Type | `<p>...</p>` | `p` |
| Multiple types | `<p>...</p><div>...</div>` | `p, div` |
| Class | `<p class = 'x'>...</p>` | `.x` |
| Multiple classes | `<p class = 'x y'>...</p>` | `.x.y` |
| Type + class | `<p class = 'x'>...</p>` | `p.x` |
| ID | `<p id = 'x'>...</p>` | `#x` |
| Type + pseudo-class | `<p>...</p><p>...</p>` | `p:first-child` |