Web Scraping in R
Timo Grossenbacher
Instructor
<table>
<tr>
<td>Name</td><td>Profession</td><td>Age</td><td>Country</td>
</tr>
<tr>
<td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td>
</tr>
<tr>
<td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td>
</tr>
</table>
<table>
<tr>
<th>Name</th><th>Profession</th><th>Age</th><th>Country</th>
</tr>
<tr>
<td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td>
</tr>
<tr>
<td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td>
</tr>
</table>
library(rvest)
html <- read_html(table_html) # table with <th> header cells
html %>%
  html_table()
[[1]]
# A tibble: 2 × 4
  Name            Profession   Age Country
  <chr>           <chr>      <int> <chr>
1 Dillon Arroyo   Carpenter     54 UK
2 Rebecca Douglas Developer     32 USA
html <- read_html(table_html) # table without <th> header cells
html %>%
  html_table(header = TRUE) # treat the first row as the header
[[1]]
# A tibble: 2 × 4
  Name            Profession   Age Country
  <chr>           <chr>      <int> <chr>
1 Dillon Arroyo   Carpenter     54 UK
2 Rebecca Douglas Developer     32 USA
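To make the effect of header = TRUE concrete, here is a minimal sketch that parses a <td>-only table twice, once with the defaults and once with header = TRUE. The variable names (td_only_html, table_node, default_parse, header_parse) are illustrative and not part of the slides, and the behaviour described assumes rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical stand-in for the <td>-only table shown earlier
td_only_html <- '<table>
  <tr><td>Name</td><td>Profession</td><td>Age</td><td>Country</td></tr>
  <tr><td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td></tr>
  <tr><td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td></tr>
</table>'

# Select the single <table> node so html_table() returns one tibble, not a list
table_node <- read_html(td_only_html) %>%
  html_element("table")

# Default (header = NA): the first row is only used as a header if it is all <th>
default_parse <- table_node %>% html_table()

# header = TRUE: the first row is promoted to column names
header_parse <- table_node %>% html_table(header = TRUE)

names(default_parse)
names(header_parse)
```

With the defaults, the header row should stay in the data and the columns should get placeholder names (X1 to X4), which also keeps Age as text. With header = TRUE, the first row becomes the column names and Age can be converted to an integer column.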
html <- read_html(table_html) # table whose last row is missing a cell
html %>%
  html_table(header = TRUE)
[[1]]
             Name Profession Age Country
1   Dillon Arroyo  Carpenter  54      UK
2 Rebecca Douglas  Developer  32    <NA>
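The NA in the output above suggests a row with fewer cells than the header. The sketch below reproduces that situation under the assumption that rvest 1.0.0 or later is installed, where short rows are padded with NA automatically (older versions needed fill = TRUE). The names ragged_html and people are made up for this example.

```r
library(rvest)

# Hypothetical table: the last row has no Country cell
ragged_html <- '<table>
  <tr><th>Name</th><th>Profession</th><th>Age</th><th>Country</th></tr>
  <tr><td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td></tr>
  <tr><td>Rebecca Douglas</td><td>Developer</td><td>32</td></tr>
</table>'

people <- read_html(ragged_html) %>%
  html_element("table") %>%
  html_table()

# Flag the rows where the missing cell was filled in with NA
is.na(people$Country)
```

Checking for NA after parsing is a cheap way to spot rows where the source markup was incomplete.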
<div class="rTable">
<div class="rTableRow">
<div class="rTableHead"><strong>Name</strong></div>
<div class="rTableHead"><span style="font-weight: bold;">Telephone</span></div>
<div class="rTableHead"> </div>
</div>
<div class="rTableRow">
<div class="rTableCell">John</div>
<div class="rTableCell"><a href="tel:0123456785">0123 456 785</a></div>
<div class="rTableCell"><img src="images/check.gif" alt="checked" /></div>
</div>
<div class="rTableRow">
...
</div>
</div>
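A table built from <div> elements like the one above contains no <table> node, so html_table() has nothing to parse. One possible approach, sketched here with the class names from the markup, is to select the rows and extract each column by hand. The names div_html, data_rows and contacts are illustrative, and the elided rows ("...") are left out of the sample.

```r
library(rvest)

# Sample markup copied from above, without the elided rows
div_html <- '<div class="rTable">
  <div class="rTableRow">
    <div class="rTableHead"><strong>Name</strong></div>
    <div class="rTableHead"><span style="font-weight: bold;">Telephone</span></div>
    <div class="rTableHead"> </div>
  </div>
  <div class="rTableRow">
    <div class="rTableCell">John</div>
    <div class="rTableCell"><a href="tel:0123456785">0123 456 785</a></div>
    <div class="rTableCell"><img src="images/check.gif" alt="checked" /></div>
  </div>
</div>'

rows <- read_html(div_html) %>%
  html_elements("div.rTableRow")

# Assumption: the first row is the header row, so it is dropped
data_rows <- rows[-1]

# html_element() (singular) returns one match per row, or a missing node,
# so the three columns stay aligned even if a cell is absent
contacts <- data.frame(
  name = data_rows %>%
    html_element(".rTableCell:nth-child(1)") %>%
    html_text(),
  telephone = data_rows %>%
    html_element(".rTableCell:nth-child(2)") %>%
    html_text(),
  checked = data_rows %>%
    html_element(".rTableCell:nth-child(3) img") %>%
    html_attr("alt")
)

contacts
```

Extracting per row with html_element() rather than across the whole page with html_elements() is a deliberate choice: a row without an image yields NA instead of silently shifting the values of later rows.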