Scrape your first table

Web Scraping in R

Timo Grossenbacher

Instructor

Simple table

<table>
    <tr>
      <td>Name</td><td>Profession</td><td>Age</td><td>Country</td>
    </tr>
    <tr>
      <td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td>
    </tr>
    <tr>
      <td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td>
    </tr>
</table>
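This sketch shows how rvest sees this markup. It assumes rvest 1.0 or later, and the variable name table_html is our own. The table is stored as a string and its structure is walked: each <tr> element is a row, and the <td> elements inside it are that row's cells.

```r
library(rvest)

# The simple table from above, stored as a string (every cell is a <td>)
table_html <- '
<table>
  <tr><td>Name</td><td>Profession</td><td>Age</td><td>Country</td></tr>
  <tr><td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td></tr>
  <tr><td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td></tr>
</table>'

html <- read_html(table_html)

# Each <tr> element is one row of the table
rows <- html %>% html_elements("tr")
length(rows)   # 3

# Within a row, the <td> elements are its cells
dillon <- rows[[2]] %>% html_elements("td") %>% html_text()
dillon   # "Dillon Arroyo" "Carpenter" "54" "UK"
```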

Simple table

<table>
    <tr>
      <th>Name</th><th>Profession</th><th>Age</th><th>Country</th>
    </tr>
    <tr>
      <td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td>
    </tr>
    <tr>
      <td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td>
    </tr>
</table>
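The only change from the previous table is that the first row now uses <th> cells. The short sketch below makes the same assumptions as before. It shows that the CSS selector th picks out exactly those header cells, while td matches only the data cells.

```r
library(rvest)

# Same table, but the first row now uses <th> header cells
table_html <- '
<table>
  <tr><th>Name</th><th>Profession</th><th>Age</th><th>Country</th></tr>
  <tr><td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td></tr>
  <tr><td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td></tr>
</table>'

html <- read_html(table_html)

# "th" matches only the header cells
header_cells <- html %>% html_elements("th") %>% html_text()
header_cells   # "Name" "Profession" "Age" "Country"

# "td" matches the 8 data cells in the two body rows
n_data_cells <- length(html_elements(html, "td"))
n_data_cells   # 8
```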

Scraping a table with rvest

library(rvest)
html <- read_html(table_html) # table_html holds a table with <th> header cells
html %>%
    html_table()
[[1]]
# A tibble: 2 × 4
  Name            Profession   Age Country
  <chr>           <chr>      <int> <chr>  
1 Dillon Arroyo   Carpenter     54 UK     
2 Rebecca Douglas Developer     32 USA
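html_table() returns a list because a page can contain several tables. The sketch below assumes rvest 1.0 or later and that table_html holds the table with <th> cells. It shows two common ways to get a single tibble. In both cases the Age column arrives as integers, ready for calculations.

```r
library(rvest)

table_html <- '
<table>
  <tr><th>Name</th><th>Profession</th><th>Age</th><th>Country</th></tr>
  <tr><td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td></tr>
  <tr><td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td></tr>
</table>'

html <- read_html(table_html)

# Option 1: take the first (and only) element of the returned list
people <- html_table(html)[[1]]

# Option 2: select the <table> node first; on a single node,
# html_table() returns a tibble directly instead of a list
people2 <- html %>% html_element("table") %>% html_table()

class(people$Age)   # "integer", converted from the cell text
mean(people$Age)    # 43
```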

Scraping a table with rvest

html <- read_html(table_html) # table without <th> header cells
html %>%
    html_table(header = TRUE)
[[1]]
# A tibble: 2 × 4
  Name            Profession   Age Country
  <chr>           <chr>      <int> <chr>  
1 Dillon Arroyo   Carpenter     54 UK     
2 Rebecca Douglas Developer     32 USA
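To make the effect of header = TRUE visible, the sketch below compares it with the default header = NA. It makes the same assumptions as before, but table_html now holds the version without <th> cells. With the default, the labels stay in the first data row, so the column names become X1 to X4 and the Age column cannot be converted to numbers.

```r
library(rvest)

table_html <- '
<table>
  <tr><td>Name</td><td>Profession</td><td>Age</td><td>Country</td></tr>
  <tr><td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td></tr>
  <tr><td>Rebecca Douglas</td><td>Developer</td><td>32</td><td>USA</td></tr>
</table>'

html <- read_html(table_html)

default_tbl <- html_table(html)[[1]]                 # header = NA: no <th>, so no header
forced_tbl  <- html_table(html, header = TRUE)[[1]]  # first row becomes the names

names(default_tbl)   # "X1" "X2" "X3" "X4"
names(forced_tbl)    # "Name" "Profession" "Age" "Country"

# The "Age" label keeps the default column as text
class(default_tbl$X3)   # "character"
class(forced_tbl$Age)   # "integer"
```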

Scraping a table with rvest

html <- read_html(table_html) # table with a missing Country value in the last row
html %>%
    html_table(header = TRUE)
[[1]]
             Name Profession Age Country
1   Dillon Arroyo  Carpenter  54      UK
2 Rebecca Douglas  Developer  32    <NA>
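The <NA> marks a cell without a value. One way this happens is shown below with a hypothetical variant of the table whose last row has only three cells. In this case, html_table() pads the short row with NA. rvest 1.0 or later does this automatically, whereas older versions needed fill = TRUE. The slide's own markup is not shown, so this is an illustration rather than the exact input.

```r
library(rvest)

# Hypothetical table: the last row has no Country cell
table_html <- '
<table>
  <tr><td>Name</td><td>Profession</td><td>Age</td><td>Country</td></tr>
  <tr><td>Dillon Arroyo</td><td>Carpenter</td><td>54</td><td>UK</td></tr>
  <tr><td>Rebecca Douglas</td><td>Developer</td><td>32</td></tr>
</table>'

people <- html_table(read_html(table_html), header = TRUE)[[1]]

# The short row is padded, so the missing cell becomes NA
people$Country              # "UK" NA
is.na(people$Country[2])    # TRUE
```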

Scraping "tables" in reality

<div class="rTable">
     <div class="rTableRow">
       <div class="rTableHead"><strong>Name</strong></div>
       <div class="rTableHead"><span style="font-weight: bold;">Telephone</span></div>
       <div class="rTableHead">&nbsp;</div>
     </div>
     <div class="rTableRow">
       <div class="rTableCell">John</div>
       <div class="rTableCell"><a href="tel:0123456785">0123 456 785</a></div>
       <div class="rTableCell"><img src="images/check.gif" alt="checked" /></div>
     </div>
     <div class="rTableRow">
         ...
     </div>
</div>
Example taken from https://html-cleaner.com/features/replace-html-table-tags-with-divs/
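html_table() only looks at <table> elements, so it cannot parse this layout. The minimal sketch below rebuilds the rows by hand with CSS class selectors. It assumes rvest 1.0 or later. The variable names are our own, and so is the column name Checked, because the third header is empty. It uses only the header row and the "John" row shown above.

```r
library(rvest)

# Header row and "John" row from the example above (elided rows omitted)
div_html <- '
<div class="rTable">
  <div class="rTableRow">
    <div class="rTableHead"><strong>Name</strong></div>
    <div class="rTableHead"><span style="font-weight: bold;">Telephone</span></div>
    <div class="rTableHead">&nbsp;</div>
  </div>
  <div class="rTableRow">
    <div class="rTableCell">John</div>
    <div class="rTableCell"><a href="tel:0123456785">0123 456 785</a></div>
    <div class="rTableCell"><img src="images/check.gif" alt="checked" /></div>
  </div>
</div>'

html <- read_html(div_html)

# There is no <table> element, so html_table() finds nothing
length(html_table(html))   # 0

# Every .rTableRow is one row; the first one holds the headers
rows <- html_elements(html, ".rTableRow")
data_rows <- rows[-1]

# Build one data frame per data row from its .rTableCell children
people <- lapply(data_rows, function(row) {
  cells <- html_elements(row, ".rTableCell")
  data.frame(
    Name      = html_text(cells[[1]]),
    Telephone = html_text(cells[[2]]),
    # The third cell only holds an image, so read its alt text instead
    Checked   = html_attr(html_element(cells[[3]], "img"), "alt")
  )
})
people <- do.call(rbind, people)
```

Selecting by class works here because the site marks each role consistently (rTableRow, rTableHead, rTableCell). On other pages, the class names will differ.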

Let's practice!
