Web Scraping in R
Timo Grossenbacher
Instructor
<html>
  <body>
    <h2>A first example</h2>
    <p>A text paragraph.</p>
    <p>
      Here follows a list:
    </p>
  </body>
</html>
...
<div>
  Here follows a list:
  <ul>
    <li>Bullet 1</li>
    <li>Bullet 2</li>
    <li>Bullet 3</li>
  </ul>
</div>
...
...
<p>
  Here follows a
  <a href="https://google.com">link</a>.
</p>
...
library(rvest)
html <- read_html(html_document)
html
{html_document}
<html>
[1] <body> \n <h2>A first example</h2>\n <p>A text paragraph.</p>\n ...
class(html)
[1] "xml_document" "xml_node"
library(xml2)
xml_structure(html)
<html>
  <body>
    {text}
    <h2>
      {text}
    {text}
    <p>
      {text}
    {text}
    <p>
      {text}
      <a [href]>
        {text}
      {text}
    {text}
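The calls above use `html_document` without showing how it is defined. The following is a minimal sketch that assumes `html_document` is simply a character string holding the HTML source; the string contents here are pieced together from the snippets on the earlier slides and are illustrative only.

```r
library(rvest)
library(xml2)

# Assumption: html_document is a character string containing raw HTML.
# The markup below combines the slide snippets for illustration.
html_document <- '<html>
  <body>
    <h2>A first example</h2>
    <p>A text paragraph.</p>
    <p>
      Here follows a
      <a href="https://google.com">link</a>.
    </p>
  </body>
</html>'

# Parse the string into an xml_document
html <- read_html(html_document)

# Inspect the class of the parsed object
class(html)

# Print the nesting of tags; {text} marks text nodes, including the
# whitespace between tags
xml_structure(html)
```

`read_html()` also accepts a file path or a URL, so the same two inspection calls should work unchanged on a page read from disk or from the web.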