web scraping - Reading only the relevant text from an HTML page using R

- September 15, 2014

Is there a way to access the textual content of a Wikipedia page using R? I am looking for an equivalent of jsoup, as shown in the Stack Overflow post on extracting text using jsoup.

Thanks.

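From here:

    # Load packages (R is case-sensitive: RCurl and XML, not rcurl and xml)
    library(RCurl)
    library(XML)

    # Download the HTML, following any HTTP redirects
    html <- getURL("https://en.wikipedia.org/wiki/Main_Page", followlocation = TRUE)

    # Parse the HTML string and extract the text of every <p> element
    doc <- htmlParse(html, asText = TRUE)
    plain.text <- xpathSApply(doc, "//p", xmlValue)
    cat(paste(plain.text, collapse = "\n"))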
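
To see what the XPath step does without hitting the network, here is a minimal sketch that runs the same XML calls (htmlParse, xpathSApply, xmlValue) on a small inline HTML string. The HTML content and the extra filter that drops empty paragraphs are assumptions made for illustration; they are not part of the original answer.

```r
library(XML)

# Hypothetical inline page, used here instead of a live download
html <- paste0(
  "<html><body>",
  "<h1>Title</h1>",
  "<p>First paragraph.</p>",
  "<div>Sidebar text that is not in a paragraph</div>",
  "<p></p>",
  "<p>Second <b>bold</b> paragraph.</p>",
  "</body></html>"
)

# asText = TRUE tells htmlParse that the argument is HTML content, not a file name
doc <- htmlParse(html, asText = TRUE)

# xmlValue returns the text of each matched <p>, including text in child tags
paragraphs <- xpathSApply(doc, "//p", xmlValue)

# Assumed extra step: drop paragraphs that contain only whitespace
paragraphs <- paragraphs[nzchar(trimws(paragraphs))]

cat(paste(paragraphs, collapse = "\n"))
```

Only text inside <p> elements survives, so headings and the sidebar div are left out. If a page keeps its main text in other elements, the "//p" expression would need to be adjusted to match.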
