web scraping - Reading only the relevant text from an HTML page using R
Is there a way to access the textual content of a Wikipedia page using R? I am looking for an equivalent of the jsoup approach shown in the Stack Overflow post on extracting text using jsoup.
Thanks.
From here:
# load packages
library(RCurl)
library(XML)

# download the HTML, following any redirects
html <- getURL("https://en.wikipedia.org/wiki/Main_Page", followlocation = TRUE)

# parse the downloaded string as HTML
doc <- htmlParse(html, asText = TRUE)

# extract the text of every <p> element and print one paragraph per line
plain.text <- xpathSApply(doc, "//p", xmlValue)
cat(paste(plain.text, collapse = "\n"))
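To see what the `//p` XPath query keeps and what it drops, it may help to run the same parsing step on a small in-memory page with no network access. This is only a sketch: the HTML string below is made up for illustration and is not taken from Wikipedia, and the variable name `paragraphs` is an assumption.

```r
library(XML)

# A tiny page invented for illustration (not real Wikipedia markup)
html <- "<html><body>
  <div id='nav'>Menu | Login</div>
  <p>First paragraph.</p>
  <p>Second <b>paragraph</b>.</p>
</body></html>"

# asText = TRUE tells htmlParse the argument is HTML content, not a file name or URL
doc <- htmlParse(html, asText = TRUE)

# Select every <p> element; xmlValue returns its text, including text in nested tags
paragraphs <- xpathSApply(doc, "//p", xmlValue)

# Print one paragraph per line; the navigation <div> is not selected
cat(paste(paragraphs, collapse = "\n"))
```

Because only `<p>` nodes are selected, text in menus, sidebars, and similar containers is skipped. On real pages a narrower XPath expression may be needed if the page also wraps unwanted text in `<p>` tags.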