web scraping - Reading only the relevant text from an HTML page using R


Is there a way to access the textual content of a Wikipedia page using R? I am looking for an equivalent of the jsoup approach shown in a Stack Overflow post on extracting text using jsoup.

Thanks.

From here:

    # load packages
    library(RCurl)
    library(XML)

    # download the HTML, following any redirects
    html <- getURL("https://en.wikipedia.org/wiki/Main_Page", followlocation = TRUE)

    # parse the HTML string and extract the text of every <p> element
    doc <- htmlParse(html, asText = TRUE)
    plain.text <- xpathSApply(doc, "//p", xmlValue)

    # print the paragraphs, one per line
    cat(paste(plain.text, collapse = "\n"))
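
To see what the XPath step returns without making a network request, here is a minimal sketch that parses a small inline HTML string. The HTML content is made up for illustration and is not from the original answer; only htmlParse, xpathSApply and xmlValue from the XML package are used.

```r
library(XML)

# Hypothetical inline HTML, used so the example runs offline
html <- paste0(
  "<html><body>",
  "<h1>Title</h1>",
  "<p>First paragraph.</p>",
  "<div><p>Second <b>paragraph</b>.</p></div>",
  "<script>var x = 1;</script>",
  "</body></html>"
)

# Parse the string as HTML (asText = TRUE: the input is content, not a file name)
doc <- htmlParse(html, asText = TRUE)

# "//p" matches every <p> element at any depth; xmlValue returns its text,
# including text inside nested inline tags such as <b>
paragraphs <- xpathSApply(doc, "//p", xmlValue)

print(paragraphs)
```

Because the expression selects only p elements, the heading and the script content are left out. On a real page a narrower XPath may be needed, since the text you want is not always inside p tags.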
