In R XML XPath, @href is returning the text "href"


I am trying to get the contents of href using the XPath code described in these two posts. Unfortunately, the code is returning the actual text "href" and several spaces in addition to the URL. How can I avoid that?

library(XML)

html <- readLines("http://www.msu.edu")
html.parse <- htmlParse(html)
node <- getNodeSet(html.parse, "//div[@id='msu-top-utilities']//a/@href")
node[[1]]

# > node[[1]]
#                  href
# "students/index.html"
# attr(,"class")
# [1] "XMLAttributeValue"

It's a named character vector. You can do:

as.character(node[[1]]) 

which gives:

## [1] "students/index.html" 
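
To see why the label appears, here is a minimal base R sketch. The object x is a hand-built stand-in for node[[1]] (an assumption for illustration, not something returned by the XML package), so it runs without any packages or network access. The "href" text is the element's name, and the leading spaces in the printed output look like padding that print() adds to line the name up over the value.

```r
# Hypothetical stand-in for node[[1]]: a one-element character vector
# named "href", with the same class attribute shown in the question.
x <- c(href = "students/index.html")
class(x) <- "XMLAttributeValue"

# The label printed above the value is just the element's name.
label <- names(x)

# as.character() returns a bare character vector: no names, no class.
plain <- as.character(x)

print(label)
print(names(plain))
print(plain)
```

Note that unname(x) would also drop the name, but it keeps the class attribute, so as.character() gives the cleaner result here.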

Alternatively, here's a better idiom using the xml2 package:

library(xml2)

doc <- read_html("http://www.msu.edu")
nodes <- xml_find_all(doc, "//div[@id='msu-top-utilities']//a")
xml_attr(nodes, "href")

## [1] "students/index.html"      "faculty-staff/index.html" "alumni/index.html"
## [4] "businesses/index.html"    "visitors/index.html"
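
The live page may change or be unreachable, so the following sketch runs the same xml2 calls against an inline HTML string. The markup is a made-up stand-in for the real page (an assumption for illustration only); just the div id and two of the link targets are borrowed from the output above. It also shows that xml_attr() returns NA for a link that has no href, which is worth filtering out in real use.

```r
library(xml2)

# Made-up markup standing in for the real page; the third link
# deliberately has no href attribute.
page <- paste0(
  "<div id='msu-top-utilities'><ul>",
  "<li><a href='students/index.html'>Students</a></li>",
  "<li><a href='alumni/index.html'>Alumni</a></li>",
  "<li><a>No link target</a></li>",
  "</ul></div>"
)

doc <- read_html(page)
nodes <- xml_find_all(doc, "//div[@id='msu-top-utilities']//a")

# Plain, unnamed character vector; NA where the attribute is missing.
hrefs <- xml_attr(nodes, "href")

# Keep only the links that actually carry an href.
hrefs_present <- hrefs[!is.na(hrefs)]
print(hrefs_present)
```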
