Retrieving text (HTML nodes) from Javascript using R

Tags: javascript, r, xpath, web-scraping

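I am trying to retrieve the quote "I understood at a very early age ... the spirit of the universe" and the author's name, "Alice Walker", from the following Javascript code: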

<div id="qpos_4_3" class="m-brick grid-item boxy bqQt" style="position: absolute; left: 0px; top: 33815px;">
  <div class="">
    <a href="/quotes/quotes/a/alicewalke625815.html?src=t_age" class="b-qt qt_625815 oncl_q" title="view quote">I understood at a very early age that
    in nature, I felt everything I should feel in church but never did.
    Walking in the woods, I felt in touch with the universe and with the
    spirit of the universe.
    </a>
    <a href="/quotes/authors/a/alice_walker.html" class="bq-aut qa_625815 oncl_a" title="view author">Alice Walker</a>
  </div>
  <div class="kw-box">
    <a href="/quotes/topics/topic_nature.html" class="oncl_k" data-idx="0">Nature</a>,
  </div>

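Your attempt was almost there. Note that you can extend the XPath expression to match on the title you were trying to isolate with html_attr, but what you actually need is xml_contents. I added magrittr only for the pipe and for readability; it is not otherwise required. I have coerced the results to character, assuming you will be working with them further.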

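# Fetch a page and return the contents of the <a> element with the given
# title that sits inside the <div> with the given id.
get_contents <- function(link, id, title) {

  # library() stops with an error if a package is missing,
  # whereas require() only warns and returns FALSE
  library(xml2)
  library(magrittr)  # only for the %>% pipe

  # Match on the title attribute directly in the XPath expression
  xpath <- paste0(".//div[@id='", id, "']//a[@title='", title, "']")

  read_html(link) %>%
    xml_find_first(xpath) %>%
    xml_contents() %>%   # child nodes of the link, i.e. its text
    as.character()       # coerce to a character vector for later use

}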

link <-  "https://www.brainyquote.com/quotes/topics/topic_age.html"
id <- "qpos_1_10"

quote <- get_contents(link, id, "view quote")

# [1] "In our age there is no such thing as 'keeping out of politics.' All
# issues are political issues, and politics itself is a mass of lies,
# evasions, folly, hatred and schizophrenia."

author <- get_contents(link, id, "view author")

# [1] "George Orwell"
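
The difference between matching on an attribute and reading a node's contents can be checked offline. The sketch below is not part of the original answer: it parses a trimmed inline copy of the markup from the question instead of fetching the live page, and the helper name find_link is hypothetical. It uses xml_attr and xml_text from xml2; xml_text with trim = TRUE is swapped in for xml_contents plus as.character so that the surrounding whitespace is dropped.

```r
library(xml2)

# Trimmed inline copy of the markup from the question, so no network access is needed.
html <- '<div id="qpos_4_3" class="m-brick grid-item boxy bqQt">
  <div class="">
    <a href="/quotes/quotes/a/alicewalke625815.html?src=t_age" class="b-qt qt_625815 oncl_q" title="view quote">I understood at a very early age that in nature, I felt everything I should feel in church but never did.
    </a>
    <a href="/quotes/authors/a/alice_walker.html" class="bq-aut qa_625815 oncl_a" title="view author">Alice Walker</a>
  </div>
</div>'

doc <- read_html(html)

# Hypothetical helper: builds the same XPath as get_contents(), but runs it
# against an already parsed document and returns the matching node.
find_link <- function(doc, id, title) {
  xpath <- paste0(".//div[@id='", id, "']//a[@title='", title, "']")
  xml_find_first(doc, xpath)
}

quote_node  <- find_link(doc, "qpos_4_3", "view quote")
author_node <- find_link(doc, "qpos_4_3", "view author")

# Reading the attribute only gives back the value that was matched on ...
title_value <- xml_attr(quote_node, "title")

# ... whereas the text of the node is what the question is after.
quote_text <- xml_text(quote_node, trim = TRUE)
author     <- xml_text(author_node, trim = TRUE)

# A non-matching id yields a missing node, whose text is NA.
missing_text <- xml_text(find_link(doc, "no_such_id", "view quote"))
```

Working on a parsed document rather than a URL also means the page is read once, however many nodes are extracted from it.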

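"You may not access, use or copy any portion of the Website or its Content through the use of bots, spiders, scrapers, web crawlers, indexing agents, or other automated devices or mechanisms. You agree not to remove or modify any copyright notice or trademark legend, author attribution or other notice in the Website Content. Unless we expressly authorize it in writing, in no event shall you reproduce, redistribute, duplicate, copy, modify, distribute…" Best read section 5.

It is only for personal use and for practicing creating a bot, but thanks for pointing this out. I will be very careful.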