HTML scraping - R scraper

I am trying to parse data that is encoded as HTML. An example of the strings I am trying to parse:

Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />

I tried scrapeR:

y1 = scrape (str1)  # the above string is in str1 (as a vector)

I get the following error message:

Error in which(value == defs) : 
  argument "code" is missing, with no default

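Has anyone worked with scrapeR? I'm not sure what "code" refers to, since it is an option that is not explained in the manual. I'm just trying to see which default is affecting this.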

Here is one way to extract the information:

str1<-"Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />"

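library(scrapeR)
# object= takes the *name* of the variable holding the HTML, as a quoted string
y <- scrape(object="str1")[[1]] # just get the first result

# text nodes that come before the <img> element
pretext <- sapply(xpathSApply(y, "//img/preceding::text()"), xmlValue)
# the alt attribute of the <img> element
alttext <- xpathSApply(y, "//img/@alt")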

paste(pretext, alttext)
#[1] "Simplify the polynomial by combining like terms.  3x+12-11x+14"
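
As an alternative to going through scrape(), the same extraction can be sketched with the XML package directly (scrapeR builds on it). This is only a sketch and not the method from the answer above: it assumes the XML package is installed, parses the string with htmlParse(..., asText = TRUE), and reads the attribute with xmlGetAttr. The variable names doc, pretext and alttext are chosen for illustration.

```r
library(XML)

str1 <- "Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />"

# Parse the string itself as HTML (asText = TRUE: str1 is content, not a file name or URL)
doc <- htmlParse(str1, asText = TRUE)

# Text nodes that come before the <img> element
pretext <- xpathSApply(doc, "//img/preceding::text()", xmlValue)

# The alt attribute of each <img> element, as a plain character vector
alttext <- xpathSApply(doc, "//img", xmlGetAttr, "alt")

# Trim the trailing space of the text before joining the two pieces
result <- paste(trimws(pretext), alttext)
print(result)
```

Here the string is passed by value, so there is no need to quote the variable name as with scrape(object="str1").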

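Comments:

According to the documentation, the scrape function normally takes a URL as its first unnamed argument. So what about y1 = scrape(object="str1")?

It accepts str1. y1 = scrape(str1) produces an error. y1 = scrape(object=str1) produces a different error: the object str1 cannot be found. I thought object=xxx was for objects holding URLs and the like.

It should be y = scrape(object="str1"), not y1 = scrape(object=str1). See the documentation.

y = scrape(object="str1") now puts the whole HTML wrapper in y.

@MrFlick, thank you for editing my original post and formatting it properly. (I'll learn.)

Thank you very much. It works. Now I need to figure out why it works! (Thanks for pointing me in the right direction.)

I found a good tutorial on web scraping with R. Posting it here just for future reference.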
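
The difference between object="str1" and object=str1 discussed in the comments is consistent with the variable being looked up by its name. The base R sketch below illustrates that idea with get(); it is a hypothetical illustration of name-based lookup, not scrapeR's actual source, and the names by_name and lookup_failed are made up for the example.

```r
str1 <- "Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />"

# Passing the quoted name: get() finds the variable called "str1"
by_name <- get("str1")
print(identical(by_name, str1))

# Passing the value: get() now searches for a variable whose *name* is the
# whole HTML string, which does not exist, so the lookup fails
lookup_failed <- inherits(try(get(str1), silent = TRUE), "try-error")
print(lookup_failed)
```

Under that assumption, scrape(object=str1) fails because the HTML text is treated as a variable name, while scrape(str1) fails differently because the first unnamed argument is treated as a URL.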