HTML scraping with the R scrapeR package
I am trying to parse data encoded as HTML. Here is an example of the string I am trying to parse:
Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />
I tried scrapeR:
y1 = scrape (str1) # the above string is in str1 (as a vector)
I get the following error message:
Error in which(value == defs) :
argument "code" is missing, with no default
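Has anyone worked with scrapeR? I'm not sure what "code" refers to, since it is not an option described in the manual. I'm just trying to find out which default affects this.
Here is one way to extract the information: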
str1<-"Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />"
library(scrapeR)
y<-scrape(object="str1")[[1]] #just get the first result
pretext <- sapply(xpathSApply(y, "//img/preceding::text()"), xmlValue)
alttext <- xpathSApply(y, "//img/@alt")
paste(pretext, alttext)
#[1] "Simplify the polynomial by combining like terms. 3x+12-11x+14"
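As a rough alternative sketch, the same string can probably be parsed without scrapeR by calling htmlParse from the XML package directly (scrapeR depends on XML, so it should already be installed). This assumes asText = TRUE is enough for the parser to treat str1 as HTML source rather than a file name or URL; the result variable and the trimws call are additions for illustration and are not part of the original answer.

```r
library(XML)

str1 <- "Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />"

# asText = TRUE: treat the argument as HTML source, not as a file name or URL
doc <- htmlParse(str1, asText = TRUE)

# Value of every text node that comes before the <img> element
pretext <- xpathSApply(doc, "//img/preceding::text()", xmlValue)

# alt attribute of every <img> element
alttext <- xpathSApply(doc, "//img", xmlGetAttr, "alt")

# trimws (an addition) drops the trailing space left in the text node
result <- paste(trimws(pretext), alttext)
print(result)
```

The XPath expressions are the same ones used in the answer above; only the parsing step differs, so this may be a convenient way to check whether a problem lies in scrape itself or in the XPath queries.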
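According to the documentation, the scrape function normally takes a URL as its first unnamed argument. So what about y1 = scrape(object="str1")?
It accepts str1. y1 = scrape(str1) produces the error. y1 = scrape(object=str1) produces a different error: the object str1 cannot be located. I thought object=xxx was meant for objects holding URLs and the like.
It should be y = scrape(object="str1"), not y1 = scrape(object=str1). See the documentation.
y = scrape(object="str1") puts the entire HTML wrapper in y.
@MrFlick, thank you for editing my original post and formatting it properly. (I will learn.) Thank you very much. It worked. Now I need to figure out why it works! (Thanks for pointing me in the right direction.)
I found a good tutorial on web scraping with R. Posting it just for future reference.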