HTML scraping - R scrapeR [r, web, screen-scraping, scraper]

I am trying to parse data that is encoded as HTML. An example of the string I am trying to parse is:

Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />
I tried scrapeR:

y1 = scrape (str1)  # the above string is in str1 (as a vector)
I get the following error message:

Error in which(value == defs) : 
  argument "code" is missing, with no default
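Has anyone worked with scrapeR? I am not sure what "code" refers to, because the manual does not describe it as an option. I just want to find out which default affects this.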

Here is one way to extract the information:

str1 <- "Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />"

library(scrapeR)
# object= takes the NAME of the variable as a string, not the variable itself
y <- scrape(object = "str1")[[1]]  # just get the first result

# text nodes that come before the <img> element
pretext <- sapply(xpathSApply(y, "//img/preceding::text()"), xmlValue)
# value of the alt attribute of the <img> element
alttext <- xpathSApply(y, "//img/@alt")

paste(pretext, alttext)
#[1] "Simplify the polynomial by combining like terms.  3x+12-11x+14"
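
For an HTML string that is already in memory, the same extraction can probably be done without scrape() by calling the XML package directly (the package that provides xpathSApply and xmlValue used above). This is only a sketch of an alternative, not what the answer above does. It assumes the XML package is installed, and the names doc, pre_text and alt_text are chosen here purely for illustration.

```r
library(XML)

str1 <- "Simplify the polynomial by combining like terms. <img src=\"/flx/math/inline/3x%2B12-11x%2B14\" class=\"x-math\" alt=\"3x+12-11x+14\" />"

# asText = TRUE tells the parser that str1 is the HTML itself,
# not a file name or a URL
doc <- htmlParse(str1, asText = TRUE)

# text that precedes the <img> element, with surrounding whitespace removed
pre_text <- trimws(xpathSApply(doc, "//img/preceding::text()", xmlValue))

# alt attribute of the <img> element, returned as a plain character value
alt_text <- xpathSApply(doc, "//img", xmlGetAttr, "alt")

paste(pre_text, alt_text)
```

Using xmlGetAttr instead of the "//img/@alt" path is a small design choice: it returns the attribute value without the attribute name attached.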

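According to the documentation, the scrape function normally takes a URL as its first unnamed argument. So what about y1 = scrape(object="str1")?

It accepts str1. y1 = scrape(str1) produces the error. y1 = scrape(object=str1) produces a different error: it cannot locate the object str1. I thought object=xxx was meant for objects holding URLs and the like.

It should be y = scrape(object="str1") rather than y1 = scrape(object=str1). See the documentation.

y = scrape(object="str1") puts the whole HTML wrapper in y. Now it has it.

@MrFlick, thank you for editing my original post and formatting it properly (I will learn). Thank you very much. It works. Now I have to figure out why it works! (Thanks for pointing me in the right direction.)

Found a good tutorial on web scraping with R. Posting it just for future use.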
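
On why the quoted name works and the bare variable does not: the comments above indicate that object= expects the name of a variable as a string. A function with that kind of interface presumably looks the name up with something like base R's get(). The sketch below uses a hypothetical helper, lookup_by_name, to show that behaviour. It is not scrapeR's actual source, only an illustration of the mechanism under that assumption.

```r
str1 <- "Simplify the polynomial by combining like terms. <img alt=\"3x+12-11x+14\" />"

# Hypothetical helper (not part of scrapeR): resolves a variable from its name
lookup_by_name <- function(object) {
  get(object, envir = parent.frame())
}

# Passing the name as a string finds the variable and returns its contents
by_name <- lookup_by_name("str1")

# Passing the variable itself makes get() search for an object whose name is
# the HTML text, which does not exist, so an error is raised
by_value <- tryCatch(lookup_by_name(str1), error = function(e) e)

identical(by_name, str1)
inherits(by_value, "error")
```

If scrape() works along these lines, that would account for scrape(object=str1) reporting that it cannot locate the object.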