Error in curl::curl_fetch_memory(url, handle = handle): Send failure: Connection was reset (RStudio.cloud)


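I want to get id_product and id_parent from this web page. Yesterday I could get the results, but when I tried again today I received an error message. For what it's worth, I am doing this on rstudio.cloud.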

library(httr)
library(rvest)    # read_html(), html_text(), %>%
library(stringr)  # str_match_all()

url <- "https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack"

headers <- c('User-Agent' = 'Mozilla/5.0')
doc <- read_html(httr::GET(url, httr::add_headers(.headers = headers))) %>%
  html_text()
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]

id_product
id_parent


I have been trying to find a possible explanation, but to no avail.

The server requires an additional header:

library(httr)
library(rvest)    # provides read_html() and html_text()
library(stringr)
library(magrittr)

# Send an Accept header in addition to the User-Agent
headers <- c(
  'User-Agent' = 'Mozilla/5.0',
  'Accept' = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'
)

doc <- read_html(httr::GET(url = 'https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack', httr::add_headers(.headers = headers))) %>%
  html_text()

# Capture the digits that follow "product_id = " and "parent_id = "
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]

id_product
id_parent
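
Since the IDs are pulled out of the page text with a regular expression, it can help to check the pattern offline, which separates parsing problems from network problems. The following is only a sketch in base R: the helper `extract_id` and the string `sample_text` are made up for illustration and are not part of the answer above, and the real page text may be laid out differently.

```r
# Hypothetical helper: return the digits following "<key> = " in a text,
# or NA if the pattern is not found.
extract_id <- function(text, key) {
  pattern <- paste0(key, "\\s+=\\s+(\\d+);")
  # regexec() reports the full match and each capture group
  hit <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  if (length(hit) >= 2) hit[2] else NA_character_
}

# Made-up sample standing in for the text returned by html_text()
sample_text <- "var product_id = 12345; var parent_id = 678;"

extract_id(sample_text, "product_id")  # "12345"
extract_id(sample_text, "parent_id")   # "678"
extract_id(sample_text, "shop_id")     # NA
```

If the helper returns NA on the real page text, the request probably succeeded but the page no longer contains the expected `product_id = ...;` assignment.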

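I still can't get it to work. It seems to load forever. I am running this on rstudio.cloud, though I'm not sure whether that makes a difference.

I ran the code above from RStudio. Open this URL in a browser
https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack
and inspect its network traffic. Look at the headers sent with the initial request. Start by recreating the entire set and see what happens. Something else may be going on in your case, but for me the server required the second header.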
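
When a request hangs or the connection is reset, it may be easier to diagnose if the fetch is wrapped with a timeout and the error is captured rather than left to abort the script. The following is a rough sketch, not the method used in the thread: `fetch_page` and the 10-second default timeout are assumptions chosen for illustration, and a timeout will not by itself make the server accept the request.

```r
# Hypothetical wrapper: fetch a page with custom headers and a timeout,
# returning a list that describes the outcome instead of throwing an error.
fetch_page <- function(url, headers = c("User-Agent" = "Mozilla/5.0"),
                       seconds = 10) {
  tryCatch({
    resp <- httr::GET(
      url,
      httr::add_headers(.headers = headers),
      httr::timeout(seconds)  # give up instead of loading forever
    )
    status <- httr::status_code(resp)
    list(
      ok = status < 400,
      status = status,
      body = httr::content(resp, as = "text", encoding = "UTF-8"),
      error = NULL
    )
  }, error = function(e) {
    # Connection resets and timeouts end up here
    list(ok = FALSE, status = NA_integer_, body = NULL,
         error = conditionMessage(e))
  })
}
```

Calling `res <- fetch_page(url, headers)` and then inspecting `res$status` and `res$error` shows whether the server answered with an HTTP error or the connection failed outright. Adding headers one at a time makes it possible to see which one changes the outcome.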