Error in curl::curl_fetch_memory(url, handle = handle): Send failure: Connection was reset (RStudio.cloud)


r web-scraping


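I want to get id_product and id_parent from this web page. Yesterday I could get the results, but when I tried again today I got an error message. Note that I'm doing this on rstudio.cloud.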

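    library(rvest)    # read_html(), html_text(); also re-exports %>%
    library(stringr)  # str_match_all()

    url <- "https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack"

    headers <- c('User-Agent' = 'Mozilla/5.0')
    # Fetch the page with a custom User-Agent and flatten it to plain text
    doc <- read_html(httr::GET(url, httr::add_headers(.headers = headers))) %>%
           html_text()
    # Capture the digits assigned to product_id / parent_id in the page's scripts
    id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
    id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]

    id_product
    id_parent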
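The message in the title is raised by curl underneath httr::GET() when the server drops the connection, and it stops the whole script. A minimal sketch, assuming you would rather have a failed request return NULL than abort; try_request() is a hypothetical helper, not part of httr, and it only reports the failure, it does not fix it:

```r
# Hypothetical helper (not part of httr): call `request` and, if it raises an
# error such as "Send failure: Connection was reset", report it and return NULL.
try_request <- function(request) {
  tryCatch(
    request(),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Offline check: a simulated curl error becomes a message plus NULL
res <- try_request(function() stop("Send failure: Connection was reset"))
is.null(res)  # [1] TRUE

# Real use (needs httr and network access):
# resp <- try_request(function() httr::GET(url, httr::add_headers(.headers = headers)))
```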

I've been searching for a possible explanation, but without success.

The server needs an additional header:

library(httr)
library(rvest)     # read_html() and html_text()
library(stringr)   # str_match_all()
library(magrittr)  # %>%

# Besides the User-Agent, send the Accept header a browser would send
headers <- c(
  'User-Agent' = 'Mozilla/5.0',
  'Accept' = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'
)

doc <- read_html(httr::GET(url = 'https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack', httr::add_headers(.headers = headers))) %>%
       html_text()

# Extract the numeric IDs assigned in the page's inline JavaScript
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]

id_product
id_parent
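To check what the two regular expressions capture without hitting the site, here is a base-R equivalent of the str_match_all() calls run on a made-up snippet. The sample string and the extract_id() helper are hypothetical; the real page is only assumed to contain assignments of the form product_id = <digits>;.

```r
# Hypothetical stand-in for the page text; the real HTML is not reproduced here.
sample_doc <- "var product_id = 123456; var parent_id = 789;"

# Hypothetical helper: same pattern as the answer, e.g. 'product_id\\s+=\\s+(\\d+);'
extract_id <- function(doc, field) {
  pattern <- paste0(field, "\\s+=\\s+(\\d+);")
  # Find every full match, then keep only the captured digits
  hits <- regmatches(doc, gregexpr(pattern, doc, perl = TRUE))[[1]]
  sub(pattern, "\\1", hits, perl = TRUE)
}

extract_id(sample_doc, "product_id")  # [1] "123456"
extract_id(sample_doc, "parent_id")   # [1] "789"
```

If the server returns an error page instead of the product page, both calls return character(0), which is a quick way to tell a blocked request from a parsing problem.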

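I still can't get it to work; it just keeps loading forever. I'm running this on rstudio.cloud, though, and I'm not sure whether that makes a difference.

I ran the code above from RStudio. Open the URL https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack in your browser and inspect its network traffic. Look at the headers sent with the initial request, start by recreating that entire set, and see what happens. Something else may be going on in your case, but for me the server required the second header.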
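Following that advice, one way to send a fuller, browser-like header set without hanging indefinitely is sketched below. It assumes httr is installed; the Accept-Language value and the fetch_page() helper are illustrative placeholders, so replace the headers with the ones your browser actually sends. httr::RETRY() re-issues the request after errors such as a connection reset, and httr::timeout() caps how long each attempt may wait.

```r
# Hypothetical browser-like header set: copy the real values from the initial
# document request in your browser's DevTools Network tab.
browser_headers <- c(
  'User-Agent'      = 'Mozilla/5.0',
  'Accept'          = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
  'Accept-Language' = 'en-US,en;q=0.9'  # assumed value, not from the answer
)

# Hypothetical helper: retry failed attempts (including curl errors) up to
# `times` times, giving up on each attempt after `seconds` seconds.
fetch_page <- function(url, headers = browser_headers, seconds = 30, times = 3) {
  httr::RETRY(
    "GET", url,
    httr::add_headers(.headers = headers),
    httr::timeout(seconds),
    times = times
  )
}

# Real use (needs network access):
# resp <- fetch_page("https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack")
```

If the last attempt still fails, RETRY() raises the error again, so it can be combined with a tryCatch() wrapper when scraping many pages.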