Error in curl::curl_fetch_memory(url, handle = handle): Send failure: Connection was reset (RStudio.cloud)
I want to get id_product and id_parent from this web page. Yesterday I could get the result, but when I tried again today I received an error message. For what it's worth, I am running this on rstudio.cloud.
library(httr)
library(rvest)
library(stringr)

url <- "https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack"
headers <- c('User-Agent' = 'Mozilla/5.0')
# Download the page and flatten it to plain text
doc <- read_html(httr::GET(url, httr::add_headers(.headers = headers))) %>%
  html_text()
# Capture the digits that follow "product_id = " and "parent_id = "
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_product
id_parent
I have been trying to find a possible explanation, but to no avail.

The server requires an additional header.
library(httr)
library(rvest)    # provides read_html() and html_text()
library(stringr)
library(magrittr)
headers <- c(
  'User-Agent' = 'Mozilla/5.0',
  'Accept' = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'
)
doc <- read_html(httr::GET(url = 'https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack', httr::add_headers(.headers = headers))) %>%
  html_text()
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_product
id_parent
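I still can't get it to work. It just seems to keep loading forever. For what it's worth, I am doing this on rstudio.cloud; I am not sure whether that makes a difference.

I ran the code above from RStudio. Open this URL in a browser: https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack
and inspect its network traffic. Look at the headers used in the initial request. Start by recreating the whole set and see what happens. Something else may be going on in your case, but for me the server required the second header.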
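The advice to recreate the browser's request headers and see what happens can be sketched roughly as below. This is only an illustration, not a confirmed fix for rstudio.cloud: the `Accept-Language` value, the 30-second timeout, and the helper names `browser_headers`, `fetch_page_text` and `extract_id` are all assumptions and should be replaced with whatever the browser's network tab actually shows. It also runs the regex on the raw response text rather than going through read_html() and html_text().

```r
library(httr)
library(stringr)

# Hypothetical header set: copy the real values from the browser's network tab.
browser_headers <- c(
  'User-Agent'      = 'Mozilla/5.0',
  'Accept'          = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language' = 'en-US,en;q=0.9'  # assumed value, not from the original post
)

# Fetch a page as text. A timeout stops the call from hanging indefinitely,
# and connection or HTTP errors return NULL instead of stopping the script.
fetch_page_text <- function(url, headers, seconds = 30) {
  tryCatch({
    resp <- GET(url, add_headers(.headers = headers), timeout(seconds))
    stop_for_status(resp)
    content(resp, as = 'text', encoding = 'UTF-8')
  }, error = function(e) {
    message('Request failed: ', conditionMessage(e))
    NULL
  })
}

# Pull the digits that follow e.g. "product_id = " out of the page source.
extract_id <- function(page_text, name) {
  pattern <- paste0(name, '\\s+=\\s+(\\d+);')
  str_match_all(page_text, pattern)[[1]][, 2]
}

# Only hit the network when run interactively (e.g. from the RStudio console).
if (interactive()) {
  page <- fetch_page_text(
    'https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack',
    browser_headers
  )
  if (!is.null(page)) {
    print(extract_id(page, 'product_id'))
    print(extract_id(page, 'parent_id'))
  }
}
```

If the request still fails with the full header set, the printed message at least shows whether it was a timeout, a reset connection, or an HTTP error status, which helps narrow down whether the hosting environment is the problem.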