R: scraped links appear broken when opened in a browser, but work after refreshing the page


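The code below does several things, but mainly it goes through the archive of the time.mk website and extracts a large number of links covering a 7-month period. I then filter those links down to the one site I am interested in (time.mk aggregates news from many different websites):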


The problem is that once I have retrieved only the links I need and try to scrape just the text from them, R returns this error:

Error in open.connection(x, "rb") : Unrecognized or bad HTTP Content or Transfer-Encoding

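As far as I can tell, this happens because the links are broken when I try to open them in a browser, and (almost!) every one of them returns something like this: Wd�R$��&��)/��Y�F�R�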

After I refresh a link 5-6 times in my browser, it loads normally.

The retrieved links look like this:

[965] "http://a1on.mk/wordpress/archives/655118" [967] "http://a1on.mk/wordpress/archives/654641"

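I am really not sure what the problem is here. I would like to know how to make R keep running the extraction code for a link until it actually manages to extract the text. Functions like try and tryCatch have not proven useful here so far.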
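One way to express "keep trying until it works" is a bounded retry loop around the fetch. The following is only a minimal sketch: `fetch_with_retry`, `get_article_text`, `max_tries` and `wait` are hypothetical names that do not appear in the original code, and it assumes the failures really are intermittent (as the browser behaviour suggests), so that a later attempt can succeed. If the server always sends a bad encoding, retrying will not help.

```r
# Hypothetical helper: call fetch(url) up to max_tries times and return the
# first successful result, or NA_character_ if every attempt fails.
fetch_with_retry <- function(url, fetch, max_tries = 6, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    # Turn an error into a value so the loop can continue
    result <- tryCatch(fetch(url), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed for ", url, ": ",
            conditionMessage(result))
    # Pause before the next attempt, but not after the last one
    if (attempt < max_tries) Sys.sleep(wait)
  }
  NA_character_
}

# Assumed fetch function for the article pages (needs rvest when called):
# read the page and join the text of all <p> elements.
get_article_text <- function(url) {
  page <- rvest::read_html(url)
  paragraphs <- rvest::html_text(rvest::html_nodes(page, "p"))
  paste0(paragraphs, collapse = "")
}
```

With the vector of links from the question, this could be applied as `a1ontext <- lapply(a1onLinks, fetch_with_retry, fetch = get_article_text)`. Links that never load come back as `NA`, so they can be inspected or retried later instead of stopping the whole run.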

I'm not sure whether it's relevant, but are you using Safari as your default browser?

No, I'm using Chrome. I don't think that matters, though. I don't know what it could be.

library(rvest)

mark <- "http://www.time.mk/"

# Collect the weekly archive links for weeks 22 to 49 of 2016
y <- NULL
for (week in 22:49) {
  firstdate <- paste0("http://www.time.mk/week/2016/", week)
  frontpage <- read_html(firstdate) %>%
    html_nodes("a") %>%
    html_attr("href")
  justweeks <- frontpage[grepl("^week.*", frontpage)]
  weeklinks <- paste0(mark, justweeks)
  # c() instead of rbind(): the weeks can have different numbers of links
  y <- unique(c(y, weeklinks))
}

# Read every archive page once (after the loop, not on each iteration)
# and take the href of the .other_articles elements
allothers <- unlist(lapply(y, function(i) {
  read_html(i) %>%
    html_nodes(".other_articles") %>%
    html_attr("href")
}))
print(allothers)
# paste0() has no sep argument, so none is passed
fullinks <- paste0(mark, allothers)
print(fullinks)

# all primary .other_articles links obtained -> need to obtain the
# article links behind them
alllinks <- unlist(lapply(fullinks, function(i) {
  read_html(i) %>%
    html_nodes("h1 a") %>%
    html_attr("href")
}))
print(alllinks)

# all links obtained -> try to sort data that I need now -> a1on
a1onLinks <- alllinks[grepl("a1on", alllinks)]
print(a1onLinks)

# obtaining text -> not there just yet
a1ontext <- lapply(a1onLinks, function(i) {
  try(
    read_html(i) %>%
      html_nodes("p") %>%
      html_text() %>%
      paste0(collapse = "")
  )
})