
Add [[j]] or other information used for each row inside a loop in R


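My question is how to include a column in "my_data" (my_data$sector) showing which url_list[[j]] or url_info was used for that row. Each URL returns a table (35 x 100), and when everything is put together I need to show which element each row came from.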

url_list <- vector()
url_info <- vector()

# Then I fill them.
total_pages <- 1:5   # for my real use, I need almost 100 pages

for (i in total_pages) {
    url_list [i] <- paste('http://www.mylink/result.php?sector=',i,sep = "")
    url_info [i] <- paste('sector_',i,sep = "")
}

url_list
# [1] "http://www.mylink/result.php?sector=1" "http://www.mylink/result.php?sector=2"
# [3] "http://www.mylink/result.php?sector=3" "http://www.mylink/result.php?sector=4"
# [5] "http://www.mylink/result.php?sector=5"

url_info
# [1] "sector_1" "sector_2" "sector_3" "sector_4" "sector_5"

# Scraping (read_html(), html_node() and html_table() come from rvest)
library(rvest)
my_data <- list()

for (j in seq_along(url_list)) {
    my_data[[j]] <- url_list[[j]] %>% 
        read_html() %>% 
        html_node("table") %>%
        html_table()
}


final_data <- cbind(do.call(rbind, my_data))

Something like this should work:

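library(tidyverse)
library(rvest)  # provides read_html(), html_node() and html_table()

# Functional sequence: read one page and return its first table
pipe_function <- . %>% 
  read_html() %>% 
  html_node("table") %>%
  html_table()

# One row per URL with a list-column of tables, then expanded to one row per table row
# (map_dfr() inside mutate() would fail because the combined table has more rows than the tibble)
tibble(url_info, url_list) %>% 
  mutate(table = map(url_list, pipe_function)) %>% 
  unnest(table)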

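I don't have a list of URLs containing the tables you are looking for, but try the following, which appends the URL as the last column.

You will have to try the rbind on your actual data: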

library(rvest)  # read_html(), html_node(), html_table()
library(dplyr)  # mutate() and %>%

my_data <- list()
url_list=c(
"http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population",
"https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_historical_population",
"https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population")

for (j in seq_along(url_list)) {
    my_data[[j]] <- url_list[[j]] %>% 
        read_html() %>% 
        html_node("table") %>%
        html_table() %>%
        mutate(url=url_list[j])
}
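
If the shorter labels in url_info are preferred over the full URL, one option is to name the list elements before binding and let dplyr::bind_rows() write those names into a column through its .id argument. This is only a minimal sketch, assuming dplyr is installed. fake_scrape() is a hypothetical stand-in for the read_html() %>% html_node("table") %>% html_table() step so that the example runs without network access, and its two-row table is made-up data:

```r
library(dplyr)

# Labels and URLs built as in the question (vectorised paste0() instead of a loop)
total_pages <- 1:3
url_list <- paste0("http://www.mylink/result.php?sector=", total_pages)
url_info <- paste0("sector_", total_pages)

# Hypothetical stand-in for the scraping step so the sketch runs offline;
# in real use this would read the page at `url` and return its table
fake_scrape <- function(url) {
  data.frame(company = c("A", "B"), value = c(1, 2), stringsAsFactors = FALSE)
}

# One table per URL, each list element named after its label
my_data <- lapply(url_list, fake_scrape)
names(my_data) <- url_info

# bind_rows() stacks the tables and stores each element's name in `sector`
final_data <- bind_rows(my_data, .id = "sector")
```

With the real scraper in place of fake_scrape(), final_data$sector would hold the url_info label of the page each row came from.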

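Error in %>% read_html() %>% html_node("table") %>% html_table(): could not find function "%>%"
Did you load tidyverse?
Yes, that line runs fine. But it returned another error:
Error: 'sector_1' does not exist in current working directory ('c:/program files/rstudio/bin').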
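
That last message most likely means read_html() was handed the label ("sector_1") rather than the URL: when its argument is a string that is neither a URL nor markup, xml2 treats it as a local file path. Below is a minimal sketch of keeping the two apart, assuming rvest is installed. The inline HTML strings and scrape_with_label() are hypothetical stand-ins so the example runs offline; in real use page_source would be url_list and page_label would be url_info:

```r
library(rvest)

# A bare label is interpreted as a file path, which triggers an error
# (assumes no file called "sector_1" exists in the working directory)
msg <- tryCatch(
  { read_html("sector_1"); NA_character_ },
  error = function(e) conditionMessage(e)
)

# Hypothetical inline HTML standing in for two downloaded pages
page_source <- c(
  "<table><tr><th>company</th><th>value</th></tr><tr><td>A</td><td>1</td></tr></table>",
  "<table><tr><th>company</th><th>value</th></tr><tr><td>B</td><td>2</td></tr><tr><td>C</td><td>3</td></tr></table>"
)
page_label <- c("sector_1", "sector_2")

# Only the source goes to read_html(); the label is attached afterwards
scrape_with_label <- function(source, label) {
  tbl <- as.data.frame(html_table(html_node(read_html(source), "table")))
  tbl$sector <- label
  tbl
}

# Walk over sources and labels in parallel, then stack the labelled tables
my_data <- Map(scrape_with_label, page_source, page_label)
final_data <- do.call(rbind, unname(my_data))
```

Iterating over both vectors in parallel keeps each label next to the page it describes without ever passing it to the reader function.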