Web crawler in R using Rcrawler

r web-scraping

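I want to build a web crawler in R for the website "". The program should visit the site with an address as a parameter and then retrieve the latitude and longitude the site generates. This should be repeated for every row of my dataset.

Since I am new to web crawling, I would appreciate some guidance.

Thanks in advance.

In the past I have used an API called IP stack (ipstack.com).

Example: a data frame "d" with a column of IP addresses called "ipAddress"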

for (i in seq_len(nrow(d))) {
  #get data from API and save the text to variable 'str'
  lookupPath <- paste("http://api.ipstack.com/", d$ipAddress[i], "?access_key=INSERT YOUR API KEY HERE&format=1", sep = "")
  str <- readLines(lookupPath)

  #save all the data to a file
  f <- file(paste(i, ".txt", sep = ""))
  writeLines(str,f)
  close(f)

  #save data to main data frame 'd' as well:
  d$ipCountry[i]<-str[7]
  print(paste("Successfully saved ip #:", i))
}
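
The loop above stops at the first failed request and picks the country by line position (str[7]), which depends on the exact layout of the response. A minimal sketch of a more defensive variant, using only base R, might look like the following. The helper names build_lookup_url, safe_fetch and lookup_all are hypothetical (they are not part of ipstack or any package), and the access key is a placeholder.

```r
# Hypothetical helper: build the ipstack lookup URL for one IP address.
# The URL pattern is copied from the loop above; access_key is a placeholder.
build_lookup_url <- function(ip, access_key) {
  paste0("http://api.ipstack.com/", ip, "?access_key=", access_key, "&format=1")
}

# Hypothetical helper: fetch one URL and return NULL instead of stopping
# when the request fails. `fetch` defaults to readLines, as in the loop above.
safe_fetch <- function(url, fetch = readLines) {
  tryCatch(
    fetch(url),
    error = function(e) {
      message("Request failed for ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Hypothetical helper: look up every row of `d`; failed rows keep NA.
lookup_all <- function(d, access_key, fetch = readLines) {
  d$ipCountry <- rep(NA_character_, nrow(d))
  for (i in seq_len(nrow(d))) {
    response <- safe_fetch(build_lookup_url(d$ipAddress[i], access_key), fetch)
    if (!is.null(response) && length(response) >= 7) {
      # Same line-position lookup as the original loop.
      d$ipCountry[i] <- response[7]
    }
  }
  d
}
```

Called as d <- lookup_all(d, "YOUR KEY"), this keeps going when a single lookup fails. Passing the fetch function as an argument also makes it possible to try the loop with a stub before spending real API requests.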

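Since there is no API: if the web page has no built-in API, your best option is RSelenium.

If you are interested in locating addresses without this thought experiment, the ggmap package in R has tools for that.

What you are trying to do is called geocoding, and in my experience it is usually not free; at best you can hope for a rate-limited API that offers a limited number of free requests per day (e.g. Google). Using this website in the automated way you describe may therefore violate its terms of service. Even though the site allows web scraping (see siteaddress/robots.txt), the service it calls in the background is not free.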
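
As a rough illustration of the geocoding approach mentioned above, the sketch below loops over a vector of addresses and pauses between requests to stay within a request quota. The function name geocode_all, the one-second default delay, and the requirement that the geocoder return a list-like object with lon and lat elements are all assumptions made for this example. The geocoder itself is passed in as a function, so a wrapper around ggmap's geocode() (after registering a key with register_google()) could be supplied.

```r
# Hypothetical helper: geocode a character vector of addresses one at a time.
# `geocoder` is any function that takes one address and returns a list-like
# object with `lon` and `lat` elements (an assumption of this sketch).
geocode_all <- function(addresses, geocoder, delay_sec = 1) {
  out <- data.frame(
    address = addresses,
    lon = rep(NA_real_, length(addresses)),
    lat = rep(NA_real_, length(addresses)),
    stringsAsFactors = FALSE
  )
  for (i in seq_along(addresses)) {
    # A failed lookup leaves NA in this row instead of stopping the loop.
    result <- tryCatch(geocoder(addresses[i]), error = function(e) NULL)
    if (is.list(result) && !is.null(result$lon) && !is.null(result$lat)) {
      out$lon[i] <- as.numeric(result$lon[1])
      out$lat[i] <- as.numeric(result$lat[1])
    }
    # Pause between requests so the quota is not exhausted in a burst.
    if (i < length(addresses)) Sys.sleep(delay_sec)
  }
  out
}

# Possible use with ggmap (not run here; needs a valid Google API key):
# library(ggmap)
# register_google(key = "YOUR KEY")
# coords <- geocode_all(d$address, function(a) geocode(a, output = "latlon"))
```

In the commented usage, d$address is a hypothetical column name. The delay should be chosen from the provider's documented limits rather than taken from this sketch.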