R: How to geocode a simple address using the Data Science Toolkit


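I am quite familiar with Google's geocoding and decided to try an alternative. The Data Science Toolkit lets you geocode an unlimited number of addresses. R has an excellent package that wraps its functions (CRAN: RDSTK). The package has a function called street2coordinates() that interfaces with the Data Science Toolkit's geocoding utility.

However, the RDSTK function street2coordinates() does not work if you try to geocode something simple such as a city and country. In the following example, I try to use the function to get the latitude and longitude of Phoenix: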

> require("RDSTK")
> street2coordinates("Phoenix+Arizona+United+States")
[1] full.address
<0 rows> (or 0-length row.names)
The utility on the Data Science Toolkit site itself works very well. This is the URL request that gives the answer:

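I am interested in geocoding multiple addresses (both full addresses and city names). I know the Data Science Toolkit URL works well. How can I interface with the URL and get the latitudes and longitudes, together with the addresses, into a data frame?

Here is a sample data set: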

dff <- data.frame(address=c(
  "Birmingham, Alabama, United States",
  "Mobile, Alabama, United States",
  "Phoenix, Arizona, United States",
  "Tucson, Arizona, United States",
  "Little Rock, Arkansas, United States",
  "Berkeley, California, United States",
  "Duarte, California, United States",
  "Encinitas, California, United States",
  "La Jolla, California, United States",
  "Los Angeles, California, United States",
  "Orange, California, United States",
  "Redwood City, California, United States",
  "Sacramento, California, United States",
  "San Francisco, California, United States",
  "Stanford, California, United States",
  "Hartford, Connecticut, United States",
  "New Haven, Connecticut, United States"
  ))
Like this:

library(httr)
library(rjson)

data <- paste0("[",paste(paste0("\"",dff$address,"\""),collapse=","),"]")
url  <- "http://www.datasciencetoolkit.org/street2coordinates"
response <- POST(url,body=data)
json     <- fromJSON(content(response,type="text"))
geocode  <- do.call(rbind,sapply(json,
                                 function(x) c(long=x$longitude,lat=x$latitude)))
geocode
#                                                long      lat
# San Francisco, California, United States -117.88536 35.18713
# Mobile, Alabama, United States            -88.10318 30.70114
# La Jolla, California, United States      -117.87645 33.85751
# Duarte, California, United States        -118.29866 33.78659
# Little Rock, Arkansas, United States      -91.20736 33.60892
# Tucson, Arizona, United States           -110.97087 32.21798
# Redwood City, California, United States  -117.88536 35.18713
# New Haven, Connecticut, United States     -72.92751 41.36571
# Berkeley, California, United States      -122.29673 37.86058
# Hartford, Connecticut, United States      -72.76356 41.78516
# Sacramento, California, United States    -121.55541 38.38046
# Encinitas, California, United States     -116.84605 33.01693
# Birmingham, Alabama, United States        -86.80190 33.45641
# Stanford, California, United States      -122.16750 37.42509
# Orange, California, United States        -117.85311 33.78780
# Los Angeles, California, United States   -117.88536 35.18713
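The output above has 16 rows for 17 inputs and is not in the order of dff. A minimal sketch of one way to line the results up with the input again, assuming the parsed response is a named list keyed by address with NULL for anything the service could not resolve. The small json list below is a hand-written stand-in for the real response, and coord_or_na is a hypothetical helper, not part of httr or rjson:

```r
# Hypothetical stand-in for fromJSON(content(response, type = "text")):
# a named list keyed by address, in arbitrary order, NULL where unresolved.
json <- list(
  "Tucson, Arizona, United States"     = list(longitude = -110.97087, latitude = 32.21798),
  "Phoenix, Arizona, United States"    = NULL,
  "Birmingham, Alabama, United States" = list(longitude = -86.80190, latitude = 33.45641)
)

# The addresses in the order we want to keep (dff$address in the question).
addresses <- c(
  "Birmingham, Alabama, United States",
  "Phoenix, Arizona, United States",
  "Tucson, Arizona, United States"
)

# Hypothetical helper: return one coordinate, or NA when the entry is missing.
coord_or_na <- function(entry, field) {
  value <- entry[[field]]
  if (is.null(value)) NA_real_ else as.numeric(value)
}

# Look each address up by name so the rows follow the input order.
geocoded <- data.frame(
  address = addresses,
  long = vapply(addresses, function(a) coord_or_na(json[[a]], "longitude"),
                numeric(1), USE.NAMES = FALSE),
  lat  = vapply(addresses, function(a) coord_or_na(json[[a]], "latitude"),
                numeric(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
geocoded
```

Looking results up by name rather than relying on position keeps one row per input address, so an unresolved address shows up as NA instead of silently disappearing.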
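The ggmap package includes support for geocoding using either Google or the Data Science Toolkit, the latter through its "Google-style geocoder". As noted in the earlier answer, this is quite slow for multiple addresses.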

library(ggmap)
result <- geocode(as.character(dff[[1]]), source = "dsk")
print(cbind(dff, result))
#                                     address        lon      lat
# 1        Birmingham, Alabama, United States  -86.80190 33.45641
# 2            Mobile, Alabama, United States  -88.10318 30.70114
# 3           Phoenix, Arizona, United States -112.07404 33.44838
# 4            Tucson, Arizona, United States -110.97087 32.21798
# 5      Little Rock, Arkansas, United States  -91.20736 33.60892
# 6       Berkeley, California, United States -122.29673 37.86058
# 7         Duarte, California, United States -118.29866 33.78659
# 8      Encinitas, California, United States -116.84605 33.01693
# 9       La Jolla, California, United States -117.87645 33.85751
# 10   Los Angeles, California, United States -117.88536 35.18713
# 11        Orange, California, United States -117.85311 33.78780
# 12  Redwood City, California, United States -117.88536 35.18713
# 13    Sacramento, California, United States -121.55541 38.38046
# 14 San Francisco, California, United States -117.88536 35.18713
# 15      Stanford, California, United States -122.16750 37.42509
# 16     Hartford, Connecticut, United States  -72.76356 41.78516
# 17    New Haven, Connecticut, United States  -72.92751 41.36571

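It works almost perfectly. However, for some reason it drops one address (Phoenix, Arizona, United States), and the answers are also reordered.

I think the problem with this post is not just Phoenix, Arizona. Even though the POST request sends n addresses, only n-1 results come back.

That is not the case when I try it. There are 17 items in the response. The item corresponding to Phoenix is empty.

I am getting this error: Error in do.call(rbind,sapply(json,function(x) c(long=x$longitude, : second argument must be a list.

If you use lapply instead of sapply in the first example, you get back a list and avoid the error message.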
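A likely explanation for the error, sketched in base R: when every address resolves, sapply simplifies the equal-length results into a matrix, and do.call needs a list as its second argument. When one entry is empty (as with Phoenix), the results differ in length, sapply falls back to a list, and the call happens to work. The json_full list below is a hand-written stand-in for a response in which every address was resolved, and extract is a hypothetical name for the anonymous function in the answer:

```r
# Hypothetical stand-in for a parsed response where every address resolved.
json_full <- list(
  "Tucson, Arizona, United States" = list(longitude = -110.97087, latitude = 32.21798),
  "Mobile, Alabama, United States" = list(longitude = -88.10318, latitude = 30.70114)
)

# Same extraction as in the answer, given a name for reuse.
extract <- function(x) c(long = x$longitude, lat = x$latitude)

# sapply simplifies equal-length results into a matrix, not a list.
simplified <- sapply(json_full, extract)
is.matrix(simplified)

# do.call() rejects the matrix; capture the message instead of stopping.
sapply_error <- tryCatch(
  {
    do.call(rbind, simplified)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
sapply_error

# lapply always returns a list, so rbind gets one row per address.
geocode_ok <- do.call(rbind, lapply(json_full, extract))
geocode_ok
```

Note that rbind still drops NULL entries, so an unresolved address is omitted from the matrix with either function.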