Apply a function to every cell in a data frame column in R

EDIT: Thanks to @user5249203 for pointing out that geocoding is better done with ggmap's `geocode` call. Watch out for `NA`s, though.

I'm struggling with R's `apply` family.

I'm using a function that takes a string and returns its latitude and longitude:

> gGeoCode("Philadelphia, PA")
[1]  39.95258 -75.16522

I have a simple data frame containing the names of all 50 states:

dput(state_lat_long)
structure(
  list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)
To practice my `apply` skills, I just want to apply `gGeoCode` to each cell in the only column of the `state_lat_long` data frame.

It couldn't be simpler.

So what's wrong with this?

> View(apply(state_lat_long, function(x) gGeoCode(x)))
When I run it, I get:

Error in View : argument "FUN" is missing, with no default  
I don't understand, because `FUN` is not missing.
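The error comes from positional argument matching: `apply(X, MARGIN, FUN, ...)` expects `MARGIN` second, so in the call above the anonymous function is taken as `MARGIN` and `FUN` is never supplied. A minimal sketch on a three-state toy data frame, with `toupper()` standing in for `gGeoCode` so it runs offline:

```r
# Toy one-column data frame shaped like state_lat_long
df <- data.frame(State = factor(c("new york", "nevada", "texas")))

# apply(X, MARGIN, FUN, ...) matches arguments by position, so here the
# anonymous function lands in MARGIN and FUN is never supplied.
msg <- tryCatch(
  apply(df, function(x) toupper(x)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Supplying MARGIN explicitly (1 = rows) fixes the call; each x is one row,
# i.e. a named character vector such as c(State = "new york").
res <- apply(df, 1, function(x) toupper(x[["State"]]))
print(res)
```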

So let's try `sapply`. That should be simple, right?

But what's wrong with this?

View(sapply(state_lat_long$State, function(x) gGeoCode(x)))
When I run this, I get two rows and 50 columns filled with `NA`s. I can't figure it out.
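The 2 x 50 shape on its own is expected: `sapply` calls the function once per element and, when every call returns a length-2 vector, stacks the results as columns. The sketch below uses a hypothetical offline stub, `fake_geocode()`, in place of `gGeoCode` (a lookup table built from coordinates in the `ggmap` output further down), converts the factor to character first, and uses `vapply` to pin the return type. It does not explain why the real calls returned `NA`; that depends on `gGeoCode` itself.

```r
# Hypothetical offline stand-in for gGeoCode: a lookup table instead of a
# web request. Returns c(lat, lon), or two NAs for unknown places.
fake_coords <- list(
  "new york" = c(40.71278, -74.00594),
  "nevada"   = c(38.80261, -116.41939),
  "texas"    = c(31.96860, -99.90181)
)
fake_geocode <- function(place) {
  coords <- fake_coords[[as.character(place[1])]]
  if (is.null(coords)) c(NA_real_, NA_real_) else coords
}

states <- factor(c("new york", "nevada", "texas"))

# One call per element; each returns a length-2 vector, and simplification
# stacks the results as columns, giving a 2 x n matrix.
m <- vapply(as.character(states), fake_geocode, FUN.VALUE = numeric(2))
print(dim(m))

# Transpose for one row per state.
coords <- t(m)
colnames(coords) <- c("lat", "lon")
print(coords)
```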

Next, I tried

View(apply(state_lat_long, 2, function(x) gGeoCode(x)))  
and got

     State
  40.71278
 -74.00594  
Again, this makes no sense.
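With `MARGIN = 2`, `apply` calls the function once per column and hands it the whole column, so `gGeoCode` is called a single time with all 50 names. The returned pair matches New York, which is the first row of `state_lat_long`; that suggests (an assumption, since `gGeoCode`'s source isn't shown) that it geocodes only the first element it receives. A sketch with a hypothetical stub that behaves that way:

```r
# Hypothetical offline stand-in for gGeoCode (lookup table, no web request).
# Assumption: like a typical single-address geocoder, it only reads the
# first element of whatever vector it is given.
fake_coords <- list(
  "new york" = c(40.71278, -74.00594),
  "nevada"   = c(38.80261, -116.41939),
  "texas"    = c(31.96860, -99.90181)
)
fake_geocode <- function(place) {
  coords <- fake_coords[[as.character(place[1])]]
  if (is.null(coords)) c(NA_real_, NA_real_) else coords
}

df <- data.frame(State = factor(c("new york", "nevada", "texas")))

# MARGIN = 2: FUN runs once per column and receives the entire column.
print(apply(df, 2, length))           # State: 3 values in a single call
by_col <- apply(df, 2, fake_geocode)  # 2 x 1 matrix: only "new york" is used
print(by_col)

# MARGIN = 1: FUN runs once per row, so every state gets its own call.
by_row <- apply(df, 1, fake_geocode)  # 2 x 3 matrix, one column per state
print(by_row)
```

The row-wise result is again 2 x n, so `t(by_row)` gives one row per state.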


What am I doing wrong? Thanks.

Is this what your data frame looks like?

df = data.frame(State = c(
    32L, 28L, 43L, 5L, 23L, 34L,
    30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
    18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
    17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
    19L, 41L, 50L, 2L, 45L
  ), Label = c(
    "alabama", "alaska", "arizona",
    "arkansas", "california", "colorado", "connecticut", "delaware",
    "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
    "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
    "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
    "montana", "nebraska", "nevada", "new hampshire", "new jersey",
    "new mexico", "new york", "north carolina", "north dakota", "ohio",
    "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
    "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
    "washington", "west virginia", "wisconsin", "wyoming"
  ))

head(df)
  State      Label
1    32    alabama
2    28     alaska
3    43    arizona
4     5   arkansas
5    23 california
6    34   colorado

# pass only the state name from each row, not the whole row
apply(df, 1, function(x) gGeoCode(x[["Label"]]))
or

mapply(FUN = gGeoCode, df$Label, SIMPLIFY = TRUE)

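Note: some states still return `NA`; rerunning the code fetches the missing coordinates. That said, I'd expect this to work more reliably if we knew your input format/data frame structure. It is also important to make sure the arguments you pass are the ones `gGeoCode` expects.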
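If the `NA`s are transient failures (for example, the service refusing rapid-fire requests, which would fit the observation that rerunning fills them in), one option is to retry each place a few times with a pause. `geocode_with_retry()` and the simulated `make_flaky_geocoder()` below are hypothetical, for illustration only; in practice you would pass `gGeoCode` as `geocoder`.

```r
# Hypothetical helper: retry a geocoder until it returns non-NA coordinates.
# `geocoder` is any function taking one place name and returning c(lat, lon).
geocode_with_retry <- function(place, geocoder, tries = 3, pause = 1) {
  for (i in seq_len(tries)) {
    coords <- geocoder(place)
    if (!anyNA(coords)) return(coords)
    if (i < tries) Sys.sleep(pause)  # wait before asking again
  }
  coords  # still NA after all attempts
}

# Simulated flaky service for testing: fails the first request for each place.
make_flaky_geocoder <- function() {
  seen <- character(0)
  function(place) {
    if (!(place %in% seen)) {
      seen <<- c(seen, place)
      return(c(NA_real_, NA_real_))
    }
    c(40.71278, -74.00594)  # same dummy coordinates for every place
  }
}

flaky <- make_flaky_geocoder()
states <- c("new york", "nevada", "texas")
coords <- vapply(states, geocode_with_retry, FUN.VALUE = numeric(2),
                 geocoder = flaky, pause = 0)
print(coords)
```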

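I realize this question is mainly about `*apply`, but if you are only concerned with geocoding, an easier option is to use a vectorized function such as `ggmap::geocode`: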


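`apply` takes three arguments: the first is your object (e.g. a data frame), the second says whether to apply over rows (1) or columns (2), and the third is the function. Your call passes only two, so the function is taken as the second argument and the third (`FUN`) is missing. Try `View(apply(state_lat_long, 2, function(x) gGeoCode(x)))`.

Could you look at my edit to the original question?

Maybe I got it wrong and it should be `View(apply(state_lat_long, 1, function(x) gGeoCode(x)))`? If not, it may not be as simple as I thought, and I'd need to see the code you use for `gGeoCode` (which may or may not help). Break the code down and run it one piece at a time. The `MARGIN` argument (1 vs. 2) also matters. Also, how did you generate the data frame? It also helps to check whether you are passing the arguments `gGeoCode` expects.

So the reason some states return `NA` lies in the function, not in `apply`.

I see. I think you need to understand how the function works. When you pass the state names one at a time, it does return coordinates, so the problem here is how you pass them, or what you pass to the function. `apply` or `mapply` lets you apply a function without writing a `for` loop, but you still need to check which coordinates are correct and which are not.

I think this is the best idea: instead of `apply`, use `geocode` from `ggmap`.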
state_lat_long <- structure(
    list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

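library(ggmap)
# Current ggmap releases need a Google API key before geocode() works, e.g.
# register_google(key = "YOUR_API_KEY")  # placeholder, not a real key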

## to make sure we're using the correct geocode function I call it with 'ggmap::geocode'
ggmap::geocode(as.character(state_lat_long$State))
...
#           lon      lat
# 1   -74.00594 40.71278
# 2  -116.41939 38.80261
# 3   -99.90181 31.96860
# 4  -119.41793 36.77826
# 5   -94.68590 46.72955
# 6  -101.00201 47.55149
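Because a vectorized geocoder returns one row per input, in input order, the result can be bound straight back onto the original data frame. A sketch with a hypothetical offline stand-in, `fake_geocode_vec()`, that mimics the `lon`/`lat` data frame shape shown above; in real use you would call `ggmap::geocode()` instead:

```r
# Hypothetical offline stand-in for a vectorised geocoder such as
# ggmap::geocode: one call, a data frame back with one row per input.
fake_geocode_vec <- function(places) {
  lookup <- data.frame(
    place = c("new york", "nevada", "texas"),
    lon   = c(-74.00594, -116.41939, -99.90181),
    lat   = c(40.71278, 38.80261, 31.96860)
  )
  idx <- match(places, lookup$place)   # NA index for unknown places
  data.frame(lon = lookup$lon[idx], lat = lookup$lat[idx])
}

states <- data.frame(State = factor(c("new york", "nevada", "texas")))

# Rows come back in input order, so the coordinates can be bound as new columns.
with_coords <- cbind(states, fake_geocode_vec(as.character(states$State)))
print(with_coords)
```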