Given latitude and longitude values in R, how do I calculate the mean of two different variables?


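I am currently trying to derive some data from a table in R.

I have a dataset with two different variables, the annual range and the annual mean of global sea surface temperature (SST). I have these values for every latitude from 90 to -90 and every longitude from 180 to -180.

I would like to get the mean of the annual range and the mean of the annual mean for 5x5 latitude/longitude grid cells. For example, I need the mean annual range for longitudes between -180 and -176 and latitudes between 90 and 86, and so on, until I have the mean of that variable for all possible 5x5 grid cells.

My data look like this: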

        lon   lat ANNUAL_MEAN ANNUAL_RANGE
1       0.5  89.5        -1.8            0
2       1.5  89.5        -1.8            0
3       2.5  89.5        -1.8            0
4       3.5  89.5        -1.8            0
5       4.5  89.5        -1.8            0
6       5.5  89.5        -1.8            0
...
52001 354.5 -89.5        -1.8            0
52002 355.5 -89.5        -1.8            0
52003 356.5 -89.5        -1.8            0
52004 357.5 -89.5        -1.8            0
52005 358.5 -89.5        -1.8            0
52006 359.5 -89.5        -1.8            0

Thanks in advance.

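You can use the raster package and its focal function for moving-window calculations.

First, I will create a dummy data.frame that represents your data.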

# Prepare dummy data.frame
set.seed(2222)
lonlat <- expand.grid(1:10, 1:10)
df <- data.frame( lon = lonlat[, 1],
                  lat = lonlat[, 2],
                  ANNUAL_MEAN = rnorm(100),
                  ANNUAL_RANGE = runif(100, 1, 5)
                )

Here is a solution using the dplyr package, which is included in the tidyverse. It should be easy to follow, step by step.

library(tidyverse)

# set.seed() ensures reproducibility of the example with identical random numbers
set.seed(42)


# build a simulated data set as described in the question
lats <- seq(from = -90, to = 90, by = 0.5)
lons <- seq(from = -180, to = 179.5, by = 0.5) # we must omit +180 or we would
                                               # double count those points
                                               # since they coincide with -180

    # combining each latitude point with each longitude point
coord <- merge(lats, lons) %>%
    rename(lat = x) %>% 
    rename(lon = y) %>%
    # adding simulated values
    mutate(annual_mean = runif(n = nrow(.), min = -2, max = 2)) %>%
    mutate(annual_range = runif(n = nrow(.), min = 0, max = 3)) %>% 
    # defining 5-degree latitude and longitude bands by using integer division
    mutate(lat_band = lat%/%5) %>% 
    mutate(lon_band = lon%/%5) %>% 
    # creating a name label for each unique 5x5 gridcell
    mutate(gridcell_5x5 = paste(lat_band, lon_band, sep = ",")) %>%
    # group-by instruction, much like in SQL
    group_by(lat_band, lon_band, gridcell_5x5) %>% 
    # sorting to get a nice order
    arrange(lat_band, lon_band) %>% 
    # calculating minimum and maximum latitude and longitude for each gridcell
    # calculating the mean values per gridcell
    summarize(gridcell_min_lat = min(lat), 
              gridcell_max_lat = max(lat),
              gridcell_min_lon = min(lon),
              gridcell_max_lon = max(lon),
              gridcell_mean_annual_mean = round(mean(annual_mean), 3),
              gridcell_mean_annual_range = round(mean(annual_range), 3) )
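
To apply the same binning to a real table rather than simulated values, a minimal sketch might look like the following. It assumes a data frame with the columns lon, lat, ANNUAL_MEAN and ANNUAL_RANGE as shown in the question. The name `sst`, the file name in the comment, and the small inline sample are hypothetical stand-ins for the real data, and base R's aggregate() is used here in place of dplyr.

```r
# Hypothetical stand-in for the real table; in practice it would be read from
# a file, e.g. sst <- read.table("your_file.txt", header = TRUE)
sst <- expand.grid(lon = seq(0.5, 9.5, by = 1),
                   lat = seq(89.5, 80.5, by = -1))
sst$ANNUAL_MEAN  <- sst$lat - 90   # deterministic dummy values
sst$ANNUAL_RANGE <- sst$lon        # deterministic dummy values

# Lower edge of the 5-degree cell that each point falls into
sst$lon_min <- (sst$lon %/% 5) * 5
sst$lat_min <- (sst$lat %/% 5) * 5

# Mean of both variables for every fixed 5x5 grid cell
cell_means <- aggregate(cbind(ANNUAL_MEAN, ANNUAL_RANGE) ~ lon_min + lat_min,
                        data = sst, FUN = mean)
print(cell_means)
```

With cell centres at x.5, as in the question, each cell defined by `%/%` holds exactly five points in each direction. A point lying exactly on a multiple of 5 would go into the upper cell, so a manual check that uses different cell limits can give different means.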

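Show us some data and code.

I tried your script, but I get bogus means. I think this is because the script is not linked to the original annual range and annual mean values of my dataset. Can you help me with this?

The example in my solution contains randomly generated dummy values. You have to apply the solution to your actual dataset. Take your dataset and use my code starting from the comment line "defining bands of 5 ..." and so on. This should be a good learning exercise for getting up to speed with R.

@DiegoCepeda Of course the values do not match your originals. The comments say that it uses simulated data, which seems quite clear. If you want to use your own data, read it from a file.

Yes, I know your example contains random data. I tried the script with my own data, but I spot-checked a few grid cells by doing the calculation in Excel, and the results are completely different.

If you compare with Excel and get different results, one possible explanation is that the grid cell limits are chosen differently.

Also, do you want fixed grid cells or a 5x5 moving average for each coordinate?

My understanding is that Diego needs a two-dimensional binning with a mean for each bin, not a two-dimensional moving average?

Based on the phrase "the mean of this variable for all possible 5x5 grid cells", I understood that he needed a moving window.

Sorry, I think my explanation was not good enough. But yes, @stenevang is right: I need fixed 5x5 grid cells with the mean of each cell.

OK, we can use raster::aggregate for 5x5 downsampling.

Thank you very much, now I have what I need.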

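# `rdf` is not defined in this excerpt; it is assumed to be a multi-layer Raster
# object built from `df`, for example with raster::rasterFromXYZ(df)
library(raster)

# perform an aggregation with given downsampling factor
rdf_d <- aggregate(rdf, fact=5, fun = mean)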

# Now each pixel in the raster `rdf_d` contains a mean value of 5x5 pixels from initial `rdf`
# we need to get pixels coordinates and their values
coord <- coordinates(rdf_d)
vals <- as.data.frame(rdf_d)
colnames(coord) <- c("lon", "lat")
colnames(vals) <- c("ANNUAL_MEAN_AVG", "ANNUAL_RANGE_AVG")

res <- cbind(coord, vals)
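
As a cross-check on what a factor-5 aggregation computes, the following base R sketch averages non-overlapping 5x5 blocks of a small matrix by hand. It is only an illustration under simple assumptions: a complete 10x10 grid with no missing values. The names `m`, `row_block` and `col_block` are made up here, and the raster package is not used.

```r
# 10x10 matrix of dummy values standing in for one raster layer
m <- matrix(1:100, nrow = 10, ncol = 10)

# Block index (1 or 2) for every cell, by row and by column
row_block <- (row(m) - 1) %/% 5 + 1
col_block <- (col(m) - 1) %/% 5 + 1

# Mean of each non-overlapping 5x5 block
block_means <- tapply(m, list(row_block, col_block), mean)
print(block_means)

# The top-left block should match a direct calculation
stopifnot(isTRUE(all.equal(block_means[1, 1], mean(m[1:5, 1:5]))))
```

Spot-checking a single block like this is the same kind of comparison as recomputing a few grid cells in a spreadsheet.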
