Finding ranges and the mean of the corresponding elements in R


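I have different ranges of numbers (or coordinates) in a dataset. I want to find the appropriate ranges and then take the mean of the corresponding scores.

Suppose my dataset is: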

coordinate score    
     1000   1.1
     1001   1.2
     1002   1.1
     1003   1.4
     1006   1.8
     1007   1.9
     1010   0.5
     1011   1.1
     1012   1.0
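
For anyone who wants to reproduce the example, the table above could be entered as a data frame along these lines. The name df is an assumption here, chosen to match the name used in the answer.

```r
# Build the example dataset as a data frame (the name `df` is assumed)
df <- data.frame(
  coordinate = c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012),
  score      = c(1.1, 1.2, 1.1, 1.4, 1.8, 1.9, 0.5, 1.1, 1.0)
)

str(df)
```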

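I need to find the boundaries (the points where the coordinates stop being consecutive) and then compute the mean score for each range.

The result I expect is: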

start end mean-score
1000 1003  1.2
1006 1007  1.85
1010 1012  0.86

Try this (assuming df is your dataset), using dplyr:

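library(dplyr)
df %>%
  # c(1, diff(coordinate)) - 1 is 0 inside a consecutive run and > 0 at each gap,
  # so cumsum() stays constant within a run; dense_rank() renumbers the runs 1, 2, 3, ...
  mutate(indx = dense_rank(cumsum(c(1, diff(coordinate)) - 1))) %>%
  group_by(indx) %>%
  summarise(start = first(coordinate),
            end = last(coordinate),
            mean_score = round(mean(score), 2))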

# Source: local data frame [3 x 4]
# 
#   indx start  end mean_score
# 1    1  1000 1003       1.20
# 2    2  1006 1007       1.85
# 3    3  1010 1012       0.87
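
The comments further down mention .GRP and ?data.table, so the same grouping idea can presumably be expressed with data.table as well. The following is only a sketch of how that might look, not a confirmed copy of any original code; the names dt, run and res are assumptions made for illustration.

```r
library(data.table)

df <- data.frame(
  coordinate = c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012),
  score      = c(1.1, 1.2, 1.1, 1.4, 1.8, 1.9, 0.5, 1.1, 1.0)
)
dt <- as.data.table(df)

# Group by the cumulative gap value; .GRP numbers the groups 1, 2, 3, ...
dt[, indx := .GRP, by = .(run = cumsum(c(1, diff(coordinate)) - 1))]

# First and last coordinate plus the rounded mean score per group
# (.N is the number of rows in the current group)
res <- dt[, .(start = coordinate[1],
              end = coordinate[.N],
              mean_score = round(mean(score), 2)),
          by = indx]
print(res)
```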

Here are some alternative base R solutions (much less efficient); the code is listed after the comments below.

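Creating indx is a brilliant idea. Could you explain that part? Thanks.

diff computes the differences between consecutive values. I subtract 1 so that only values that differ from their predecessor by exactly 1 become 0. When cumsum is then applied, those values land in the same group, since 0+0+0=0. When the gap is greater than 1, cumsum adds a new value and then keeps adding zeros, which starts a new group, and so on. For .GRP, see ?data.table. You can actually do this in base R too, but much less efficiently. I'll add base R solutions shortly.

See my edit for the base R and dplyr versions.

dplyr should also be very efficient.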
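
The explanation of diff and cumsum can be followed step by step on the example coordinates. This is a small illustrative sketch; the intermediate names gaps, jumps and run_value are made up here and do not appear in the answer.

```r
coordinate <- c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012)

# Distance to the previous coordinate; the leading 1 stands in for the first row
gaps <- c(1, diff(coordinate))           # 1 1 1 1 3 1 3 1 1

# Subtracting 1 leaves 0 for consecutive coordinates and a positive value at each gap
jumps <- gaps - 1                        # 0 0 0 0 2 0 2 0 0

# The running total only changes at a gap, so it is constant within each run
run_value <- cumsum(jumps)               # 0 0 0 0 2 2 4 4 4

# Renumber the distinct values as 1, 2, 3, ...
indx <- as.numeric(factor(run_value))    # 1 1 1 1 2 2 3 3 3
print(indx)
```

Note that this relies on the coordinates being sorted in increasing order without repeats; otherwise a negative or zero difference could make two separate runs share the same running total.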

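# Base R, option 1: aggregate()
# Label each run of consecutive coordinates with a group index
df$indx <- as.numeric(factor(cumsum(c(1, diff(df$coordinate)) - 1)))
# First/last coordinate and rounded mean score per group, bound side by side
cbind(aggregate(coordinate ~ indx, df, function(x) c(start = head(x, 1), end = tail(x, 1))),
      aggregate(score ~ indx, df, function(x) round(mean(x), 2)))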

#   indx coordinate.start coordinate.end indx score
# 1    1             1000           1003    1  1.20
# 2    2             1006           1007    2  1.85
# 3    3             1010           1012    3  0.87

# Base R, option 2: tapply() (uses the indx column created above)
cbind(do.call(rbind, with(df, tapply(coordinate, indx, function(x) c(start = head(x, 1), end = tail(x, 1))))),
      with(df, tapply(score, indx, function(x) round(mean(x), 2))))

#   start  end     
# 1  1000 1003 1.20
# 2  1006 1007 1.85
# 3  1010 1012 0.87
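
The aggregate() output above repeats the indx column, and the tapply() output leaves the mean column unnamed. If a single tidy data frame is wanted from base R alone, one possible variation is to split the data by run and stack one summary row per run. This is a sketch under the assumption that the coordinates are sorted in increasing order; run_id and res are illustrative names.

```r
df <- data.frame(
  coordinate = c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012),
  score      = c(1.1, 1.2, 1.1, 1.4, 1.8, 1.9, 0.5, 1.1, 1.0)
)

# TRUE marks the first row of each run; cumsum() turns the marks into run ids
run_id <- cumsum(c(TRUE, diff(df$coordinate) != 1))

# One summary row per run, stacked into a single data frame
res <- do.call(rbind, lapply(split(df, run_id), function(g) {
  data.frame(start = g$coordinate[1],
             end = g$coordinate[nrow(g)],
             mean_score = round(mean(g$score), 2))
}))
print(res)
```

Marking run starts with diff(...) != 1 is a slightly different way to build the group index than the cumsum(c(1, diff(...)) - 1) expression used above; on sorted, unique coordinates both should yield the same groups.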