R: Aggregating rows in a data frame


I have a data.frame in this format:

df <- data.frame(time = seq(0.2, 4, 0.2),
                 behavior = c(rep(0, 4), rep(1, 4), rep(2, 4), rep(0, 4), rep(1, 4)),
                 n1 = rnorm(20),
                 n2 = rnorm(20))

What is the most efficient way to achieve this?

Here is one way to do it with dplyr. Because you built df with rnorm and no set.seed, my results differ from yours.

library(dplyr)

df %>%
    # start a new group each time `behavior` differs from the previous row
    group_by(group = cumsum(c(TRUE, diff(behavior) != 0))) %>%
    summarise(Time = sum(time),
              ave.n1 = sum(n1) / Time,
              ave.n2 = sum(n2) / Time)


# group Time      ave.n1      ave.n2
#1    1  2.0  0.68164245 -1.57266432
#2    2  5.2 -0.26419520  0.19598772
#3    3  8.4 -0.04105184  0.24406783
#4    4 11.6  0.10536325 -0.28962844
#5    5 14.8 -0.09449933 -0.02142792
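
To see why the `cumsum()`/`diff()` expression passed to `group_by()` separates the two runs of `behavior == 0` (and the two runs of `behavior == 1`), it may help to evaluate it on its own. The following is a small base R sketch using the `behavior` vector from the question; the names `starts`, `run_id` and `r` are introduced here only for illustration.

```r
# behavior column from the question's example data
behavior <- c(rep(0, 4), rep(1, 4), rep(2, 4), rep(0, 4), rep(1, 4))

# TRUE wherever a value differs from the one before it;
# the first row always starts a run
starts <- c(TRUE, diff(behavior) != 0)

# the running count of run starts gives each consecutive block its own id
run_id <- cumsum(starts)
run_id
# [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

# cross-check with rle(): the same ids, rebuilt from the run lengths
r <- rle(behavior)
identical(run_id, rep(seq_along(r$lengths), r$lengths))
# [1] TRUE
```

Grouping by `behavior` directly would merge rows 1-4 with rows 13-16, which is why a run id is used instead.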
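If you have n1 to n200, you could do something like the following. Note that your n1-n200 columns are overwritten here. You could instead use mutate_each(funs(./time), vars = matches("^n")). That would create 200 columns with names like var1, var2, and you would need to replace those names yourself. With the current version of dplyr the renaming part is a bit of a hassle, but it is easy to do, for example with gsub.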

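# summarise_each(), mutate_each() and funs() belong to the older dplyr API
df %>%
    group_by(group = cumsum(c(TRUE, diff(behavior) != 0))) %>%
    summarise_each(funs(sum = sum(., na.rm = TRUE))) %>%  # per-group sums of every column
    mutate_each(funs(./time), matches("^n")) %>%          # divide each n column by summed time
    select(-behavior)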
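
In dplyr 1.0.0 and later, `summarise_each()`, `mutate_each()` and `funs()` are deprecated in favour of `across()`. The following is a rough equivalent, offered as a sketch rather than as the answer's own method. The `set.seed(1)` call, the `res` name, the stricter `"^n[0-9]+$"` pattern and the `ave.` prefix in `.names` are assumptions made for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: added only so this sketch is reproducible
df <- data.frame(time = seq(0.2, 4, 0.2),
                 behavior = c(rep(0, 4), rep(1, 4), rep(2, 4), rep(0, 4), rep(1, 4)),
                 n1 = rnorm(20),
                 n2 = rnorm(20))

res <- df %>%
    group_by(group = cumsum(c(TRUE, diff(behavior) != 0))) %>%
    summarise(Time = sum(time),
              # apply the same formula to every column named n<digits>;
              # .names keeps the originals' names with an "ave." prefix
              across(matches("^n[0-9]+$"), ~ sum(.x) / Time,
                     .names = "ave.{.col}"))

names(res)
# columns: group, Time, ave.n1, ave.n2
```

Because `.names` builds the output names from the input names, no renaming step with gsub should be needed, and the same call would cover n1 through n200 unchanged.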
If you want to keep the original behavior column, you can do something like this:

# keep the first behavior value of each run
foo <- df %>%
    group_by(group = cumsum(c(TRUE, diff(behavior) != 0))) %>%
    summarise(behavior = behavior[1])

df %>%
    group_by(group = cumsum(c(TRUE, diff(behavior) != 0))) %>%
    summarise(Time = sum(time),
              ave.n1 = sum(n1) / Time,
              ave.n2 = sum(n2) / Time) %>%
    do(cbind(., foo[, 2]))  # append the behavior column from foo

# group Time      ave.n1      ave.n2 behavior
#1    1  2.0  0.93849292  0.90373785        0
#2    2  5.2  0.26211881 -0.11678684        1
#3    3  8.4  0.12171471  0.15838066        2
#4    4 11.6  0.11046081  0.17450358        0
#5    5 14.8 -0.06480093  0.03401513        1
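
Since `behavior` is constant within each run, the two pipelines above can probably be collapsed into a single `summarise()` call. This is a sketch of that alternative, not the answer's original code. The `set.seed(1)` call and the `res` name are assumptions added for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: added only so this sketch is reproducible
df <- data.frame(time = seq(0.2, 4, 0.2),
                 behavior = c(rep(0, 4), rep(1, 4), rep(2, 4), rep(0, 4), rep(1, 4)),
                 n1 = rnorm(20),
                 n2 = rnorm(20))

res <- df %>%
    group_by(group = cumsum(c(TRUE, diff(behavior) != 0))) %>%
    summarise(behavior = first(behavior),  # one value per run, taken from its first row
              Time = sum(time),
              ave.n1 = sum(n1) / Time,
              ave.n2 = sum(n2) / Time)

res$behavior
# [1] 0 1 2 0 1
```

This avoids the intermediate `foo` object and the `cbind()` step, at the cost of placing `behavior` before `Time` in the output.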

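Nice! I didn't know you could add/change columns inside group_by.

@RichardScriven Thanks. You can see this in the CRAN manual. A nice way to save typing, right?

Thanks! What if, instead of just n1 and n2, I have columns n1, n2, ..., n200 that I want to average?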