Aggregating a data frame in rolling chunks of 3 rows in R

Tags: r, aggregation, rolling-computation

I have the following data frame as an example:

df <- data.frame(score=letters[1:15], total1=1:15, total2=16:30)
> df
   score total1 total2
1      a      1     16
2      b      2     17
3      c      3     18
4      d      4     19
5      e      5     20
6      f      6     21
7      g      7     22
8      h      8     23
9      i      9     24
10     j     10     25
11     k     11     26
12     l     12     27
13     m     13     28
14     n     14     29
15     o     15     30
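
All the answers given to questions of this kind assume that the strings repeat across rows.

The aggregate function I normally use to get summaries gives a different result: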

aggregate(df$total1, by=list(sum1=df$score %in% c('a','b','c'), sum2=df$score %in% c('d','e','f')), FUN=sum)
   sum1  sum2  x
1 FALSE FALSE 99
2  TRUE FALSE  6
3 FALSE  TRUE 15

Add a "groups" column that holds the category value:

df$groups <- NA

Then define each group as follows:

df$groups[df$score %in% c("a", "b", "c")] <- "a-b-c"

Finally, aggregate by that column.
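
As a minimal sketch of the three steps just described, put together end to end: the answer does not show the final aggregation call, so the formula form of aggregate() and the name res below are assumptions made for illustration. Note that the formula interface drops rows whose group is NA by default, so only the labelled groups appear in the result.

```r
df <- data.frame(score = letters[1:15], total1 = 1:15, total2 = 16:30)

# Step 1: add an empty grouping column.
df$groups <- NA

# Step 2: label the rows that belong to each group.
df$groups[df$score %in% c("a", "b", "c")] <- "a-b-c"
df$groups[df$score %in% c("d", "e", "f")] <- "d-e-f"

# Step 3 (assumed form): sum both numeric columns per group.
# Rows with groups == NA are omitted by the default na.action.
res <- aggregate(cbind(total1, total2) ~ groups, data = df, FUN = sum)
res
```

Labelling every group by hand only scales to a few groups; for many rows a computed group index is less error-prone.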


If you want a tidyverse solution, here is one possibility:

library(dplyr)

df <- data.frame(score=letters[1:15], total1=1:15, total2=16:30)

df %>%
  mutate(groups = case_when(
    score %in% c("a","b","c") ~ "a-b-c",
    score %in% c("d","e","f") ~ "d-e-f"
  )) %>%
  group_by(groups) %>%
  summarise_if(is.numeric, sum)

This returns:

# A tibble: 3 x 3
  groups total1 total2
  <chr>   <int>  <int>
1 a-b-c       6     51
2 d-e-f      15     60
3 <NA>       99    234

Here is a solution that works for a data frame of any size:

df <- data.frame(score=letters[1:15], total1=1:15, total2=16:30)

# I'm adding a row to demonstrate that the grouping pattern works when the 
# number of rows is not equally divisible by 3.
df <- rbind(df, data.frame(score = letters[16], total1 = 16, total2 = 31))

# A vector that assigns each row to a block of 3 consecutive rows
# (rows 1-3 -> 1, rows 4-6 -> 2, ...). A shorter final block gets its own index.
groups <- (seq_len(nrow(df)) - 1) %/% 3 + 1

# Aggregate by `groups` with whatever method you prefer. I'm going to use `data.table`.
library(data.table)
dt <- as.data.table(df)
dt[, group := groups]

aggDT <- dt[, list(score = paste0(score, collapse = "-"), 
          total1 = sum(total1), total2 = sum(total2)), by = group][
            , group := NULL]
aggDT

   score total1 total2
1: a-b-c      6     51
2: d-e-f     15     60
3: g-h-i     24     69
4: j-k-l     33     78
5: m-n-o     42     87
6:     p     16     31
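
For readers without data.table, the same block-of-3 grouping can be sketched in base R. This is an illustration only, not the answerer's code: the function name aggregate_chunks and its parameter n are hypothetical, and the column names score, total1 and total2 are taken from the example data. It assumes the data frame has at least one row.

```r
# Hypothetical helper: sum total1/total2 over consecutive blocks of n rows
# and join the score labels of each block with "-".
aggregate_chunks <- function(df, n = 3) {
  # Block index per row: 1,1,1,2,2,2,... (last block may be shorter).
  groups <- (seq_len(nrow(df)) - 1) %/% n + 1
  pieces <- lapply(split(df, groups), function(chunk) {
    data.frame(
      score  = paste(chunk$score, collapse = "-"),
      total1 = sum(chunk$total1),
      total2 = sum(chunk$total2)
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# 16 rows, so the last block holds a single row.
df <- data.frame(score = letters[1:16], total1 = 1:16, total2 = 16:31)
res <- aggregate_chunks(df)
res
```

Because the block index is computed from the row position, the approach does not depend on the score values repeating or being known in advance.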

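Also, most people avoid c as a variable name, because it is the most common function name.

The first example has overlapping groups, "a-b-c" and "c-d-e", with c in both groups. The second example has non-overlapping groups, "a-b-c" and "d-e-f". Which do you want? Are the groups defined by every three rows, or by specific letter combinations?

Do you want every N = 3 rows aggregated under one name? Like 'a-b-c', 'c-d-e', ... or like 'a-b-c', 'd-e-f', ...? And if the number of rows is not a multiple of 3, how should the ragged end be handled? PS: I edited your data frame name from the confusing c to the clearer df.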
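
If the overlapping reading ('a-b-c', 'c-d-e', ...) is the intended one, it might be sketched in base R as below. The thread never settles this, so the details are assumptions: windows are 3 rows wide, each new window starts 2 rows after the previous one, incomplete windows at the end are dropped, and the data frame has at least 3 rows. The names width, step, starts and rolled are illustrative.

```r
df <- data.frame(score = letters[1:15], total1 = 1:15, total2 = 16:30)

width <- 3  # rows per window
step  <- 2  # assumed offset between window starts, so neighbours share one row

# Start rows of every complete window: 1, 3, 5, ...
starts <- seq(1, nrow(df) - width + 1, by = step)

rolled <- do.call(rbind, lapply(starts, function(i) {
  rows <- i:(i + width - 1)
  data.frame(
    score  = paste(df$score[rows], collapse = "-"),
    total1 = sum(df$total1[rows]),
    total2 = sum(df$total2[rows])
  )
}))
rolled
```

Setting step equal to width gives the non-overlapping blocks instead, again without a partial final block.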