How to sum based on three different conditions in R

Tags: r, group-by, dplyr, sum

Here is my data:

gcode code year   P  Q
1      101  2000  1  3
1      101  2001  2  4
1      102  2000  1  1
1      102  2001  4  5
1      102  2002  2  6
1      102  2003  6  5
1      103  1999  6  1
1      103  2000  4  2
1      103  2001  2  1
2      104  2000  1  3
2      104  2001  2  4
2      105  2001  4  5
2      105  2002  2  6
2      105  2003  6  5
2      105  2004  6  1
2      106  2000  4  2
2      106  2001  2  1

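gcode 1 has three different codes: 101, 102 and 103. They all share the same years (2000 and 2001). I want to sum P and Q over those years and drop the rest of the data, which is not relevant. I want to do the same for gcode 2.

How can I get a result like this?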

gcode  year   P       Q
1      2000   1+1+4   3+1+2
1      2001   2+4+2   4+5+1
2      2001   2+4+2   4+5+1

We can split the data by gcode, subset each piece to the years that are present for all of its codes, and then aggregate by gcode and year:
do.call(rbind, lapply(split(df, df$gcode), function(x) {
      aggregate(cbind(P, Q)~gcode+year, 
               subset(x, year %in% Reduce(intersect, split(x$year, x$code))), sum)
}))

#    gcode year P  Q
#1.1     1 2000 6  6
#1.2     1 2001 8 10
#2       2 2001 8 10
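
The key step in the base R answer is Reduce(intersect, split(x$year, x$code)). As a minimal sketch of what that expression computes, the block below rebuilds only the gcode 1 rows from the question (the names g1, years_by_code and common_years are made up for this illustration) and evaluates the two parts separately.

```r
# Rebuild only the gcode == 1 rows of the question's data (code and year columns).
g1 <- data.frame(
  code = c(101, 101, 102, 102, 102, 102, 103, 103, 103),
  year = c(2000, 2001, 2000, 2001, 2002, 2003, 1999, 2000, 2001)
)

# One vector of years per code: 101 -> 2000, 2001; 102 -> 2000..2003; 103 -> 1999..2001.
years_by_code <- split(g1$year, g1$code)

# Fold intersect() over the list, which keeps only the years present for every code.
common_years <- Reduce(intersect, years_by_code)
common_years
# [1] 2000 2001
```

subset() then keeps only the rows whose year is in common_years, and aggregate() sums P and Q over what is left.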


Using dplyr with similar logic, we can do:
library(dplyr)
df %>%
  group_split(gcode) %>%
  purrr::map_df(. %>% 
                 group_by(year) %>% 
                 filter(n_distinct(code) == n_distinct(.$code)) %>% 
                 group_by(gcode, year) %>%
                 summarise_at(vars(P:Q), sum))

Data:
df <- structure(list(gcode = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), code = c(101L, 101L, 102L, 102L, 
102L, 102L, 103L, 103L, 103L, 104L, 104L, 105L, 105L, 105L, 105L, 
106L, 106L), year = c(2000L, 2001L, 2000L, 2001L, 2002L, 2003L, 
1999L, 2000L, 2001L, 2000L, 2001L, 2001L, 2002L, 2003L, 2004L, 
2000L, 2001L), P = c(1L, 2L, 1L, 4L, 2L, 6L, 6L, 4L, 2L, 1L, 
2L, 4L, 2L, 6L, 6L, 4L, 2L), Q = c(3L, 4L, 1L, 5L, 6L, 5L, 1L, 
2L, 1L, 3L, 4L, 5L, 6L, 5L, 1L, 2L, 1L)), class = "data.frame", 
row.names = c(NA, -17L))

An option using the data.table package:
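years <- DT[, {
    m <- min(year)
    # tabulate() only counts positive integers, so shift the offsets to start
    # at 1; with year - m the earliest year could never be selected
    ty <- tabulate(year - m + 1L)
    .(year = which(ty == uniqueN(code)) + m - 1L)
}, gcode]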

DT[years, on=.(gcode, year),
    by=.EACHI, .(P=sum(P), Q=sum(Q))]

Data:
library(data.table)
DT <- as.data.table(df)  # assumption: the original definition of DT is cut off; df is the data frame defined above

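I came up with the following solution. First, I counted how many times each year appears within each gcode. I also counted how many unique codes exist for each gcode. Then I joined the two results with left_join(). Next, I kept the rows that have the same value in n_year and n_code. After that, I joined the original data frame, which is called mydf. Finally, I grouped by gcode and year and summed P and Q for each group.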
library(dplyr)

left_join(count(mydf, gcode, year, name = "n_year"),
          group_by(mydf, gcode) %>% summarize(n_code = n_distinct(code))) %>% 
filter(n_year == n_code) %>% 
left_join(mydf, by = c("gcode", "year")) %>% 
group_by(gcode, year) %>% 
summarize_at(vars(P:Q),
             .funs = list(~sum(.)))

#  gcode  year     P     Q
#  <int> <int> <int> <int>
#1     1  2000     6     6
#2     1  2001     8    10
#3     2  2001     8    10
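
To make the n_year == n_code comparison easier to follow, here is a rough base R sketch of just the counting and filtering steps. It is not the answer's code: aggregate() and merge() stand in for count() and left_join(), and the name kept is invented for this example. It also assumes, as the answer implicitly does, that each code appears at most once per year.

```r
# Same rows as the question's data; P and Q are omitted because only the
# counting step is illustrated here.
mydf <- data.frame(
  gcode = rep(c(1L, 2L), times = c(9L, 8L)),
  code  = c(101L, 101L, 102L, 102L, 102L, 102L, 103L, 103L, 103L,
            104L, 104L, 105L, 105L, 105L, 105L, 106L, 106L),
  year  = c(2000L, 2001L, 2000L, 2001L, 2002L, 2003L, 1999L, 2000L, 2001L,
            2000L, 2001L, 2001L, 2002L, 2003L, 2004L, 2000L, 2001L)
)

# n_year: number of rows for each (gcode, year) pair.
n_year <- aggregate(code ~ gcode + year, data = mydf, FUN = length)
names(n_year)[names(n_year) == "code"] <- "n_year"

# n_code: number of distinct codes within each gcode.
n_code <- aggregate(code ~ gcode, data = mydf,
                    FUN = function(x) length(unique(x)))
names(n_code)[names(n_code) == "code"] <- "n_code"

# Attach n_code to every (gcode, year) row and keep the pairs where they match.
counts <- merge(n_year, n_code, by = "gcode")
kept <- counts[counts$n_year == counts$n_code, c("gcode", "year")]
kept
```

Three pairs survive the filter: gcode 1 with 2000 and 2001, and gcode 2 with 2001. These are the groups that appear in the final summary.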

Please delete the first row "1 2000 5"; gcode=1 has no data for 2000, because code=102 has no data for 2000.
Yes, thank you very much! Do you know how to do this quickly in R?
Please edit your question so that it shows exactly what you expect.
Sorry, this is my first time asking a question. I made a silly mistake, sorry. I think it is clear now: for gcode=1, codes 101, 102 and 103 all have data for 2001, and the same goes for gcode=2.
To all of you: I am completely new here. I am now making one last change to the input and output. Thank you very much for your help!
Yes, I have the answer now. Thanks again! One more question: if I want to get the output data into Excel, how should I do that for these two methods?
@XUNZHANG You can store the output above in a variable new_file.
I got it. Thanks again!
Thank you very much for your reply. I will try your answer.
Yes, I had the same logic as you, but I just could not get the output. Thank you very much! One more question: if I want to get the output data into Excel with your method, how do I do that?
Do you mean you want to save the result as an Excel file?
Glad to help. :)
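
For the Excel question raised in the comments, one simple route is to write the result to a CSV file, which Excel opens directly. The sketch below rebuilds the summed result by hand so that it runs on its own; the variable names result and out_path and the file name are assumptions made for this example, not something given in the thread.

```r
# The summed result shown in the answers, rebuilt by hand for illustration.
result <- data.frame(
  gcode = c(1L, 1L, 2L),
  year  = c(2000L, 2001L, 2001L),
  P     = c(6L, 8L, 8L),
  Q     = c(6L, 10L, 10L)
)

# Hypothetical output location; replace it with any path you like.
out_path <- file.path(tempdir(), "summed_by_gcode_year.csv")

# row.names = FALSE keeps R's row numbers out of the file.
write.csv(result, out_path, row.names = FALSE)
```

In practice, result would be whatever any of the answers returns. If a native .xlsx file is needed, add-on packages such as writexl (write_xlsx()) or openxlsx (write.xlsx()) may be an option; they are not part of base R.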