Conditional sum across data frames in R

I am trying to replicate the SUMIFS function in R. I have two data frames.

Data frame 1

allReported

ID       employeeGroup
1093     Bargaining Unit
1093     Management
1093     Non-Union
55       Bargaining Unit
55       Management
55       Non-Union
Data frame 2

employeeCompSummary

ID       employeeGroup      statBenefits    regularWages
1093     Management         500.00          10000.00
1093     Management         200.00          60000.00
1093     Bargaining Unit    100.00          20000.00
1093     Bargaining Unit    150.00          30000.00
1093     Non-Union          500.00          60000.00
55       Bargaining Unit    750.00          65000.00
55       Bargaining Unit    500.00          75000.00
55       Management         250.00          45000.00
55       Management         850.00          90000.00
I am trying to sum statBenefits (and, later, regularWages) to create a new table that gives the following result:

ID       employeeGroup          statBenefits
1093     Bargaining Unit        250.00
1093     Management             700.00
1093     Non-Union              500.00
55       Bargaining Unit        1250.00
55       Management             1100.00
55       Non-Union              0.00
I tried the following:

library(data.table)
setDT(allReported)[, list(total=sum(statbenefits)), list(employeeCompSummary, employeeGroup)]
and got the following error:

Error in `[.data.table`(setDT(allReported), , list(total = sum(statbenefits)),  :   column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]
I also tried:

sumTest <- aggregate(allReported, by = list(employeeCompSummary), sum)
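Any help anyone can offer would be greatly appreciated. I have looked at some other questions that seemed to address this, but I did not find an answer that worked. I will be doing this task in several places, so I would like to know whether there is a simple technique for it. As always, thanks in advance to the wonderful community on Stack Overflow.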

Edit: dput of the two example tables:

allReported <- structure(list(ID = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55,55), employeeGroup = c("Management", "Management", "Bargaining Unit","Bargaining Unit", "Non-Union", "Bargaining Unit", "Bargaining Unit","Management", "Management"), statBenefits = c(500, 200, 100,150, 500, 750, 500, 250, 850), regularWages = c(10000, 60000,20000, 30000, 60000, 65000, 75000, 45000, 90000)), row.names = c(NA,-9L), class = c("tbl_df", "tbl", "data.frame"))

employeeCompSummary <- structure(list(ID = c(1093, 1093, 1093, 55, 55, 55), employeeGroup =c("Bargaining Unit","Management", "Non-Union", "Bargaining Unit", "Management", "Non-Union")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Edit based on your comment: one approach is to use data.table like this:

library(data.table)
dt1 <- data.table(structure(list(ID = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55,55), 
               employeeGroup = c("Management", "Management", "Bargaining Unit","Bargaining Unit", "Non-Union", "Bargaining Unit", "Bargaining Unit","Management", "Management"), statBenefits = c(500, 200, 100,150, 500, 750, 500, 250, 850), regularWages = c(10000, 60000,20000, 30000, 60000, 65000, 75000, 45000, 90000)), 
          row.names = c(NA,-9L), class = c("tbl_df", "tbl", "data.frame")), key = c("ID", "employeeGroup"))

dt2 <- data.table(structure(list(ID = c(1093, 1093, 1093, 55, 55, 55), employeeGroup =c("Bargaining Unit","Management", "Non-Union", "Bargaining Unit", "Management", "Non-Union")), 
          row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame")), key = c("ID", "employeeGroup"))



dt1[dt2][, lapply(.SD, sum), .SDcols = c("statBenefits", "regularWages"), by = c("ID", "employeeGroup")]

Per your comment, you can replace the NA values with 0 afterwards.
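The NA replacement mentioned above could look like the following. This is a minimal sketch, not the answerer's own code: the table names `stats` and `groups` are made up for illustration, the join uses `on =` so no keys are required, and `setnafill()` assumes a reasonably recent data.table (1.12.4 or later).

```r
library(data.table)

# Hypothetical stand-ins for the two tables (values copied from the question)
stats <- data.table(
  ID            = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55, 55),
  employeeGroup = c("Management", "Management", "Bargaining Unit",
                    "Bargaining Unit", "Non-Union", "Bargaining Unit",
                    "Bargaining Unit", "Management", "Management"),
  statBenefits  = c(500, 200, 100, 150, 500, 750, 500, 250, 850),
  regularWages  = c(10000, 60000, 20000, 30000, 60000, 65000, 75000, 45000, 90000)
)
groups <- data.table(
  ID            = c(1093, 1093, 1093, 55, 55, 55),
  employeeGroup = rep(c("Bargaining Unit", "Management", "Non-Union"), 2)
)

value_cols <- c("statBenefits", "regularWages")

# Join on explicit columns (no keys needed), then sum within each group
res <- stats[groups, on = .(ID, employeeGroup)][
  , lapply(.SD, sum), by = .(ID, employeeGroup), .SDcols = value_cols]

# Groups with no matching rows have NA sums; overwrite them with 0 in place
setnafill(res, type = "const", fill = 0, cols = value_cols)
res
```

On older data.table versions, `res[is.na(statBenefits), statBenefits := 0]` (repeated per column) should give the same result.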

You can do this with dplyr, together with magrittr for the %>% operator:

library(dplyr)
library(magrittr)

df1 <- structure(list(ID = c(1093, 1093, 1093, 55, 55, 55), employeeGroup =c("Bargaining Unit","Management", "Non-Union", "Bargaining Unit", "Management", "Non-Union")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

df2 <- structure(list(ID = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55,55), employeeGroup = c("Management", "Management", "Bargaining Unit","Bargaining Unit", "Non-Union", "Bargaining Unit", "Bargaining Unit","Management", "Management"), statBenefits = c(500, 200, 100,150, 500, 750, 500, 250, 850), regularWages = c(10000, 60000,20000, 30000, 60000, 65000, 75000, 45000, 90000)), row.names = c(NA,-9L), class = c("tbl_df", "tbl", "data.frame"))

result <- left_join(df1, df2, by = c("ID", "employeeGroup")) %>%
  group_by(ID, employeeGroup) %>%
  summarize(
    statBenefits = sum(statBenefits, na.rm = TRUE),
    regularWages = sum(regularWages, na.rm = TRUE)
  )
result
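The same join-then-sum idea can also be sketched in base R, which is close in spirit to the aggregate() attempt in the question. This is only an illustration under assumed names (`stats`, `groups`, `sums`, `out` are not from the original post). Note that merge() sorts its result by the join columns, so the row order differs from the dplyr output.

```r
# Hypothetical stand-ins for the two tables (values copied from the question)
stats <- data.frame(
  ID            = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55, 55),
  employeeGroup = c("Management", "Management", "Bargaining Unit",
                    "Bargaining Unit", "Non-Union", "Bargaining Unit",
                    "Bargaining Unit", "Management", "Management"),
  statBenefits  = c(500, 200, 100, 150, 500, 750, 500, 250, 850),
  regularWages  = c(10000, 60000, 20000, 30000, 60000, 65000, 75000, 45000, 90000)
)
groups <- data.frame(
  ID            = c(1093, 1093, 1093, 55, 55, 55),
  employeeGroup = rep(c("Bargaining Unit", "Management", "Non-Union"), 2)
)

# 1. Sum both value columns within each ID / employeeGroup combination
sums <- aggregate(cbind(statBenefits, regularWages) ~ ID + employeeGroup,
                  data = stats, FUN = sum)

# 2. Left join onto the full list of groups so unmatched groups are kept
out <- merge(groups, sums, by = c("ID", "employeeGroup"), all.x = TRUE)

# 3. Groups with no rows in `stats` come back as NA; set them to 0
value_cols <- c("statBenefits", "regularWages")
out[value_cols][is.na(out[value_cols])] <- 0
out
```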
I would do:

library(data.table)

# don't use setDT, since who knows if it works on tibbeldies
ar = data.table(allReported)
ecs = data.table(employeeCompSummary)

ecs[, total := ar[.SD, on=.(ID, employeeGroup), sum(x.statBenefits), by=.EACHI][, V1]]

     ID   employeeGroup total
1: 1093 Bargaining Unit   250
2: 1093      Management   700
3: 1093       Non-Union   500
4:   55 Bargaining Unit  1250
5:   55      Management  1100
6:   55       Non-Union    NA
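This code adds a column to ecs, even though the OP asked for a new table. The new table and ecs would have the same set of rows, so carrying both around seems like wasted effort. Dropping the column later is simple.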

If you want to see how this update join works, try working backwards:

ar[ecs, on=.(ID, employeeGroup), sum(x.statBenefits), by=.EACHI]

# or

ar[ecs, on=.(ID, employeeGroup)]
Note: in the original code, .SD is ecs. See ?.SD.
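If a separate table with 0 instead of NA is preferred after all, one possible variation on the by = .EACHI join is sketched below. It is not the answerer's code: the `totals` name and the `na.rm = TRUE` shortcut are assumptions, and the data are copied from the question's dput.

```r
library(data.table)

# Same example data as in the question's dput
ar <- data.table(
  ID            = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55, 55),
  employeeGroup = c("Management", "Management", "Bargaining Unit",
                    "Bargaining Unit", "Non-Union", "Bargaining Unit",
                    "Bargaining Unit", "Management", "Management"),
  statBenefits  = c(500, 200, 100, 150, 500, 750, 500, 250, 850),
  regularWages  = c(10000, 60000, 20000, 30000, 60000, 65000, 75000, 45000, 90000)
)
ecs <- data.table(
  ID            = c(1093, 1093, 1093, 55, 55, 55),
  employeeGroup = rep(c("Bargaining Unit", "Management", "Non-Union"), 2)
)

# For each row of ecs, sum the matching rows of ar. An unmatched row
# contributes a single NA, which na.rm = TRUE drops, leaving a sum of 0.
totals <- ar[ecs, on = .(ID, employeeGroup),
             .(total = sum(x.statBenefits, na.rm = TRUE)), by = .EACHI]
totals
```

This leaves ecs untouched and returns one row per row of ecs, in the same order.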

Comments:

The data are confidential for work, which is why I listed example tables above.

Could you provide the tables in dput format? That would make it easier for people to help.

This returns the following error: Error in dt[, lapply(.SD, sum), .SDcols = c("statBenefits", "regularWages"), : object of type 'closure' is not subsettable

dt is a function (try ?dt), so you get that error when you try to subset it like a data.frame. If you rename your data to something that does not conflict with other R objects, that should resolve the error.

The data.table way, using .EACHI, is employeeCompSummary[allReported, .(statBenefits = if (.N > 0) sum(statBenefits) else 0), on = .(ID, employeeGroup), by = .EACHI]

Nice answer. The OP has now posted data without keys. You may want to show how to assign keys and/or use on= so that keys are not needed.