Conditional sum across data frames in R


I am trying to replicate Excel's SUMIFS functionality in R. I have two data frames.

Data frame 1

allReported

ID       employeeGroup
1093     Bargaining Unit
1093     Management
1093     Non-Union
55       Bargaining Unit
55       Management
55       Non-Union

Data frame 2

employeeCompSummary

ID       employeeGroup      statBenefits    regularWages
1093     Management         500.00          10000.00
1093     Management         200.00          60000.00
1093     Bargaining Unit    100.00          20000.00
1093     Bargaining Unit    150.00          30000.00
1093     Non-Union          500.00          60000.00
55       Bargaining Unit    750.00          65000.00
55       Bargaining Unit    500.00          75000.00
55       Management         250.00          45000.00
55       Management         850.00          90000.00

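I am trying to sum statBenefits (and later regularWages) for each ID and employeeGroup to create a new table that would produce the following result: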

ID       employeeGroup          statBenefits
1093     Bargaining Unit        250.00
1093     Management             700.00
1093     Non-Union              500.00
55       Bargaining Unit        1250.00
55       Management             1100.00
55       Non-Union              0.00

I tried the following:

library(data.table)
setDT(allReported)[, list(total=sum(statbenefits)), list(employeeCompSummary, employeeGroup)]

and got the following error:

Error in `[.data.table`(setDT(allReported), , list(total = sum(statbenefits)),  :   column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

I also tried:

sumTest <- aggregate(allReported, by = list(employeeCompSummary), sum)

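Any help anyone can provide would be greatly appreciated. I have looked at other questions that seem to address this, but have not found an answer that works. I will be doing this task in several places, so I would like to know whether there is a simple technique for it. As always, thanks in advance to the wonderful community on Stack Overflow.

Edit: dput of the two sample tables (note that the two names are swapped here relative to the tables shown above):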

allReported <- structure(list(ID = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55,55), employeeGroup = c("Management", "Management", "Bargaining Unit","Bargaining Unit", "Non-Union", "Bargaining Unit", "Bargaining Unit","Management", "Management"), statBenefits = c(500, 200, 100,150, 500, 750, 500, 250, 850), regularWages = c(10000, 60000,20000, 30000, 60000, 65000, 75000, 45000, 90000)), row.names = c(NA,-9L), class = c("tbl_df", "tbl", "data.frame"))

employeeCompSummary <- structure(list(ID = c(1093, 1093, 1093, 55, 55, 55), employeeGroup =c("Bargaining Unit","Management", "Non-Union", "Bargaining Unit", "Management", "Non-Union")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))


Edit, per your comment: one way is to use data.table like this

library(data.table)
dt1 <- data.table(structure(list(ID = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55,55), 
               employeeGroup = c("Management", "Management", "Bargaining Unit","Bargaining Unit", "Non-Union", "Bargaining Unit", "Bargaining Unit","Management", "Management"), statBenefits = c(500, 200, 100,150, 500, 750, 500, 250, 850), regularWages = c(10000, 60000,20000, 30000, 60000, 65000, 75000, 45000, 90000)), 
          row.names = c(NA,-9L), class = c("tbl_df", "tbl", "data.frame")), key = c("ID", "employeeGroup"))

dt2 <- data.table(structure(list(ID = c(1093, 1093, 1093, 55, 55, 55), employeeGroup =c("Bargaining Unit","Management", "Non-Union", "Bargaining Unit", "Management", "Non-Union")), 
          row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame")), key = c("ID", "employeeGroup"))



dt1[dt2][, lapply(.SD, sum), .SDcols = c("statBenefits", "regularWages"), by = c("ID", "employeeGroup")]

Afterwards you can replace the NA values with 0, as requested in your comment.
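The NA replacement mentioned above could look roughly like the following. This is only a sketch on small stand-in tables (the names `values`, `groups`, and `res` are made up for illustration), and it uses `on=` for the join so that no keys have to be set first.

```r
library(data.table)

# Small stand-in tables mirroring the structure of the question's data
# (hypothetical values, not the OP's full tables)
values <- data.table(
  ID            = c(1093, 1093, 1093, 55),
  employeeGroup = c("Management", "Management", "Non-Union", "Management"),
  statBenefits  = c(500, 200, 500, 250)
)
groups <- data.table(
  ID            = c(1093, 1093, 55, 55),
  employeeGroup = c("Management", "Non-Union", "Management", "Non-Union")
)

# Join with on= (no keys needed), then sum within each ID/employeeGroup
res <- values[groups, on = .(ID, employeeGroup)][
  , .(statBenefits = sum(statBenefits)), by = .(ID, employeeGroup)]

# Groups with no matching rows come out as NA; set them to 0 by reference
res[is.na(statBenefits), statBenefits := 0]
res
```

Doing the replacement as a separate step keeps it explicit which groups had no matching rows before they are turned into zeros.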

You can do this with dplyr, using magrittr for the %>% pipe:

library(dplyr)
library(magrittr)

df1 <- structure(list(ID = c(1093, 1093, 1093, 55, 55, 55), employeeGroup =c("Bargaining Unit","Management", "Non-Union", "Bargaining Unit", "Management", "Non-Union")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

df2 <- structure(list(ID = c(1093, 1093, 1093, 1093, 1093, 55, 55, 55,55), employeeGroup = c("Management", "Management", "Bargaining Unit","Bargaining Unit", "Non-Union", "Bargaining Unit", "Bargaining Unit","Management", "Management"), statBenefits = c(500, 200, 100,150, 500, 750, 500, 250, 850), regularWages = c(10000, 60000,20000, 30000, 60000, 65000, 75000, 45000, 90000)), row.names = c(NA,-9L), class = c("tbl_df", "tbl", "data.frame"))

result <- left_join(df1, df2, by = c("ID", "employeeGroup")) %>%
  group_by(ID, employeeGroup) %>%
  summarize(
    statBenefits = sum(statBenefits, na.rm = TRUE),
    regularWages = sum(regularWages, na.rm = TRUE)
  )
result

I would do:

library(data.table)

# don't use setDT, since who knows if it works on tibbles
ar = data.table(allReported)
ecs = data.table(employeeCompSummary)

ecs[, total := ar[.SD, on=.(ID, employeeGroup), sum(x.statBenefits), by=.EACHI][, V1]]

     ID   employeeGroup total
1: 1093 Bargaining Unit   250
2: 1093      Management   700
3: 1093       Non-Union   500
4:   55 Bargaining Unit  1250
5:   55      Management  1100
6:   55       Non-Union    NA

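This code adds a column to ecs, even though the OP asked for a new table. The new table and ecs would have the same set of rows, so carrying both of them around seems like wasted effort. Dropping the column later is simple.

If you want to see how this update join works, try working backwards: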

ar[ecs, on=.(ID, employeeGroup), sum(x.statBenefits), by=.EACHI]

# or

ar[ecs, on=.(ID, employeeGroup)]

Note: in the original code, .SD is ecs. See ?.SD.
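If a separate result table with 0 instead of NA is preferred, the same by=.EACHI join could be adapted along these lines. This is a sketch on small stand-in tables (the values are hypothetical), and it assumes that treating every NA as 0 is acceptable, because na.rm = TRUE would also hide genuinely missing benefit values.

```r
library(data.table)

# Hypothetical stand-ins for ar (the values) and ecs (the ID/group combinations)
ar <- data.table(
  ID            = c(1093, 1093, 55),
  employeeGroup = c("Management", "Management", "Management"),
  statBenefits  = c(500, 200, 250)
)
ecs <- data.table(
  ID            = c(1093, 55, 55),
  employeeGroup = c("Management", "Management", "Non-Union")
)

# One output row per row of ecs; with na.rm = TRUE an unmatched
# combination sums to 0 rather than NA
res <- ar[ecs, on = .(ID, employeeGroup),
          .(statBenefits = sum(x.statBenefits, na.rm = TRUE)),
          by = .EACHI]
res
```

Here ecs is left unchanged and res holds one row for each ID and employeeGroup combination in ecs.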

Could you provide the tables in dput format? That would make it easier for people to help.

The data is confidential work data, which is why I listed sample tables above.

This returns the following error: Error in dt[, lapply(.SD, sum), .SDcols = c("statBenefits", "regularWages"), : object of type 'closure' is not subsettable

dt is a function (try ?dt), so you get that error when you try to subset it as if it were a data.frame. Renaming your data to something that does not conflict with other R objects should resolve the error.

The data.table way with .EACHI is employeeCompSummary[allReported, .(statBenefits = if (.N > 0) sum(statBenefits) else 0), on = .(ID, employeeGroup), by = .EACHI]

Nice answer. The OP has now posted data without keys. You might want to show how to assign keys and/or use on= so that keys are not needed.