R: aggregate with multiple functions


Starting from the following data frame df1:

 Branch Loan_Amount TAT
      A         100 2.0
      A         120 4.0
      A         300 9.0
      B         150 1.5
      B         200 2.0
can I use the aggregate function to get the following output as data frame df2?

 Branch Number_of_loans Loan_Amount Total_TAT
      A               3         520      15.0
      B               2         350       3.5

I know I can count the number of loans with nrow and then merge the results, but I'm looking for a better way.
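For reference, the count-then-merge route mentioned above could look roughly like this. It is a sketch, not the asker's actual code: it rebuilds df1 inline, counts rows per branch by calling nrow() on each split() subset, sums the numeric columns with aggregate(), and renames TAT to Total_TAT to match df2.

```r
# df1 from the question, rebuilt inline
df1 <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2.0, 4.0, 9.0, 1.5, 2.0)
)

# Step 1: count loans per branch by calling nrow() on each group's subset
counts <- sapply(split(df1, df1$Branch), nrow)
counts_df <- data.frame(Branch = names(counts), Number_of_loans = unname(counts))

# Step 2: sum both numeric columns per branch
sums <- aggregate(cbind(Loan_Amount, TAT) ~ Branch, data = df1, FUN = sum)
names(sums)[names(sums) == "TAT"] <- "Total_TAT"

# Step 3: merge the two per-branch summaries on Branch
df2 <- merge(counts_df, sums, by = "Branch")
df2
```

This works, but it needs two grouping passes plus a merge, which is what the answers below try to avoid.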

With dplyr, you can do:

library(dplyr)
group_by(d,Branch) %>% 
  summarize(Number_of_loans = n(),
            Loan_Amount = sum(Loan_Amount),
            TAT = sum(TAT))
Output

Source: local data frame [2 x 4]

  Branch Number_of_loans Loan_Amount   TAT
  (fctr)           (int)       (int) (dbl)
1      A               3         520  15.0
2      B               2         350   3.5
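Data

d <- read.table(text="Branch Loan_Amount TAT
A         100 2.0
A         120 4.0
A         300 9.0
B         150 1.5
B         200 2.0", header=TRUE)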
Base R:
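The base-R code itself is not preserved here. As a hedged sketch only (not necessarily what the answer used), one aggregate() call can sum both columns and table() can supply the per-branch counts, which gives the same column layout as the first output table below:

```r
df1 <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2.0, 4.0, 9.0, 1.5, 2.0)
)

# Sum both numeric columns per branch in a single aggregate() call
res <- aggregate(cbind(Loan_Amount, TAT) ~ Branch, data = df1, FUN = sum)

# Add row counts; indexing the table by res$Branch keeps counts aligned with the groups
res$Number_of_loans <- as.vector(table(df1$Branch)[res$Branch])
res
```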

sqldf
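The sqldf query is likewise missing. A minimal sketch, assuming the sqldf package is installed and df1 lives in the calling environment: a single GROUP BY query returns the count and both sums, in the column order of the second output table below.

```r
library(sqldf)

df1 <- data.frame(
  Branch      = c("A", "A", "A", "B", "B"),
  Loan_Amount = c(100, 120, 300, 150, 200),
  TAT         = c(2.0, 4.0, 9.0, 1.5, 2.0)
)

# One SQL pass: COUNT(*) gives the number of loans, SUM() the totals per branch
res <- sqldf("
  SELECT Branch,
         COUNT(*)         AS Number_of_loans,
         SUM(Loan_Amount) AS Loan_Amount,
         SUM(TAT)         AS TAT
  FROM df1
  GROUP BY Branch
  ORDER BY Branch
")
res
```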

Output

  Branch Loan_Amount  TAT Number_of_loans
1      A         520 15.0               3
2      B         350  3.5               2

  Branch Number_of_loans Loan_Amount  TAT
1      A               3         520 15.0
2      B               2         350  3.5
Data

df <- structure(list(Branch = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), Loan_Amount = c(100L, 120L, 300L, 150L, 
200L), TAT = c(2, 4, 9, 1.5, 2)), .Names = c("Branch", "Loan_Amount", 
"TAT"), class = "data.frame", row.names = c(NA, -5L))
Using data.table on df:

library(data.table)
setDT(df)[,list(Number_of_loans=.N, 
                Loan_Amount    =sum(Loan_Amount), 
                Total_TAT      =sum(TAT)), by=Branch]
#    Branch Number_of_loans Loan_Amount Total_TAT
# 1:      A               3         520      15.0
# 2:      B               2         350       3.5

Why isn't aggregate a good approach for you?

This is a crude and inefficient approach, but it works and is fun (it uses aggregate()):
d <- read.table(text="Branch Loan_Amount TAT
A         100 2.0
A         120 4.0
A         300 9.0
B         150 1.5
B         200 2.0", header=TRUE)

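library(stringr)
# Pack "count|sum" into one string per group, since this FUN returns a single value
df <- aggregate(. ~ Branch, data = d, FUN = function(x) paste0(length(x), "|", sum(x)))
# Split on a literal "|"; fixed() stops it being read as an (empty) regex alternation
loan <- str_split_fixed(df$Loan_Amount, fixed("|"), 2)
tat  <- str_split_fixed(df$TAT, fixed("|"), 2)
df_ <- apply(cbind(loan[, 1], loan[, 2], tat[, 2]), 2, as.numeric)
colnames(df_) <- c("Number_of_loans", "Loan_Amount", "Total_TAT")
cbind(df[, "Branch", drop = FALSE], df_)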
  Branch Number_of_loans Loan_Amount Total_TAT
1      A               3         520      15.0
2      B               2         350       3.5
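The string packing above is only needed if FUN has to return one value per group. aggregate() also accepts a FUN that returns a named vector; each summarised column then comes back as a matrix column (here with columns n and sum) that can be pulled apart directly. A sketch of that variant, rebuilding the same d so it runs on its own:

```r
d <- read.table(text = "Branch Loan_Amount TAT
A 100 2.0
A 120 4.0
A 300 9.0
B 150 1.5
B 200 2.0", header = TRUE)

# FUN returns c(n, sum), so Loan_Amount and TAT each become a 2-column matrix in res
res <- aggregate(cbind(Loan_Amount, TAT) ~ Branch, data = d,
                 FUN = function(x) c(n = length(x), sum = sum(x)))

# Extract the matrix columns into ordinary, named columns
df2 <- data.frame(
  Branch          = res$Branch,
  Number_of_loans = res$Loan_Amount[, "n"],
  Loan_Amount     = res$Loan_Amount[, "sum"],
  Total_TAT       = res$TAT[, "sum"]
)
df2
```

This keeps everything numeric and does a single aggregate() pass, so nothing depends on how a separator character is parsed.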