How to get the maximum value of a column in a data.frame and retrieve all matching records
I have a data.frame and want to get the rows that contain the maximum value of a given column, Total.
Txn_date Cust_no Acct_no cust_type Credit Debit Total
09DEC2013 17382 601298644 I 1500 0 1500
16DEC2013 17382 601298644 I 500 0 500
17DEC2013 17382 601298644 I 0 60 60
18DEC2013 17382 601298644 I 0 200 200
19DEC2013 17382 601298644 I 1500 0 1500
20DEC2013 17382 601298644 I 0 60 60
20DEC2013 17382 601298644 I 0 103 103
30DEC2013 17382 601298644 I 500 0 500
So I wrote a simple SQL query to run through sqldf(), as follows:
s1<-paste("SELECT Txn_date, Cust_no,Credit,Debit,Total,max(Total) as 'MaxTxnAmt' FROM sample GROUP BY Cust_no")
sample_t1<-sqldf(s1)
If I use base R functions, I get exactly the same output (a single row per Cust_no):
sample_t1<-do.call(rbind,
lapply(split(sample,sample$Cust_no),
function(data) data[which.max(data$Total),]))
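One detail worth noting about the base R version: which.max() returns only the position of the first maximum, so rows tied for the maximum are dropped. Below is a minimal sketch of a tie-preserving variant. The data frame `txn` is a hypothetical toy example, not the data from the question.

```r
# Hypothetical toy data: customer 1 has two rows tied at the maximum Total.
txn <- data.frame(
  Cust_no = c(1L, 1L, 1L, 2L, 2L),
  Total   = c(1500, 500, 1500, 60, 200)
)

# which.max() reports only the first position of the maximum.
first_only <- which.max(txn$Total[txn$Cust_no == 1L])

# Comparing against max() keeps every tied row within each group.
keep_ties <- do.call(
  rbind,
  lapply(split(txn, txn$Cust_no),
         function(d) d[d$Total == max(d$Total), ])
)
```

Here `first_only` points at a single row, while `keep_ties` retains both of customer 1's tied rows plus the single maximum row of customer 2.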
Sample data:
sample <- structure(list(Txn_date = c("09DEC2013", "16DEC2013", "17DEC2013",
"18DEC2013", "19DEC2013", "20DEC2013", "20DEC2013", "30DEC2013"
), Cust_no = c(17382L, 17382L, 17382L, 17382L, 17382L, 17382L,
17382L, 17382L), Acct_no = c("601298644", "601298644", "601298644",
"601298644", "601298644", "601298644", "601298644", "601298644"
), cust_type = c("I", "I", "I", "I", "I", "I", "I", "I"), Credit = c(1500,
500, 0, 0, 1500, 0, 0, 500), Debit = c(0, 0, 60, 200, 0, 60,
103, 0), Total = c(1500, 500, 60, 200, 1500, 60, 103, 500)), .Names = c("Txn_date",
"Cust_no", "Acct_no", "cust_type", "Credit", "Debit", "Total"
), row.names = c(16303L, 29153L, 31174L, 33179L, 35388L, 38750L,
38751L, 53052L), class = "data.frame")
Try this, using base R:
sample[with(sample, ave(Total, Cust_no, FUN=max)==Total),]
# Txn_date Cust_no Acct_no cust_type Credit Debit Total
#1 09DEC2013 17382 601298644 I 1500 0 1500
#5 19DEC2013 17382 601298644 I 1500 0 1500
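To see why the ave() filter keeps ties, it helps to look at what ave() returns: a vector as long as its input, in which each element is replaced by the result of FUN for that element's group. Below is a small sketch with a hypothetical two-customer data frame `txn` (an assumption for illustration, not the question's data).

```r
# Hypothetical toy data with two customers.
txn <- data.frame(
  Cust_no = c(1L, 1L, 1L, 2L, 2L),
  Total   = c(1500, 500, 1500, 60, 200)
)

# ave() broadcasts each group's maximum back to every row of that group.
group_max <- ave(txn$Total, txn$Cust_no, FUN = max)

# Keep rows whose Total equals their own group's maximum, ties included.
top_rows <- txn[txn$Total == group_max, ]
```

Because the comparison is done row by row against the group maximum, every row that reaches the maximum survives the filter, not just the first one.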
1) Correlated subquery. Try this:
sqldf("select *
from sample a
where Total = (select max(Total)
from sample b
where b.Cust_no = a.Cust_no)")
giving:
Txn_date Cust_no Acct_no cust_type Credit Debit Total
1 09DEC2013 17382 601298644 I 1500 0 1500
2 19DEC2013 17382 601298644 I 1500 0 1500
2) Join with a subquery. Or try this:
sqldf("select *
from sample
join (select Cust_no, max(Total) as 'MaxTxnAmt'
from sample
group by Cust_no)
using(Cust_no)
where Total = MaxTxnAmt")
giving:
Txn_date Cust_no Acct_no cust_type Credit Debit Total MaxTxnAmt
1 09DEC2013 17382 601298644 I 1500 0 1500 1500
2 19DEC2013 17382 601298644 I 1500 0 1500 1500
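For comparison, the same join idea (compute the per-customer maximum, join it back, then filter) can be sketched in base R with aggregate() and merge(). This is an illustrative analogue under assumptions, not part of the sqldf answers: the data frame `txn` is hypothetical, and the column name MaxTxnAmt is reused only to mirror the query above.

```r
# Hypothetical toy data with two customers.
txn <- data.frame(
  Cust_no = c(1L, 1L, 1L, 2L, 2L),
  Total   = c(1500, 500, 1500, 60, 200)
)

# Per-customer maximum, analogous to the grouped subquery.
max_by_cust <- aggregate(Total ~ Cust_no, data = txn, FUN = max)
names(max_by_cust)[2] <- "MaxTxnAmt"

# Join the maxima back on Cust_no, then keep rows that reach the maximum.
joined <- merge(txn, max_by_cust, by = "Cust_no")
result <- joined[joined$Total == joined$MaxTxnAmt, ]
```

As with the SQL version, the joined result carries the extra MaxTxnAmt column alongside the original columns.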