R: select the maximum amount spent in a single order
I am very new to R and sqldf and can't seem to solve a basic problem. I have a file of transactions in which each row represents a purchased product. The file looks like this:
customer_id,order_number,order_date, amount, product_name
1, 202, 21/04/2015, 58, "xlfd"
1, 275, 16/08/2015, 74, "ghb"
1, 275, 16/08/2015, 36, "fjk"
2, 987, 12/03/2015, 27, "xlgm"
3, 376, 16/05/2015, 98, "fgt"
3, 368, 30/07/2015, 46, "ade"
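To make the sample easy to reproduce, here is a minimal sketch that builds it as a data frame. The name orders follows the sqldf answer below (the other answers call the same data dft and df1). The integer column types are an assumption, and so is reading the doubled slash in the dates as a typo for 16/08/2015.

```r
# Sample transactions from the question, one row per purchased product.
# Assumption: amounts and ids are integers; dates are kept as plain strings.
orders <- data.frame(
  customer_id  = c(1L, 1L, 1L, 2L, 3L, 3L),
  order_number = c(202L, 275L, 275L, 987L, 376L, 368L),
  order_date   = c("21/04/2015", "16/08/2015", "16/08/2015",
                   "12/03/2015", "16/05/2015", "30/07/2015"),
  amount       = c(58L, 74L, 36L, 27L, 98L, 46L),
  product_name = c("xlfd", "ghb", "fjk", "xlgm", "fgt", "ade"),
  stringsAsFactors = FALSE
)

# The answers below use other names for the same data.
dft <- orders
df1 <- orders
```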
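For each customer_id, I need to find the maximum amount spent in a single transaction (same order_number). For example, for customer_id "1" it would be (74+36)=110.

Assuming the data frame is named orders, the following will do it: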
library(sqldf)

sqldf("select customer_id, order_number, sum(amount)
       from orders
       group by customer_id, order_number")
Update: using a nested query, the following gives the desired output:
sqldf("select customer_id, max(total)
from (select customer_id, order_number, sum(amount) as total
from orders
group by customer_id, order_number)
group by customer_id")
Output:
customer_id max(total)
1 1 110
2 2 27
3 3 98
If sqldf is not a strict requirement, and taking your input to be dft, you could try:
require(dplyr)
require(magrittr)
dft %>%
group_by(customer_id, order_number) %>%
summarise(amt = sum(amount)) %>%
group_by(customer_id) %>%
summarise(max_amt = max(amt))
Which gives:
Source: local data frame [3 x 2]
Groups: customer_id [3]
customer_id max_amt
<int> <int>
1 1 110
2 2 27
3 3 98
We can also use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'customer_id' and 'order_number' to get the sum of 'amount', then group a second time by 'customer_id' to get the max of 'Sumamount'.
library(data.table)
setDT(df1)[, .(Sumamount = sum(amount)) , .(customer_id, order_number)
][,.(MaxAmount = max(Sumamount)) , customer_id]
# customer_id MaxAmount
#1: 1 110
#2: 2 27
#3: 3 98
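Or, to make it more compact: after grouping by 'customer_id', we split 'amount' by 'order_number', loop over the resulting list to get each sum, and take the max to get 'MaxAmount'.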
setDT(df1)[, .(MaxAmount = max(unlist(lapply(split(amount,
order_number), sum)))), customer_id]
# customer_id MaxAmount
#1: 1 110
#2: 2 27
#3: 3 98
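The split-and-sum idea is not tied to data.table. Below is a rough base R sketch of the same logic, assuming the sample data sits in a data frame called df1 (rebuilt here, with only the columns needed, so the snippet runs on its own). The name max_per_customer is purely illustrative.

```r
# Assumption: same sample data as in the question, reduced to the needed columns.
df1 <- data.frame(
  customer_id  = c(1L, 1L, 1L, 2L, 3L, 3L),
  order_number = c(202L, 275L, 275L, 987L, 376L, 368L),
  amount       = c(58L, 74L, 36L, 27L, 98L, 46L)
)

# For each customer: sum 'amount' within each order, then keep the largest sum.
max_per_customer <- sapply(
  split(df1, df1$customer_id),
  function(d) max(tapply(d$amount, d$order_number, sum))
)

print(max_per_customer)
```

The result is a named vector keyed by customer_id, which is handy for lookups but would need to be converted if a data frame is wanted.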
Or using base R:
aggregate(amount~customer_id, aggregate(amount~customer_id+order_number,
df1, sum), FUN = max)
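The nested aggregate() call can be hard to read on one line. Unrolled into two steps, it might look like the sketch below. The intermediate names per_order and per_customer are illustrative, and df1 is rebuilt from the sample data so the snippet stands alone.

```r
# Assumption: same sample data as in the question, reduced to the needed columns.
df1 <- data.frame(
  customer_id  = c(1L, 1L, 1L, 2L, 3L, 3L),
  order_number = c(202L, 275L, 275L, 987L, 376L, 368L),
  amount       = c(58L, 74L, 36L, 27L, 98L, 46L)
)

# Step 1: total amount per (customer, order) pair.
per_order <- aggregate(amount ~ customer_id + order_number, data = df1, FUN = sum)

# Step 2: largest order total per customer.
per_customer <- aggregate(amount ~ customer_id, data = per_order, FUN = max)

print(per_customer)
```

Keeping per_order around also makes it easy to check the intermediate totals, for example that order 275 sums to 110.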
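This returns the total amount each user spent on each purchase, whereas the desired output seems to be only the largest single purchase among all of a user's purchases. Perhaps take this output and select customer_id, max(sum(amount)) with a group by customer_id?
@Elena Berrone, please accept an answer, see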