Grouped count aggregation in R data.table
I have a table with dates, buy values, and sell values. I want to count how many buys and sells there are on each day, and also the total number of buys and sells. I am finding this a bit tricky to do in data.table.
date buy sell
2011-01-01 1 0
2011-01-02 0 0
2011-01-03 0 2
2011-01-04 3 0
2011-01-05 0 0
2011-01-06 0 0
2011-01-01 0 0
2011-01-02 0 1
2011-01-03 4 0
2011-01-04 0 0
2011-01-05 0 0
2011-01-06 0 0
2011-01-01 0 0
2011-01-02 0 8
2011-01-03 2 0
2011-01-04 0 0
2011-01-05 0 0
2011-01-06 0 5
The data.table above can be created with the following code:
library(data.table)
DT = data.table(
  date = rep(as.Date('2011-01-01') + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5))
So what I want is:
date total_buys total_sells
2011-01-01 1 0
2011-01-02 0 2
and so on
In addition, I would also like to know the overall number of buys and sells:
total_buys total_sells
4 4
I tried:
length(DT[sell > 0 | buy > 0])
> 3
This is a strange answer (I wonder why).

In addition to @Jake's answer, there is also the typical melt + dcast routine, something like:
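# Recent versions of data.table ship their own melt() and dcast(),
# so reshape2 does not need to be loaded for this.
library(data.table)
dtL <- melt(DT, id.vars = "date")
# Count the non-zero entries per date and variable
dcast(dtL, date ~ variable, value.var = "value",
      fun.aggregate = function(x) sum(x > 0))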
# date buy sell
# 1 2011-01-01 1 0
# 2 2011-01-02 0 2
# 3 2011-01-03 2 1
# 4 2011-01-04 1 0
# 5 2011-01-05 0 0
# 6 2011-01-06 0 1
To get the other table, try:
dtL[, list(count = sum(value > 0)), by = variable]
# variable count
# 1: buy 4
# 2: sell 4
Or, without melting:
DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = c("buy", "sell")]
# buy sell
# 1: 4 4
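
As for the length() attempt in the question: length() on a data.table (as on a data.frame) returns the number of columns, not rows, which would explain the 3. The sketch below rebuilds the question's DT and contrasts length() with nrow(); the names active, n_cols and n_rows are illustrative only.

```r
library(data.table)

# Same data as in the question
DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# Rows with at least one buy or sell
active <- DT[sell > 0 | buy > 0]

n_cols <- length(active)  # counts columns: date, buy, sell
n_rows <- nrow(active)    # counts the rows that matched the filter
```

Note that nrow() here counts rows with any activity, which is still not the per-column count the question asks for; that is what sum(x > 0) provides.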
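sum adds up the buy values, whereas I want to count them. Total buys and total sells should each be 4.

Thanks, Jake. Could you explain how this works? It is a very neat approach.

@user1480926 Which part are you confused about? buy > 0 and sell > 0 each return a logical vector, so the sum is the count of non-zero values. Using by in data.table lets you easily group by some variable.

@user1480926, I thought I would share it because it is more convenient if you have more columns than just 2.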
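
Jake's answer itself is not reproduced on this page, so the following is only a sketch of the approach the comment above describes (summing a logical vector, grouped with by), not necessarily his exact code. The column names total_buys and total_sells are taken from the desired output in the question; per_day and overall are illustrative names.

```r
library(data.table)

# Same data as in the question
DT <- data.table(
  date = rep(as.Date("2011-01-01") + 0:5, 3),
  buy  = c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
  sell = c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5)
)

# buy > 0 is a logical vector; sum() treats TRUE as 1 and FALSE as 0,
# so the sum is the number of non-zero entries within each date group.
per_day <- DT[, .(total_buys  = sum(buy > 0),
                  total_sells = sum(sell > 0)),
              by = date]

# Dropping `by` gives the overall counts in a single row.
overall <- DT[, .(total_buys  = sum(buy > 0),
                  total_sells = sum(sell > 0))]
```

With only two columns, spelling out each sum is readable; with many columns, the lapply(.SD, ...) form shown earlier avoids the repetition.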