Aggregating selected rows using R data.table


This is the order data for a single customer:

order_no customer_id  product amount  order_total
      23           1        A    100          100 
      24           1        A    100          300
      24           1        B    100          300
      24           1        C    100          300
      25           1        B    100          100
      26           1        A    100          200
      26           1        B    100          200
I want to compute the average order size per customer in a new column. For this customer it would be 175 = (100+300+100+200)/4.

I tried this version, but it did not work:

customer_stats <- data.table(customer_stats)[, avg_order_size := mean(order_total), by=list(order_no, customer_id)]
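
A minimal sketch of why that attempt does not produce 175, assuming the sample table above is rebuilt by hand (the names orders and per_customer are illustrative only, not from the question). Grouping by order_no as well as customer_id makes every group a single order, so the mean of order_total inside each group is just that order's own total.

```r
library(data.table)

# Hypothetical reconstruction of the sample data shown in the question
orders <- data.table(
  order_no    = c(23, 24, 24, 24, 25, 26, 26),
  customer_id = 1,
  product     = c("A", "A", "B", "C", "B", "A", "B"),
  amount      = 100,
  order_total = c(100, 300, 300, 300, 100, 200, 200)
)

# The attempted grouping: one group per order, so the mean inside each
# group equals that order's total and the column merely repeats order_total
orders[, avg_order_size := mean(order_total), by = list(order_no, customer_id)]

# Keeping one row per order first and then averaging per customer
# gives the intended (100 + 300 + 100 + 200) / 4
per_customer <- unique(orders, by = c("customer_id", "order_no"))[
  , list(avg_order_size = mean(order_total)), by = customer_id]
print(per_customer)
```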

Doing it this way avoids creating order_total:

customer_stats[ , avg_order_size := sum(amount, na.rm=TRUE) / length(unique(order_no)), by=customer_id]

However, I have reservations about its speed.
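
On the speed concern, one possible variation (not part of the original answer) swaps length(unique(order_no)) for data.table's uniqueN(), which counts distinct values directly. Whether it is actually faster on real data was not measured here, and the sample table below is rebuilt by hand as an assumption.

```r
library(data.table)

# Hypothetical reconstruction of the sample data (order_total left out on purpose)
customer_stats <- data.table(
  order_no    = c(23, 24, 24, 24, 25, 26, 26),
  customer_id = 1,
  product     = c("A", "A", "B", "C", "B", "A", "B"),
  amount      = 100
)

# Total amount per customer divided by the number of distinct orders
customer_stats[, avg_order_size := sum(amount, na.rm = TRUE) / uniqueN(order_no),
               by = customer_id]
print(customer_stats)
```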

How about this one? It inverts your approach and does not need order_total to be computed beforehand:

dat[, sum(amount), by = list(customer_id, order_no)][ ,avg_order := mean(V1), by = customer_id]
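
A small sketch of what that chained expression returns, assuming dat holds the sample data (rebuilt by hand here; res is an illustrative name). The result has one row per order rather than one row per product line, with the per-order sum in the auto-named column V1.

```r
library(data.table)

# Hypothetical reconstruction of the sample data
dat <- data.table(
  customer_id = 1,
  order_no    = c(23, 24, 24, 24, 25, 26, 26),
  product     = c("A", "A", "B", "C", "B", "A", "B"),
  amount      = 100
)

# Step 1: sum amount per (customer, order); the unnamed result column is V1
# Step 2: add the per-customer mean of those sums by reference
res <- dat[, sum(amount), by = list(customer_id, order_no)][
  , avg_order := mean(V1), by = customer_id]

# Four rows (orders 23 to 26), V1 = 100, 300, 100, 200 and avg_order = 175
print(res)
```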

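I think the key is to key the original table by customer and order, sum the order total by customer and order, average the order totals by customer, and then join the result back to the original table: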

library(data.table)

# Your data (next time, consider putting R-formatted data in the question...):
dt <- data.table(customer_id=1,
                 order_no=c(23,24,24,24,25,26,26),
                 product=c("A","A","B","C","B","A","B"),
                 product_amount=100,
                 key=c("customer_id","order_no")) # 1: key by customer and order

dt
#   customer_id order_no product product_amount
#1:           1       23       A            100
#2:           1       24       A            100
#3:           1       24       B            100
#4:           1       24       C            100
#5:           1       25       B            100
#6:           1       26       A            100
#7:           1       26       B            100

dt[ # 4: join summary back to original
  dt[,list(order_total=sum(product_amount)),by=list(customer_id,order_no)] [ # 2: order total by customer and order
    ,avg_order_size:=mean(order_total),by=list(customer_id)] # 3: add the average of order total by customer
  ]
#   customer_id order_no product product_amount order_total avg_order_size
#1:           1       23       A            100         100            175
#2:           1       24       A            100         300            175
#3:           1       24       B            100         300            175
#4:           1       24       C            100         300            175
#5:           1       25       B            100         100            175
#6:           1       26       A            100         200            175
#7:           1       26       B            100         200            175
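
The same steps can also be sketched without setting a key, by naming the join columns through the on= argument. This is a variation under the same sample data, not the answer's own code; per_order and result are illustrative names.

```r
library(data.table)

dt <- data.table(customer_id = 1,
                 order_no = c(23, 24, 24, 24, 25, 26, 26),
                 product = c("A", "A", "B", "C", "B", "A", "B"),
                 product_amount = 100)

# Order total per (customer, order), then the mean of those totals per customer
per_order <- dt[, list(order_total = sum(product_amount)),
                by = list(customer_id, order_no)]
per_order[, avg_order_size := mean(order_total), by = customer_id]

# Join the summary back onto the original rows on explicit columns
result <- dt[per_order, on = c("customer_id", "order_no")]
print(result)
```

Spelling out the join columns keeps the join independent of whatever key the table happens to carry.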
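Did you try it with := alone? It already performs the assignment, so the extra assignment to data.table is not needed.

@dickoa, if you group by both order_no and customer_id, you take the mean of 100, 300, 100 and 200 separately (which leaves the values unchanged).

@Arun Yep, you're right.

@Bryan, I think your question has been answered. It would be good to mark it as answered.

Sorry, why are you summing here? He sums 100+300+100+200 (by total, not amount)?

Framing effect! He created order_total by summing amount by order_no. See the edit to his question: he asks whether this can be done without computing order_total.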