R 总结结束于当前观察,根据标准开始
我观察了(以下示例中:4)不同客户在(五)天内的购买数量。现在,我想创建一个新变量,将过去20年中每个用户的总购买量加起来 示例数据:R 总结结束于当前观察,根据标准开始,r,dplyr,sum,pipe,cumsum,R,Dplyr,Sum,Pipe,Cumsum,我观察了(以下示例中:4)不同客户在(五)天内的购买数量。现在,我想创建一个新变量,将过去20年中每个用户的总购买量加起来 示例数据: > da <- data.frame(customer_id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4), + day = c("2016-04-11","2016-04-12","2016-04-13",&qu
> da <- data.frame(customer_id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
+ day = c("2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15"),
+ n_purchase = c(5,2,8,0,3,2,0,3,4,0,2,4,5,1,0,2,3,5,0,3))
> da
customer_id day n_purchase
1 1 2016-04-11 5
2 1 2016-04-12 2
3 1 2016-04-13 8
4 1 2016-04-14 0
5 1 2016-04-15 3
6 2 2016-04-11 2
7 2 2016-04-12 0
8 2 2016-04-13 3
9 2 2016-04-14 4
10 2 2016-04-15 0
11 3 2016-04-11 2
12 3 2016-04-12 4
13 3 2016-04-13 5
14 3 2016-04-14 1
15 3 2016-04-15 0
16 4 2016-04-11 2
17 4 2016-04-12 3
18 4 2016-04-13 5
19 4 2016-04-14 0
20 4 2016-04-15 3
>da da
客户id日n\u购买
1 1 2016-04-11 5
2 1 2016-04-12 2
3 1 2016-04-13 8
4 1 2016-04-14 0
5 1 2016-04-15 3
6 2 2016-04-11 2
7 2 2016-04-12 0
8 2 2016-04-13 3
9 2 2016-04-14 4
10 2 2016-04-15 0
11 3 2016-04-11 2
12 3 2016-04-12 4
13 3 2016-04-13 5
14 3 2016-04-14 1
15 3 2016-04-15 0
16 4 2016-04-11 2
17 4 2016-04-12 3
18 4 2016-04-13 5
19 4 2016-04-14 0
20 4 2016-04-15 3
我需要知道三件事来构造变量:
(1) 用户每天的总购买量(日购买量)是多少?
(2) 从第一天开始,用户的累计购买量是多少(cumsum\u day\u购买)?
(3) 根据目前的观察,前20次(跨用户)购买是在哪一天开始的?这就是我在编码这样一个变量时遇到的问题
> library(dplyr)
> da %>%
+ group_by(day) %>%
+ mutate(day_purchases = sum(n_purchase)) %>%
+ group_by(customer_id) %>%
+ mutate(cumsum_day_purchases = cumsum(day_purchases))
# A tibble: 20 x 5
# Groups: customer_id [4]
customer_id day n_purchase day_purchases cumsum_day_purchases
<dbl> <fct> <dbl> <dbl> <dbl>
1 1 2016-04-11 5 11 11
2 1 2016-04-12 2 9 20
3 1 2016-04-13 8 21 41
4 1 2016-04-14 0 5 46
5 1 2016-04-15 3 6 52
6 2 2016-04-11 2 11 11
7 2 2016-04-12 0 9 20
8 2 2016-04-13 3 21 41
9 2 2016-04-14 4 5 46
10 2 2016-04-15 0 6 52
11 3 2016-04-11 2 11 11
12 3 2016-04-12 4 9 20
13 3 2016-04-13 5 21 41
14 3 2016-04-14 1 5 46
15 3 2016-04-15 0 6 52
16 4 2016-04-11 2 11 11
17 4 2016-04-12 3 9 20
18 4 2016-04-13 5 21 41
19 4 2016-04-14 0 5 46
20 4 2016-04-15 3 6 52
>库(dplyr)
>da%>%
+分组单位(天)%>%
+变动(日购买=金额(日购买))%>%
+分组依据(客户id)%>%
+变异(累计金额/日购买量=累计金额(日购买量))
#一个tibble:20x5
#组别:客户识别码[4]
客户id日n日购买日购买金额日购买金额
1 1 2016-04-11 5 11 11
2 1 2016-04-12 2 9 20
3 1 2016-04-13 8 21 41
4 1 2016-04-14 0 5 46
5 1 2016-04-15 3 6 52
6 2 2016-04-11 2 11 11
7 2 2016-04-12 0 9 20
8 2 2016-04-13 3 21 41
9 2 2016-04-14 4 5 46
10 2 2016-04-15 0 6 52
11 3 2016-04-11 2 11 11
12 3 2016-04-12 4 9 20
13 3 2016-04-13 5 21 41
14 3 2016-04-14 1 5 46
15 3 2016-04-15 0 6 52
16 4 2016-04-11 2 11 11
17 4 2016-04-12 3 9 20
18 4 2016-04-13 5 21 41
19 4 2016-04-14 0 5 46
20 4 2016-04-15 3 6 52
现在,我将在下面的数据集中计算我希望手动使用的变量
- 对于2016-04-12日的所有观察结果,我计算累积和 通过添加购买数量来计算特定客户的购买数量 因为总的来说 客户在当天和当天共购买了20件商品 前一天
- 对于2016-04-13天,我只使用用户在 这一天,因为当天有21(41-20)个新的购买
> da = da %>% ungroup() %>%
+ mutate(cumsum_last_20_purchases = c(5,5+2,8,0,0+3,2,2+0,3,4,4+0,2,2+4,5,1,1+0,2,2+3,5,0,0+3))
> da
# A tibble: 20 x 6
customer_id day n_purchase day_purchases cumsum_day_purchases cumsum_last_20_purchases
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 1 2016-04-11 5 11 11 5
2 1 2016-04-12 2 9 20 7
3 1 2016-04-13 8 21 41 8
4 1 2016-04-14 0 5 46 0
5 1 2016-04-15 3 6 52 3
6 2 2016-04-11 2 11 11 2
7 2 2016-04-12 0 9 20 2
8 2 2016-04-13 3 21 41 3
9 2 2016-04-14 4 5 46 4
10 2 2016-04-15 0 6 52 4
11 3 2016-04-11 2 11 11 2
12 3 2016-04-12 4 9 20 6
13 3 2016-04-13 5 21 41 5
14 3 2016-04-14 1 5 46 1
15 3 2016-04-15 0 6 52 1
16 4 2016-04-11 2 11 11 2
17 4 2016-04-12 3 9 20 5
18 4 2016-04-13 5 21 41 5
19 4 2016-04-14 0 5 46 0
20 4 2016-04-15 3 6 52 3
>da=da%%>%ungroup()%%>%
+变异(cumsum_last_20_purchases=c(5,5+2,8,0,0+3,2,2+0,3,4,4+0,2,2+4,5,1,1+0,2,2+3,5,0,0+3))
>da
#一个tibble:20x6
客户id日n日购买日购买金额日购买金额日购买金额日最后20日购买金额
1 1 2016-04-11 5 11 11 5
2 1 2016-04-12 2 9 20 7
3 1 2016-04-13 8 21 41 8
4 1 2016-04-14 0 5 46 0
5 1 2016-04-15 3 6 52 3
6 2 2016-04-11 2 11 11 2
7 2 2016-04-12 0 9 20 2
8 2 2016-04-13 3 21 41 3
9 2 2016-04-14 4 5 46 4
10 2 2016-04-15 0 6 52 4
11 3 2016-04-11 2 11 11 2
12 3 2016-04-12 4 9 20 6
13 3 2016-04-13 5 21 41 5
14 3 2016-04-14 1 5 46 1
15 3 2016-04-15 0 6 52 1
16 4 2016-04-11 2 11 11 2
17 4 2016-04-12 3 9 20 5
18 4 2016-04-13