R: sum ending at the current observation, starting according to a criterion

Tags: r, dplyr, sum, pipe, cumsum

I observed the number of purchases of (in the example below: 4) different customers over (five) days. Now I want to create a new variable that, for each observation, sums up that user's purchases falling within the last 20 purchases made across all users.

Example data:

> da <- data.frame(customer_id = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
+                  day = c("2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15","2016-04-11","2016-04-12","2016-04-13","2016-04-14","2016-04-15"),
+                  n_purchase = c(5,2,8,0,3,2,0,3,4,0,2,4,5,1,0,2,3,5,0,3))
> da
   customer_id        day n_purchase
1            1 2016-04-11          5
2            1 2016-04-12          2
3            1 2016-04-13          8
4            1 2016-04-14          0
5            1 2016-04-15          3
6            2 2016-04-11          2
7            2 2016-04-12          0
8            2 2016-04-13          3
9            2 2016-04-14          4
10           2 2016-04-15          0
11           3 2016-04-11          2
12           3 2016-04-12          4
13           3 2016-04-13          5
14           3 2016-04-14          1
15           3 2016-04-15          0
16           4 2016-04-11          2
17           4 2016-04-12          3
18           4 2016-04-13          5
19           4 2016-04-14          0
20           4 2016-04-15          3
I need to know three things to construct the variable:
(1) What is the total number of purchases per day across all users (day_purchases)?
(2) How many purchases have accumulated from the first day onward (cumsum_day_purchases)?
(3) Looking back from the current observation, on which day did the last 20 purchases (across users) begin? This is the part I am struggling to code.

> library(dplyr)
> da %>% 
+   group_by(day) %>% 
+   mutate(day_purchases = sum(n_purchase)) %>% 
+   group_by(customer_id) %>%
+   mutate(cumsum_day_purchases = cumsum(day_purchases))
# A tibble: 20 x 5
# Groups:   customer_id [4]
   customer_id day        n_purchase day_purchases cumsum_day_purchases
         <dbl> <fct>           <dbl>         <dbl>                <dbl>
 1           1 2016-04-11          5            11                   11
 2           1 2016-04-12          2             9                   20
 3           1 2016-04-13          8            21                   41
 4           1 2016-04-14          0             5                   46
 5           1 2016-04-15          3             6                   52
 6           2 2016-04-11          2            11                   11
 7           2 2016-04-12          0             9                   20
 8           2 2016-04-13          3            21                   41
 9           2 2016-04-14          4             5                   46
10           2 2016-04-15          0             6                   52
11           3 2016-04-11          2            11                   11
12           3 2016-04-12          4             9                   20
13           3 2016-04-13          5            21                   41
14           3 2016-04-14          1             5                   46
15           3 2016-04-15          0             6                   52
16           4 2016-04-11          2            11                   11
17           4 2016-04-12          3             9                   20
18           4 2016-04-13          5            21                   41
19           4 2016-04-14          0             5                   46
20           4 2016-04-15          3             6                   52
Below, I compute the variable I am looking for by hand for this dataset:

  • For all observations on 2016-04-12, I calculate the cumulative sum by adding the specific customer's purchases on that day and on the previous day, because across all customers a total of 20 items were purchased over that day and the previous day combined.
  • For 2016-04-13, I only use the purchases the user made on that day, because that day alone already brought 21 (41 - 20) new purchases. (A sketch that automates this rule follows the output below.)
This produces the following output:

> da = da %>% ungroup() %>%
+   mutate(cumsum_last_20_purchases = c(5,5+2,8,0,0+3,2,2+0,3,4,4+0,2,2+4,5,1,1+0,2,2+3,5,0,0+3))
> da
# A tibble: 20 x 6
   customer_id day        n_purchase day_purchases cumsum_day_purchases cumsum_last_20_purchases
         <dbl> <fct>           <dbl>         <dbl>                <dbl>                    <dbl>
 1           1 2016-04-11          5            11                   11                        5
 2           1 2016-04-12          2             9                   20                        7
 3           1 2016-04-13          8            21                   41                        8
 4           1 2016-04-14          0             5                   46                        0
 5           1 2016-04-15          3             6                   52                        3
 6           2 2016-04-11          2            11                   11                        2
 7           2 2016-04-12          0             9                   20                        2
 8           2 2016-04-13          3            21                   41                        3
 9           2 2016-04-14          4             5                   46                        4
10           2 2016-04-15          0             6                   52                        4
11           3 2016-04-11          2            11                   11                        2
12           3 2016-04-12          4             9                   20                        6
13           3 2016-04-13          5            21                   41                        5
14           3 2016-04-14          1             5                   46                        1
15           3 2016-04-15          0             6                   52                        1
16           4 2016-04-11          2            11                   11                        2
17           4 2016-04-12          3             9                   20                        5
18           4 2016-04-13          5            21                   41                        5
19           4 2016-04-14          0             5                   46                        0
20           4 2016-04-15          3             6                   52                        3
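For completeness, here is a minimal sketch of how the rule above could be automated, assuming the window is defined at the day level: for every day, walk backwards and keep earlier days as long as the total number of purchases across all customers stays at or below 20 (the current day is always kept, even when it alone exceeds 20), then sum each customer's purchases over that window. The names da2, day_totals, start_idx and window_start are made up for illustration; only da, customer_id, day and n_purchase come from the question.

library(dplyr)

da2 <- da %>% mutate(day = as.Date(day))          # make day comparable as a date

# total purchases per day across all customers, in day order
day_totals <- da2 %>%
  group_by(day) %>%
  summarise(day_purchases = sum(n_purchase)) %>%
  arrange(day)

# for day i, count how many days we can step back while the running
# total across all customers stays <= 20; always keep at least day i
start_idx <- sapply(seq_len(nrow(day_totals)), function(i) {
  back_cumsum <- cumsum(rev(day_totals$day_purchases[seq_len(i)]))
  i - max(1, sum(back_cumsum <= 20)) + 1
})
day_totals$window_start <- day_totals$day[start_idx]

# sum each customer's purchases between window_start and the current day
da2 %>%
  left_join(select(day_totals, day, window_start), by = "day") %>%
  group_by(customer_id) %>%
  arrange(day, .by_group = TRUE) %>%
  mutate(cumsum_last_20_purchases = sapply(
    seq_along(day),
    function(i) sum(n_purchase[day >= window_start[i] & day <= day[i]])
  )) %>%
  ungroup()

For the example data this should reproduce the hand-built cumsum_last_20_purchases column shown above.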