Generating an order column with dplyr based on changes in a grouping variable

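I'm having some trouble using dplyr to generate a rank column on a tbl_df object built from the transaction log of a specific consumer. The data I have looks like this: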

                                        consumerid merchant_id      eventtimestamp merchant_visit_rank
                                              (chr)       (int)              (time)          (dbl)
            1  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-15 13:33:00              0
            2  004a5cc3-3d60-4d14-85b3-706e454aae13          56 2015-01-16 13:58:03              1
            3  004a5cc3-3d60-4d14-85b3-706e454aae13          56 2015-01-16 13:58:41              0
            4  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 13:59:05              1
            5  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 13:59:55              1
            6  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 14:15:56              0
            7  004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:18              1
            8  004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:19              0
            9  004a5cc3-3d60-4d14-85b3-706e454aae13          54 2015-01-21 13:52:24              0
            10 004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:29              0
            ..                                  ...         ...                 ...            ...
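I want to generate a merchant visit rank that tells me the order in which merchants were visited during this transaction session, increasing by one each time the merchant changes. For the data above, the correct ranking would look like this: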

                                        consumerid merchant_id      eventtimestamp merchant_visit_rank
                                              (chr)       (int)              (time)          (dbl)
            1  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-15 13:33:00              1
            2  004a5cc3-3d60-4d14-85b3-706e454aae13          56 2015-01-16 13:58:03              2
            3  004a5cc3-3d60-4d14-85b3-706e454aae13          56 2015-01-16 13:58:41              2
            4  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 13:59:05              3
            5  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 13:59:55              3
            6  004a5cc3-3d60-4d14-85b3-706e454aae13          52 2015-01-16 14:15:56              3
            7  004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:18              4
            8  004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:19              4
            9  004a5cc3-3d60-4d14-85b3-706e454aae13          54 2015-01-21 13:52:24              5
            10 004a5cc3-3d60-4d14-85b3-706e454aae13          58 2015-01-21 13:52:29              6
            ..                                  ...         ...                 ...            ...
I tried using window functions in dplyr, as follows:

            measure_media_interaction %>% 
              #selecting the fields we wish from the dataframe
              select(consumerid,merchant_id,eventtimestamp) %>%
              #mutate a placeholder column to be used for the rank 
              mutate(merchant_visit = 0) %>% 
              #sort them by consumer and timestamp
              arrange(consumerid,eventtimestamp) %>%
              #change the column so it shows that this merchant was the first this consumer visited 
              #or not 
              mutate(merchant_visit = 
                       ifelse(lead(merchant_id)!=merchant_id,merchant_visit,merchant_visit+1))

However, I'm stuck and don't know how to do this efficiently. Any ideas?

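Here is one solution. We use lag to test whether merchant_id has changed from the previous row, and cumsum to increment the counter each time it does: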

measure_media_interaction %>% 
  select(consumerid,merchant_id,eventtimestamp) %>%
  arrange(consumerid,eventtimestamp) %>%
  mutate(merchant_visit=cumsum(c(1,(merchant_id != lag(merchant_id))[-1])))
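
If the table holds more than one consumer, the counter above keeps running across consumers. The following is a minimal sketch of one way to restart it per consumer, assuming that is the desired behaviour. The `visits` data frame is hypothetical: it reuses the ten merchant ids from the question under a made-up consumer "A", adds a made-up consumer "B", and uses invented timestamps one minute apart. The helper column `changed` is also an illustrative name, not part of the original answer.

```r
library(dplyr)

# Hypothetical sample data: merchant ids from the question for consumer "A",
# plus a made-up second consumer "B"; timestamps are invented.
visits <- data.frame(
  consumerid     = c(rep("A", 10), rep("B", 3)),
  merchant_id    = c(52, 56, 56, 52, 52, 52, 58, 58, 54, 58, 7, 7, 9),
  eventtimestamp = as.POSIXct("2015-01-15 13:33:00", tz = "UTC") + 60 * (0:12)
)

ranked <- visits %>%
  arrange(consumerid, eventtimestamp) %>%
  # group so that lag() and cumsum() restart for each consumer
  group_by(consumerid) %>%
  mutate(
    # TRUE where the merchant differs from the previous row; the default makes
    # the first row of each group compare equal to itself (FALSE, not NA)
    changed = merchant_id != lag(merchant_id, default = first(merchant_id)),
    # running count of changes, shifted so the first visit is ranked 1
    merchant_visit = cumsum(changed) + 1
  ) %>%
  ungroup() %>%
  select(-changed)

ranked$merchant_visit
# 1 2 2 3 3 3 4 4 5 6 1 1 2
```

Supplying a default to lag avoids the NA in the first row, so there is no need to drop and replace the first element by hand as in the one-line version.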