R: identify new user IDs based on date

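I have a two-year dataset of user text messages covering 2015 and 2016 (135,000 records). I am trying to identify users who were new to the program in February 2016, based on subscriber_id and entity = "subscribe-online".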

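The catch is that a new user is one whose subscriber_id has not appeared anywhere in the data during the previous 12 months. For example, given the following sample data (the created dates are written year-day-month):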

created              subscriber_id  cellnum     entity            message  msgtxt
2015-21-01 14:03:00  15855          7788826943  tip               100      end
2015-07-12 14:03:00  15839          7788815940  tip               24       tip 24
2015-08-12 14:03:00  15839          7788815940  stop              99       stop
2016-01-01 14:05:00  15800          2508816941  tip               25       tip 25
2016-02-01 16:05:00  15800          2508816941  tip               26       tip 26
2016-03-01 14:05:00  15800          2508816941  tip               27       tip 27
2016-01-02 14:03:00  15855          7788826943  subscribe-online  1        msg 1
2016-01-02 14:03:00  15839          7788815940  subscribe-online  1        msg 1
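Both 15855 and 15839 subscribed on February 1. I want to flag 15855 as a new user, because subscriber_id 15855 last appeared on January 21, 2015, more than 12 months earlier. I want to flag 15839 as a repeat user, because it last appeared on December 8, 2015, less than 12 months earlier.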
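As a quick check of the 12-month rule, the two gaps can be computed with base R's difftime(). This is a minimal sketch that assumes the timestamps are in UTC and treats "12 months" as 365 days; the variable names are illustrative only.

```r
# Dates from the sample table, rewritten here in ISO (year-month-day) order
last_seen_15855 <- as.POSIXct("2015-01-21 14:03:00", tz = "UTC")
last_seen_15839 <- as.POSIXct("2015-12-08 14:03:00", tz = "UTC")
subscribed      <- as.POSIXct("2016-02-01 14:03:00", tz = "UTC")

# Days between the previous message and the subscription
gap_15855 <- as.numeric(difftime(subscribed, last_seen_15855, units = "days"))
gap_15839 <- as.numeric(difftime(subscribed, last_seen_15839, units = "days"))

gap_15855  # 376: more than 365 days, so 15855 counts as new
gap_15839  # 55: within 365 days, so 15839 counts as a repeat user
```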


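The created (date) field is of class POSIXct. I have been trying to understand loops, sapply, and tapply to see how they could be used here. Any help would be greatly appreciated. Thanks!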
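Since the question mentions the apply family, here is a minimal base R sketch (no extra packages) using stats::ave(), a relative of tapply() that returns one value per row rather than one value per group. It assumes UTC timestamps, treats 12 months as 365 days, and keeps only the columns the rule needs; it is one possible approach, not a definitive one.

```r
# Sample rows from the question (only the columns the rule needs)
df <- data.frame(
  created = c("2015-21-01 14:03:00", "2015-07-12 14:03:00", "2015-08-12 14:03:00",
              "2016-01-01 14:05:00", "2016-02-01 16:05:00", "2016-03-01 14:05:00",
              "2016-01-02 14:03:00", "2016-01-02 14:03:00"),
  subscriber_id = c(15855, 15839, 15839, 15800, 15800, 15800, 15855, 15839),
  entity = c("tip", "tip", "stop", "tip", "tip", "tip",
             "subscribe-online", "subscribe-online"),
  stringsAsFactors = FALSE
)
# Dates in the sample are year-day-month
df$created <- as.POSIXct(df$created, format = "%Y-%d-%m %H:%M:%S", tz = "UTC")

# Put each subscriber's messages in time order
df <- df[order(df$subscriber_id, df$created), ]

# ave() runs FUN within each subscriber_id group and returns a vector aligned
# with the rows; this FUN shifts timestamps down by one (the previous message)
prev_secs <- ave(as.numeric(df$created), df$subscriber_id,
                 FUN = function(x) c(NA, head(x, -1)))
gap_days <- (as.numeric(df$created) - prev_secs) / 86400

# Flag only subscribe-online rows: new if there is no earlier message
# or the gap is more than 365 days; other rows stay NA
df$new_user <- ifelse(df$entity == "subscribe-online",
                      is.na(gap_days) | gap_days > 365,
                      NA)

df[df$entity == "subscribe-online", c("subscriber_id", "created", "new_user")]
```

The last line should list 15839 as FALSE (repeat) and 15855 as TRUE (new).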

Here is a potential solution using dplyr:

library(dplyr)

# Sample data from the question (created is written year-day-month)
df <- data.frame(created = c("2015-21-01 14:03:00","2015-07-12 14:03:00","2015-08-12 14:03:00","2016-01-01 14:05:00","2016-02-01 16:05:00","2016-03-01 14:05:00","2016-01-02 14:03:00","2016-01-02 14:03:00"),
                 subscriber_id = c(15855,15839,15839,15800,15800,15800,15855,15839),
                 cellnum = c(7788826943,7788815940,7788815940,2508816941,2508816941,2508816941,7788826943,7788815940),
                 entity = c("tip","tip","stop","tip","tip","tip","subscribe-online","subscribe-online"),
                 message = c("100","24","99","25","26","27","1","1"),
                 msgtxt = c("end","tip 24","stop","tip 25 ","tip 26 ","tip 27 ","msg 1","msg 1"),
                 stringsAsFactors = FALSE
                )

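# Parse the year-day-month strings; UTC avoids daylight-saving offsets in the day counts
df$created <- as.POSIXct(df$created, format = "%Y-%d-%m %H:%M:%S", tz = "UTC")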


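df <- df %>%
      arrange(subscriber_id, created) %>%
      group_by(subscriber_id) %>%
      mutate(
        # days since this subscriber's previous message (NA if there is none)
        gap_days = as.numeric(difftime(created, lag(created), units = "days")),
        # flag only subscribe-online rows: new if no earlier message or a gap over 365 days
        new_user = if_else(entity == "subscribe-online",
                           is.na(gap_days) | gap_days > 365,
                           NA)
      ) %>%
      ungroup()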
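The answer approximates "12 months" as 365 days. If a calendar year is wanted instead, one option is to compare the previous message against the same clock time one year earlier. This is a minimal base R sketch that assumes UTC; `one_year_before()` and `is_new_user()` are hypothetical helper names, not part of any package.

```r
# Hypothetical helper: the same date and clock time one calendar year earlier
one_year_before <- function(t) {
  lt <- as.POSIXlt(t, tz = "UTC")
  lt$year <- lt$year - 1   # POSIXlt stores years since 1900
  as.POSIXct(lt)
}

# Hypothetical helper: prev_seen is NA when the subscriber has no earlier message
is_new_user <- function(prev_seen, subscribed) {
  is.na(prev_seen) | prev_seen < one_year_before(subscribed)
}

# Subscriptions by 15855 and 15839, with each one's previous message
subscribed <- as.POSIXct(c("2016-02-01 14:03:00", "2016-02-01 14:03:00"), tz = "UTC")
prev_seen  <- as.POSIXct(c("2015-01-21 14:03:00", "2015-12-08 14:03:00"), tz = "UTC")
is_new_user(prev_seen, subscribed)
```

Inside a grouped dplyr pipeline, the same test could be applied with lag(created) as the previous timestamp in place of the 365-day comparison.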