Cohort data transformation in R


I have a data frame df with customer names, join date (dj), expiry date (exp), and cohort:

names         dj        exp  cohort
  (fctr)     (date)     (date)   (chr)
1    Tom 2011-05-01 2011-06-22 2011-05
2  David 2011-06-01 2011-07-19 2011-06
3   Jack 2011-05-03 2012-01-03 2011-05

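library(dplyr)  # provides tbl_df(), which is deprecated in newer dplyr in favour of as_tibble()
names<-c("Tom","David","Jack")
dj<-as.Date(c("2011-05-01","2011-06-01","2011-05-03"))
exp<-as.Date(c("2011-06-22","2011-07-19","2012-01-03"))
df<-data.frame(names,dj,exp)
# Cohort = year and month of the join date
df$cohort<-format(df$dj,"%Y-%m")
tbl_df(df)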
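Below is the code I wrote. Unfortunately, it does not compare both the expiry date and dj against the calendar dates in the date columns. For example, David should be FALSE in X1. How can I do this?

Incorrect output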

names<-c("Tom","David","Jack")
dj<-as.Date(c("2011-05-01","2011-06-01","2011-05-03"))
exp<-as.Date(c("2011-06-22","2011-07-19","2012-01-03"))
df<-data.frame(names,dj,exp)
df$cohort<-format(df$dj,"%Y-%m")

DateColumns <- seq.Date(as.Date("2011/05/01"), as.Date("2015/12/1"), by = "1 month")

DateColumnvalues <- t(sapply(df$exp, function(x) x > DateColumns))
df2 <- data.frame(df,DateColumnvalues)
tbl_df(df2)

output:
names         dj        exp  cohort    X1    X2    X3    X4    X5    X6
  (fctr)     (date)     (date)   (chr) (lgl) (lgl) (lgl) (lgl) (lgl) (lgl)
1    Tom 2011-05-01 2011-06-22 2011-05  TRUE  TRUE FALSE FALSE FALSE FALSE
2  David 2011-06-01 2011-07-19 2011-06  **TRUE**  TRUE  TRUE FALSE FALSE FALSE
3   Jack 2011-05-03 2012-01-03 2011-05  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
Variables not shown: X7 (lgl), X8 (lgl), X9 (lgl), X10 (lgl), X11 (lgl),
  X12 (lgl), X13 (lgl), X14 (lgl), X15 (lgl), X16 (lgl), X17 (lgl), X18
  (lgl), X19 (lgl), X20 (lgl), X21 (lgl), X22 (lgl), X23 (lgl), X24 (lgl),
  X25 (lgl), X26 (lgl), X27 (lgl), X28 (lgl), X29 (lgl), X30 (lgl), X31
  (lgl), X32 (lgl), X33 (lgl), X34 (lgl), X35 (lgl), X36 (lgl), X37 (lgl),
  X38 (lgl), X39 (lgl), X40 (lgl), X41 (lgl), X42 (lgl), X43 (lgl), X44
  (lgl), X45 (lgl), X46 (lgl), X47 (lgl), X48 (lgl), X49 (lgl), X50 (lgl),
  X51 (lgl), X52 (lgl), X53 (lgl), X54 (lgl), X55 (lgl), X56 (lgl)

To answer your first question, you could do:

library(data.table)
library(lubridate)
dt <- data.table(df, key=c("dj", "exp"))
dates <- setDT(transform(data.frame(start = seq.Date(as.Date("2011-05-01"), as.Date("2011-08-01"), "1 month")), 
                                    end = start + months(1) - 1), 
               key = c("start", "end"))
dcast(foverlaps(dt, dates)[, val:=TRUE], names+dj+exp+cohort~start, value.var="val", fill=FALSE)
#    names         dj        exp  cohort 2011-05-01 2011-06-01 2011-07-01 2011-08-01
# 1: David 2011-06-01 2011-07-19 2011-06      FALSE       TRUE       TRUE      FALSE
# 2:  Jack 2011-05-03 2012-01-03 2011-05       TRUE       TRUE       TRUE       TRUE
# 3:   Tom 2011-05-01 2011-06-22 2011-05       TRUE       TRUE      FALSE      FALSE
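
For comparison, the same overlap rule can be sketched in base R, without data.table or lubridate. This is only a minimal sketch, not the answer's own method: it marks a month TRUE when the customer's [dj, exp] interval overlaps that calendar month. It also shows why the original attempt fails, since testing only exp against the month starts never checks dj. The names month_start, month_end and active are illustrative.

```r
# Sample data from the question
df <- data.frame(
  names = c("Tom", "David", "Jack"),
  dj    = as.Date(c("2011-05-01", "2011-06-01", "2011-05-03")),
  exp   = as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
)
df$cohort <- format(df$dj, "%Y-%m")

# First day of each calendar month to test (illustrative range)
month_start <- seq(as.Date("2011-05-01"), as.Date("2011-08-01"), by = "month")
# Last day of each month = first day of the following month minus one day
month_end <- seq(as.Date("2011-06-01"), by = "month",
                 length.out = length(month_start)) - 1

# One column per month: TRUE when [dj, exp] overlaps [month_start, month_end]
active <- sapply(seq_along(month_start), function(j) {
  df$dj <= month_end[j] & df$exp >= month_start[j]
})
colnames(active) <- format(month_start, "%Y-%m")

res_base <- cbind(df, active)
res_base
```

With the sample data, David's row comes out FALSE, TRUE, TRUE, FALSE, matching the foverlaps result above.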

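Hi Luke, thanks. My second query is about aggregating people by the number of months since they joined. Suppose M0 is the month of joining, M1 is one month after joining, and M2 is the second month after joining. Also, it is not well suited to cohort analysis, because it does not cover every month in the month range....

Anyway, did you look at `res`? You may need to change `n`.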
Names  dj          exp         cohort   M0  M1  M2  M3  ... till M55
Dick   2015-01-11  2015-12-10  2015-01  T   T   T   T
Tom    2011-05-01  2011-06-22  2011-05  T   T   F   F
David  2011-06-01  2011-07-19  2011-06  T   T   F   F
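# For each customer, build one TRUE per monthly step from dj to exp
lst <- apply(df[2:3], 1, function(x) { x <- as.Date(x); as.logical(seq_along(seq(x[1], x[2], by="month")))  }) 
# Longest membership in monthly steps; change n to pad out to more columns
n <- max(lengths(lst)) 
# Pad every vector to length n (padding is NA) and bind the result to df
res <- cbind(df, do.call(rbind, lapply(lst, function(x) `length<-`(x, n) ))) 
# The padded months become FALSE
res[is.na(res)] <- FALSE; res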
#   names         dj        exp  cohort    1    2     3     4     5     6     7     8     9
# 1   Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 2 David 2011-06-01 2011-07-19 2011-06 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 3  Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
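
To get from these per-customer flags to the aggregation asked about in the comment, one option is to label the columns M0, M1, ... and sum them within each cohort. The following is a minimal sketch under that reading of the request, not part of the original answer. It counts monthly steps with the same seq(dj, exp, by = "month") rule used above; n_months, flags and counts are illustrative names.

```r
# Sample data from the question
df <- data.frame(
  names = c("Tom", "David", "Jack"),
  dj    = as.Date(c("2011-05-01", "2011-06-01", "2011-05-03")),
  exp   = as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
)
df$cohort <- format(df$dj, "%Y-%m")

# Number of monthly steps from dj to exp for each customer
n_months <- vapply(seq_len(nrow(df)), function(i) {
  length(seq(df$dj[i], df$exp[i], by = "month"))
}, integer(1))

# flags[i, k] is TRUE when customer i is still active in relative month k - 1
n <- max(n_months)
flags <- outer(n_months, seq_len(n), ">=")
colnames(flags) <- paste0("M", seq_len(n) - 1)

# Per-customer table with M0, M1, ... columns
res_rel <- cbind(df, flags)

# Active customers per cohort and relative month
counts <- rowsum(flags + 0L, df$cohort)
counts
```

Each row of counts is a cohort and each column a month since joining, so the 2011-05 row starts at 2 (Tom and Jack) and drops to 1 from M2 onwards.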