Cohort data transformation in R
I have a data frame df with customer names, joining date (dj), expiry date (exp), and cohort:
names dj exp cohort
(fctr) (date) (date) (chr)
1 Tom 2011-05-01 2011-06-22 2011-05
2 David 2011-06-01 2011-07-19 2011-06
3 Jack 2011-05-03 2012-01-03 2011-05
library(dplyr)  # provides tbl_df(); newer dplyr versions deprecate it in favor of as_tibble()
names<-c("Tom","David","Jack")
dj<-as.Date(c("2011-05-01","2011-06-01","2011-05-03"))
exp<-as.Date(c("2011-06-22","2011-07-19","2012-01-03"))
df<-data.frame(names,dj,exp)
df$cohort<-format(df$dj,"%Y-%m")
tbl_df(df)
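Below is the code I wrote. Unfortunately, it only compares the expiry date with the calendar dates in the date columns and ignores dj. For example, David should be FALSE in X1. How can I do this?
Code with incorrect output: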
names<-c("Tom","David","Jack")
dj<-as.Date(c("2011-05-01","2011-06-01","2011-05-03"))
exp<-as.Date(c("2011-06-22","2011-07-19","2012-01-03"))
df<-data.frame(names,dj,exp)
df$cohort<-format(df$dj,"%Y-%m")
DateColumns <- seq.Date(as.Date("2011/05/01"), as.Date("2015/12/1"), by = "1 month")
DateColumnvalues <- t(sapply(df$exp, function(x) x > DateColumns))
df2 <- data.frame(df,DateColumnvalues)
tbl_df(df2)
output:
names dj exp cohort X1 X2 X3 X4 X5 X6
(fctr) (date) (date) (chr) (lgl) (lgl) (lgl) (lgl) (lgl) (lgl)
1 Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE FALSE FALSE
2 David 2011-06-01 2011-07-19 2011-06 **TRUE** TRUE TRUE FALSE FALSE FALSE
3 Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE TRUE TRUE
Variables not shown: X7 (lgl), X8 (lgl), X9 (lgl), X10 (lgl), X11 (lgl),
X12 (lgl), X13 (lgl), X14 (lgl), X15 (lgl), X16 (lgl), X17 (lgl), X18
(lgl), X19 (lgl), X20 (lgl), X21 (lgl), X22 (lgl), X23 (lgl), X24 (lgl),
X25 (lgl), X26 (lgl), X27 (lgl), X28 (lgl), X29 (lgl), X30 (lgl), X31
(lgl), X32 (lgl), X33 (lgl), X34 (lgl), X35 (lgl), X36 (lgl), X37 (lgl),
X38 (lgl), X39 (lgl), X40 (lgl), X41 (lgl), X42 (lgl), X43 (lgl), X44
(lgl), X45 (lgl), X46 (lgl), X47 (lgl), X48 (lgl), X49 (lgl), X50 (lgl),
X51 (lgl), X52 (lgl), X53 (lgl), X54 (lgl), X55 (lgl), X56 (lgl)
As an answer to the first question, you could do:
library(data.table)
library(lubridate)
dt <- data.table(df, key=c("dj", "exp"))
dates <- setDT(transform(data.frame(start = seq.Date(as.Date("2011-05-01"), as.Date("2011-08-01"), "1 month")),
end = start + months(1) - 1),
key = c("start", "end"))
dcast(foverlaps(dt, dates)[, val:=TRUE], names+dj+exp+cohort~start, value.var="val", fill=FALSE)
# names dj exp cohort 2011-05-01 2011-06-01 2011-07-01 2011-08-01
# 1: David 2011-06-01 2011-07-19 2011-06 FALSE TRUE TRUE FALSE
# 2: Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE
# 3: Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE
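For comparison, the same month-by-month check can be sketched in base R. The assumption here is that a customer counts as active in a calendar month when dj falls before the first day of the next month and exp falls on or after the first day of the month. The names month_start, next_start and active are illustrative, and the four-month range only mirrors the example data; this is a sketch under those assumptions, not the answerer's method.

```r
# Rebuild the example data from the question
names <- c("Tom", "David", "Jack")
dj  <- as.Date(c("2011-05-01", "2011-06-01", "2011-05-03"))
exp <- as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
df <- data.frame(names, dj, exp)
df$cohort <- format(df$dj, "%Y-%m")

# First day of each calendar month to report on (illustrative range)
month_start <- seq(as.Date("2011-05-01"), as.Date("2011-08-01"), by = "1 month")
# First day of the following month, used as an exclusive upper bound
next_start <- seq(as.Date("2011-06-01"), by = "1 month",
                  length.out = length(month_start))

# Rows are customers, columns are months.
# Dates are compared as day counts so outer() can be used directly.
joined_in_time <- outer(as.numeric(df$dj), as.numeric(next_start), "<")
not_yet_expired <- outer(as.numeric(df$exp), as.numeric(month_start), ">=")
active <- joined_in_time & not_yet_expired
colnames(active) <- format(month_start, "%Y-%m")

# check.names = FALSE keeps the "2011-05" style column names
df2 <- data.frame(df, active, check.names = FALSE)
df2
```

Checking both dates is what makes David FALSE for 2011-05, which the comparison against exp alone could not do.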
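Hi Luke, thanks. The second query is about aggregating the number of people for each month since joining. Suppose M0 is the month of joining, M1 is one month after joining, and M2 is the second month after joining. Also, it is not well suited to cohort analysis because it does not cover every month in the month range.
Anyway, did you look at res? You may need to change n.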
Names dj exp cohort M0 M1 M2 M3 till M55
Dick 2015-01-11 2015-12-10 2015-01 T T T T
Tom 2011-05-01 2011-06-22 2011-05 T T F F
David 2011-06-01 2011-07-19 2011-06 T T F F
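# For each customer, build one TRUE per monthly step from dj up to exp
lst <- apply(df[2:3], 1, function(x) { x <- as.Date(x); as.logical(seq_along(seq(x[1], x[2], by="month"))) })
# Longest membership in monthly steps; this sets the number of columns
n <- max(lengths(lst))
# Pad every vector to length n (the padding is NA) and bind the rows next to df
res <- cbind(df, do.call(rbind, lapply(lst, function(x) `length<-`(x, n) )))
# Padded NA entries mean the customer was no longer active in that month
res[is.na(res)] <- FALSE; res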
# names dj exp cohort 1 2 3 4 5 6 7 8 9
# 1 Tom 2011-05-01 2011-06-22 2011-05 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 2 David 2011-06-01 2011-07-19 2011-06 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 3 Jack 2011-05-03 2012-01-03 2011-05 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
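The comment above also asks for counts per month since joining. A minimal base R sketch of that step follows, assuming the goal is the number of active customers per cohort at M0, M1, and so on. The names n_months, m and counts are illustrative, and rowsum() is just one possible way to aggregate.

```r
# Rebuild the example data from the question
df <- data.frame(
  names = c("Tom", "David", "Jack"),
  dj  = as.Date(c("2011-05-01", "2011-06-01", "2011-05-03")),
  exp = as.Date(c("2011-06-22", "2011-07-19", "2012-01-03"))
)
df$cohort <- format(df$dj, "%Y-%m")

# Number of monthly steps from dj that fit on or before exp,
# i.e. the length of seq(dj, exp, by = "month") for each customer
n_months <- sapply(seq_len(nrow(df)), function(i) {
  length(seq(df$dj[i], df$exp[i], by = "month"))
})

# Logical matrix: row i is TRUE for the first n_months[i] columns
m <- outer(n_months, seq_len(max(n_months)), ">=")
colnames(m) <- paste0("M", seq_len(ncol(m)) - 1)

# Sum the TRUE values within each cohort to count active customers
counts <- rowsum(m * 1L, group = df$cohort)
counts
```

Each row of counts is a cohort and each column is a month offset, so the 2011-05 row starts at 2 (Tom and Jack) and drops to 1 from M2 onward.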