Get the previous date based on a condition in R
I have the following data:
df <- data.frame(CustID = c(1,2,3,4,5,1,5),
                 CustName = c("Fred","Maria","John","Mark","Julia","Fred","Julia"),
                 ServiceDate = c('2010-11-1','2008-3-25','2007-3-14','2010-11-1','2008-3-25','2010-12-14','2008-3-14'),
                 stringsAsFactors = FALSE)
df$ServiceDate <- as.Date(df$ServiceDate, "%Y-%m-%d")
df
CustID CustName ServiceDate
1 1 Fred 2010-11-01
2 2 Maria 2008-03-25
3 3 John 2007-03-14
4 4 Mark 2010-11-01
5 5 Julia 2008-03-25
6 1 Fred 2010-12-14
7 5 Julia 2008-03-14
Using dplyr, I think this should solve your problem:
library(dplyr)
df %>%
group_by(CustID) %>%
arrange(ServiceDate) %>%
mutate(PriorServiceDate = lag(ServiceDate))
Source: local data frame [7 x 4]
Groups: CustID
CustID CustName ServiceDate PriorServiceDate
1 1 Fred 2010-11-01 <NA>
2 1 Fred 2010-12-14 2010-11-01
3 2 Maria 2008-03-25 <NA>
4 3 John 2007-03-14 <NA>
5 4 Mark 2010-11-01 <NA>
6 5 Julia 2008-03-14 <NA>
7 5 Julia 2008-03-25 2008-03-14
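For comparison, here is a minimal base R sketch of the same sort-then-shift idea, in case dplyr is not available. It is not part of the answer above: it assumes the same example data, and it uses ave() on the numeric value of the dates because that keeps the per-group shift simple.

```r
# Same example data as in the question
df <- data.frame(CustID = c(1,2,3,4,5,1,5),
                 CustName = c("Fred","Maria","John","Mark","Julia","Fred","Julia"),
                 ServiceDate = c('2010-11-1','2008-3-25','2007-3-14','2010-11-1',
                                 '2008-3-25','2010-12-14','2008-3-14'),
                 stringsAsFactors = FALSE)
df$ServiceDate <- as.Date(df$ServiceDate, "%Y-%m-%d")

# Sort by customer, then by date, so each customer's visits are in time order
df <- df[order(df$CustID, df$ServiceDate), ]

# Within each customer, shift the dates down by one position (first visit gets NA)
shifted <- ave(as.numeric(df$ServiceDate), df$CustID,
               FUN = function(x) c(NA, head(x, -1)))

# Convert the day counts back to Date
df$PriorServiceDate <- as.Date(shifted, origin = "1970-01-01")
df
```

Like the dplyr version, this depends on the rows being sorted by date within each customer before the shift.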
Unlike the dplyr answer, this uses base R and takes the minimum date rather than the lag.
First, get each customer's first service date:
first.service <- with(df, aggregate(ServiceDate,
by=list(CustID=CustID, CustName=CustName),
FUN=min))
Then, once this has been merged back onto df as both (see the merge code further down), rename the column:
colnames(both)[4] <- "PriorServiceDate"
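Using sqldf, this can be done with a left self-join. For a particular row in b, keep the rows of a whose CustID is the same as the CustID in b and whose ServiceDate is earlier. Then, among those a rows, select the one with the largest ServiceDate. This makes no assumption about the order of the input. It preserves the original order of the rows, but if that is not important the order by line can be omitted: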
library(sqldf)
DF <- sqldf("select b.CustID,
b.CustName,
b.ServiceDate ServiceDate__Date,
max(a.ServiceDate) PriorDate__Date
from df b
left join df a
on b.ServiceDate > a.ServiceDate and b.CustID = a.CUSTID
group by b.CustID, b.ServiceDate
order by b.rowid",
method = "name__class")
> DF
  CustID CustName ServiceDate  PriorDate
1      1     Fred  2010-11-01       <NA>
2      2    Maria  2008-03-25       <NA>
3      3     John  2007-03-14       <NA>
4      4     Mark  2010-11-01       <NA>
5      5    Julia  2008-03-25 2008-03-14
6      1     Fred  2010-12-14 2010-11-01
7      5    Julia  2008-03-14       <NA>
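The self-join logic described above (same CustID, earlier ServiceDate, then take the latest of those) can also be sketched in plain R without SQL. This is only an illustration under the same example data; prior_date is a hypothetical helper name, not something from sqldf or from the answer.

```r
# Same example data as in the question
df <- data.frame(CustID = c(1,2,3,4,5,1,5),
                 CustName = c("Fred","Maria","John","Mark","Julia","Fred","Julia"),
                 ServiceDate = c('2010-11-1','2008-3-25','2007-3-14','2010-11-1',
                                 '2008-3-25','2010-12-14','2008-3-14'),
                 stringsAsFactors = FALSE)
df$ServiceDate <- as.Date(df$ServiceDate, "%Y-%m-%d")

# Hypothetical helper: for row i, look at all rows of the same customer
# with a strictly earlier date and return the latest of them (NA if none)
prior_date <- function(i) {
  earlier <- df$ServiceDate[df$CustID == df$CustID[i] &
                            df$ServiceDate < df$ServiceDate[i]]
  if (length(earlier) == 0) as.Date(NA) else max(earlier)
}

# Apply row by row; the original row order is left untouched
df$PriorDate <- do.call(c, lapply(seq_len(nrow(df)), prior_date))
df
```

As with the SQL version, nothing here relies on the input being sorted, at the cost of comparing every row against every other row of the same customer.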
I think your lag solution is fine, since the OP specifically asked for the prior date, not the minimum date. Alex just complicated things unnecessarily.
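The merge, NA and rename steps of the base R answer:
# Attach each customer's first service date (column x) to every row of df
both <- merge(df, first.service, by=c("CustID", "CustName"))
# A customer's first visit has no earlier date, so blank it out
both$x[with(both, ServiceDate == x)] <- NA
colnames(both)[4] <- "PriorServiceDate"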
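The difference between the minimum date and the prior date only shows up once a customer has three or more visits, which the example data does not contain. A small sketch with made-up dates (the visits vector is hypothetical, not from the question) shows where the two approaches diverge:

```r
# Hypothetical visit dates for a single customer, already in time order
visits <- as.Date(c("2010-01-05", "2010-02-10", "2010-03-15"))

# Minimum-date approach: every later visit points back to the first visit
min_based <- rep(min(visits), length(visits))
min_based[visits == min(visits)] <- NA

# Lag approach: every visit points back to the one immediately before it
lag_based <- c(as.Date(NA), head(visits, -1))

data.frame(visits, min_based, lag_based)
```

For the third visit the minimum-based column holds 2010-01-05 while the lag-based column holds 2010-02-10, which is the distinction the comment above is pointing at.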