Get the previous date based on a condition in R


I have the following data:

df <- data.frame(CustID = c(1,2,3,4,5,1,5),
                 CustName = c("Fred","Maria","John","Mark","Julia","Fred","Julia"),
                 ServiceDate = c('2010-11-1','2008-3-25','2007-3-14','2010-11-1',
                                 '2008-3-25','2010-12-14','2008-3-14'),
                 stringsAsFactors = FALSE)

df$ServiceDate <- as.Date(df$ServiceDate, "%Y-%m-%d")

df

  CustID CustName ServiceDate
1      1     Fred  2010-11-01
2      2    Maria  2008-03-25
3      3     John  2007-03-14
4      4     Mark  2010-11-01
5      5    Julia  2008-03-25
6      1     Fred  2010-12-14
7      5    Julia  2008-03-14

Using dplyr, I think this should solve your problem:

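library(dplyr)
df %>%
    arrange(CustID, ServiceDate) %>%   # sort explicitly: current dplyr's arrange() ignores groups unless .by_group = TRUE
    group_by(CustID) %>%
    mutate(PriorServiceDate = lag(ServiceDate))   # previous date within each customer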

Source: local data frame [7 x 4]
Groups: CustID

  CustID CustName ServiceDate PriorServiceDate
1      1     Fred  2010-11-01             <NA>
2      1     Fred  2010-12-14       2010-11-01
3      2    Maria  2008-03-25             <NA>
4      3     John  2007-03-14             <NA>
5      4     Mark  2010-11-01             <NA>
6      5    Julia  2008-03-14             <NA>
7      5    Julia  2008-03-25       2008-03-14
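For readers without dplyr, the same per-customer lag can be sketched in base R. This is only an illustration under assumptions, not part of the answer above: it rebuilds a cut-down copy of df (CustName dropped for brevity), and the names ord, sorted, and first_in_group are made up for the example.

```r
# Cut-down copy of the question's data (CustName omitted for brevity)
df <- data.frame(CustID = c(1, 2, 3, 4, 5, 1, 5),
                 ServiceDate = as.Date(c("2010-11-01", "2008-03-25", "2007-03-14",
                                         "2010-11-01", "2008-03-25", "2010-12-14",
                                         "2008-03-14")))

# Sort by customer, then by date, so each row's predecessor sits directly above it
ord <- order(df$CustID, df$ServiceDate)
sorted <- df[ord, ]
n <- nrow(sorted)

# Shift the date column down by one position
prior <- c(as.Date(NA), sorted$ServiceDate[-n])

# The first row of each customer has no predecessor, so blank it out
first_in_group <- c(TRUE, sorted$CustID[-1] != sorted$CustID[-n])
prior[first_in_group] <- NA

sorted$PriorServiceDate <- prior
sorted
```

The shift-and-mask step does what lag() does inside each group: every row takes the value from the row above, except where the customer changes.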

Unlike the dplyr answer, this one uses base R and takes the minimum date rather than the lag.

First, get each customer's first service date:

first.service <- with(df, aggregate(ServiceDate,
                                    by=list(CustID=CustID, CustName=CustName),
                                    FUN=min))
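Then merge it back onto the original data:

both <- merge(df, first.service, by=c("CustID", "CustName"))

Set the merged date to NA wherever it equals the row's own ServiceDate:

both$x[with(both, ServiceDate == x)] <- NA

Then rename the column:

colnames(both)[4] <- "PriorServiceDate"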
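The earliest date and the immediately preceding date coincide in the sample data because no customer has more than two visits. A small sketch with a hypothetical customer who has three visits shows where the two diverge; the vector visits and its third date are invented for illustration and do not come from the question.

```r
# Hypothetical customer with three visits (the third date is made up)
visits <- as.Date(c("2010-11-01", "2010-12-14", "2011-02-03"))

# Minimum-date approach: every row gets the first visit, except the first visit itself
first_date <- rep(min(visits), length(visits))
first_date[visits == min(visits)] <- NA

# Lag approach: every row gets the visit immediately before it
prev_date <- c(as.Date(NA), head(visits, -1))

data.frame(ServiceDate = visits,
           MinApproach = first_date,
           LagApproach = prev_date)
```

For the third visit the minimum-date column still points at 2010-11-01, while the lag column gives 2010-12-14.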

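Using sqldf, this can be done with a left self-join. For a particular row in b, we keep the rows in a whose CustID is the same as in b and whose ServiceDate is earlier. Then, among those a rows, we pick the one with the largest ServiceDate. This makes no assumptions about the order of the input. It preserves the original row order, but if that does not matter the order by line can be omitted: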

library(sqldf)

DF <- sqldf("select b.CustID, 
                    b.CustName, 
                    b.ServiceDate ServiceDate__Date, 
                    max(a.ServiceDate) PriorDate__Date
             from df b 
             left join df a 
               on b.ServiceDate > a.ServiceDate and b.CustID = a.CustID 
             group by b.CustID, b.ServiceDate
             order by b.rowid", 
        method = "name__class")
> DF
  CustID CustName ServiceDate   PriorDate
1      1     Fred  2010-11-01        <NA>
2      2    Maria  2008-03-25        <NA>
3      3     John  2007-03-14        <NA>
4      4     Mark  2010-11-01        <NA>
5      5    Julia  2008-03-25  2008-03-14
6      1     Fred  2010-12-14  2010-11-01
7      5    Julia  2008-03-14        <NA>
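To make the join condition concrete, here is a rough base R rendering of the same rule: for each row, look at the same customer's strictly earlier dates and take the latest one. It is a sketch under assumptions rather than a replacement for the query: it rebuilds a cut-down copy of df without CustName, and the names prior and earlier are illustrative.

```r
# Cut-down copy of the question's data, in its original row order
df <- data.frame(CustID = c(1, 2, 3, 4, 5, 1, 5),
                 ServiceDate = as.Date(c("2010-11-01", "2008-03-25", "2007-03-14",
                                         "2010-11-01", "2008-03-25", "2010-12-14",
                                         "2008-03-14")))

prior <- rep(as.Date(NA), nrow(df))
for (i in seq_len(nrow(df))) {
  # Same customer, strictly earlier date: the "on" clause of the self-join
  earlier <- df$ServiceDate[df$CustID == df$CustID[i] &
                            df$ServiceDate < df$ServiceDate[i]]
  # Latest of the earlier dates: the max() in the select list
  if (length(earlier) > 0) prior[i] <- max(earlier)
}

df$PriorDate <- prior
df
```

Like the query, this does not depend on the input being sorted and leaves the rows in their original order. The loop compares every row against every other row, so it is only reasonable for small data.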

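I think your lag solution is good, since the OP specifically asked for the prior date, not the minimum date. Alex is just complicating things unnecessarily.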