R: Assigning dates based on a categorical variable


I have a set of payment dates in the following format:

ID  Payment Date   
1   18-01-01
1   18-02-03
2   18-04-03
2   18-05-08
2   18-06-06
3   17-12-23
3   18-01-22
3   18-02-24
4   17-11-09
4   18-12-06
I want to add a column, Activation Date, holding the earliest payment date for each ID, as shown below:

ID  Payment Date   Activation Date
1   18-01-01       18-01-01
1   18-02-03       18-01-01
2   18-04-03       18-04-03
2   18-05-08       18-04-03
2   18-06-06       18-04-03
3   17-12-23       17-12-23
3   18-01-22       17-12-23
3   18-02-24       17-12-23
4   17-11-09       17-11-09
4   18-12-06       17-11-09

I suspect there is a smarter way to do this than running a loop and processing each ID one at a time.

A solution using dplyr and lubridate:

df = read.table(text = "
ID  PaymentDate   
1   18-01-01
1   18-02-03
2   18-04-03
2   18-05-08
2   18-06-06
3   17-12-23
3   18-01-22
3   18-02-24
4   17-11-09
4   18-12-06
", header=T)

library(dplyr)
library(lubridate)

df %>%
  group_by(ID) %>%
  mutate(ActivationDate = min(ymd(PaymentDate))) %>%
  ungroup()

# # A tibble: 10 x 3
#     ID PaymentDate ActivationDate
#   <int> <fct>       <date>        
# 1     1 18-01-01    2018-01-01    
# 2     1 18-02-03    2018-01-01    
# 3     2 18-04-03    2018-04-03    
# 4     2 18-05-08    2018-04-03    
# 5     2 18-06-06    2018-04-03    
# 6     3 17-12-23    2017-12-23    
# 7     3 18-01-22    2017-12-23    
# 8     3 18-02-24    2017-12-23    
# 9     4 17-11-09    2017-11-09    
#10     4 18-12-06    2017-11-09 

A solution using data.table

Result:

 #   ID  Payment Activation
 #1:  1 18-01-01   18-01-01
 #2:  1 18-02-03   18-01-01
 #3:  2 18-04-03   18-04-03
 #4:  2 18-05-08   18-04-03
 #5:  2 18-06-06   18-04-03
 #6:  3 17-12-23   17-12-23
 #7:  3 18-01-22   17-12-23
 #8:  3 18-02-24   17-12-23
 #9:  4 17-11-09   17-11-09
#10:  4 18-12-06   17-11-09
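
A minimal sketch of how a result like the one above could be produced with data.table. The column names ID and Payment are assumed from the printed result, so treat this as an illustration rather than the answerer's exact code.

```r
library(data.table)

# Same payments as in the question; the column names ID and Payment are
# an assumption taken from the printed result.
dt <- data.table(
  ID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L),
  Payment = c("18-01-01", "18-02-03", "18-04-03", "18-05-08", "18-06-06",
              "17-12-23", "18-01-22", "18-02-24", "17-11-09", "18-12-06")
)

# := adds the column by reference; by = ID evaluates min() once per group.
# min() on yy-mm-dd strings compares them lexically, which matches date
# order here only because all years fall in the same century.
dt[, Activation := min(Payment), by = ID]

print(dt)
```

If the dates can span centuries or arrive in another format, converting Payment with as.Date() before taking the minimum is the safer choice.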

Quick tips:

  • Never use spaces in column names again
  • Use underscores or camel case instead. For example:
    Payment_Date
    PaymentDate

Using sqldf

Your dataset:

df=read.table(text="ID  PaymentDate   
          1   18-01-01
          1   18-02-03
          2   18-04-03
          2   18-05-08
          2   18-06-06
          3   17-12-23
          3   18-01-22
          3   18-02-24
          4   17-11-09
          4   18-12-06",header=T)

Code:

library(sqldf)

# The inner query finds the minimum PaymentDate for each ID; joining it
# back to df on ID fills ActivationDate on every row
sqldf("select a.ID, a.PaymentDate, b.ActivationDate from df as a JOIN 
 (select ID, min(PaymentDate) as ActivationDate from df group by ID) as b on a.ID = b.ID")

Output:

   ID PaymentDate ActivationDate
1   1    18-01-01       18-01-01
2   1    18-02-03       18-01-01
3   2    18-04-03       18-04-03
4   2    18-05-08       18-04-03
5   2    18-06-06       18-04-03
6   3    17-12-23       17-12-23
7   3    18-01-22       17-12-23
8   3    18-02-24       17-12-23
9   4    17-11-09       17-11-09
10  4    18-12-06       17-11-09

How is the Activation Date value generated?

ave(as.Date(df$PaymentDate, "%y-%m-%d"), df$ID, FUN = min)

@RonakShah that should be the top answer!

Thanks Andre, this is my first post, so I was a bit confused about how to post things. I understand this in R :)
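
The ave() one-liner from the comments can be expanded into a small base R example. This is a sketch assuming the column names ID and PaymentDate used in the answers, and it stores the dates as character strings rather than factors.

```r
# Same payments as in the question, built directly as a data frame
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L),
  PaymentDate = c("18-01-01", "18-02-03", "18-04-03", "18-05-08", "18-06-06",
                  "17-12-23", "18-01-22", "18-02-24", "17-11-09", "18-12-06"),
  stringsAsFactors = FALSE
)

# Parse the two-digit-year strings into Date values
parsed <- as.Date(df$PaymentDate, format = "%y-%m-%d")

# ave() applies FUN within each ID group and returns a vector of the same
# length as its input, so the result can be assigned straight to a column
df$ActivationDate <- ave(parsed, df$ID, FUN = min)

print(df)
```

Unlike the string-based minimum in the other answers, this compares real Date values, so ActivationDate prints with a four-digit year.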