R: assigning dates based on a categorical variable
I have a set of payment dates in the following format:
ID Payment Date
1 18-01-01
1 18-02-03
2 18-04-03
2 18-05-08
2 18-06-06
3 17-12-23
3 18-01-22
3 18-02-24
4 17-11-09
4 18-12-06
I would like to add a column, Activation Date, holding the earliest payment date for each ID, as shown below:
ID Payment Date Activation Date
1 18-01-01 18-01-01
1 18-02-03 18-01-01
2 18-04-03 18-04-03
2 18-05-08 18-04-03
2 18-06-06 18-04-03
3 17-12-23 17-12-23
3 18-01-22 17-12-23
3 18-02-24 17-12-23
4 17-11-09 17-11-09
4 18-12-06 17-11-09
I was wondering whether there is a smarter way to do this than entering a loop and processing each ID one at a time.
Using dplyr and lubridate:
df = read.table(text = "
ID PaymentDate
1 18-01-01
1 18-02-03
2 18-04-03
2 18-05-08
2 18-06-06
3 17-12-23
3 18-01-22
3 18-02-24
4 17-11-09
4 18-12-06
", header=T)
library(dplyr)
library(lubridate)

# Within each ID, parse PaymentDate as yy-mm-dd and take the earliest date
df %>%
  group_by(ID) %>%
  mutate(ActivationDate = min(ymd(PaymentDate))) %>%
  ungroup()
# # A tibble: 10 x 3
# ID PaymentDate ActivationDate
# <int> <fct> <date>
# 1 1 18-01-01 2018-01-01
# 2 1 18-02-03 2018-01-01
# 3 2 18-04-03 2018-04-03
# 4 2 18-05-08 2018-04-03
# 5 2 18-06-06 2018-04-03
# 6 3 17-12-23 2017-12-23
# 7 3 18-01-22 2017-12-23
# 8 3 18-02-24 2017-12-23
# 9 4 17-11-09 2017-11-09
#10 4 18-12-06 2017-11-09
A solution using data.table, applied to the same data.
Result:
# ID Payment Activation
#1: 1 18-01-01 18-01-01
#2: 1 18-02-03 18-01-01
#3: 2 18-04-03 18-04-03
#4: 2 18-05-08 18-04-03
#5: 2 18-06-06 18-04-03
#6: 3 17-12-23 17-12-23
#7: 3 18-01-22 17-12-23
#8: 3 18-02-24 17-12-23
#9: 4 17-11-09 17-11-09
#10: 4 18-12-06 17-11-09
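The data.table code that produced the result above did not survive on this page. Here is a minimal sketch of one way to get it, assuming the column names Payment and Activation from the printed header. It is not necessarily the code the original answer used.

```r
library(data.table)

# Same data as in the question; column names are taken from the printed result
dt <- data.table(
  ID = rep(1:4, times = c(2, 3, 3, 2)),
  Payment = c("18-01-01", "18-02-03", "18-04-03", "18-05-08", "18-06-06",
              "17-12-23", "18-01-22", "18-02-24", "17-11-09", "18-12-06")
)

# Parse yy-mm-dd, take the earliest date per ID, and format it back to yy-mm-dd
# so the new column looks like the printed result; := adds the column by reference
dt[, Activation := format(min(as.Date(Payment, format = "%y-%m-%d")), "%y-%m-%d"), by = ID]
```

Dropping the format() call keeps Activation as a proper Date column, which is usually more convenient for later date arithmetic.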
Quick tips:
- Never use spaces in column names.
- Use underscores or capitalization instead, for example: Payment_Date, PaymentDate
Using sqldf:
Your dataset:
df=read.table(text="ID PaymentDate
1 18-01-01
1 18-02-03
2 18-04-03
2 18-05-08
2 18-06-06
3 17-12-23
3 18-01-22
3 18-02-24
4 17-11-09
4 18-12-06",header=T)
Code (the sqldf query is shown at the end of this page):
Output:
ID PaymentDate ActivationDate
1 1 18-01-01 18-01-01
2 1 18-02-03 18-01-01
3 2 18-04-03 18-04-03
4 2 18-05-08 18-04-03
5 2 18-06-06 18-04-03
6 3 17-12-23 17-12-23
7 3 18-01-22 17-12-23
8 3 18-02-24 17-12-23
9 4 17-11-09 17-11-09
10 4 18-12-06 17-11-09
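How is the Activation Date value generated?
ave(as.Date(df$PaymentDate, "%y-%m-%d"), df$ID, FUN = min)
@RonakShah that should be the top answer!
Thanks Andre, this is my first post, so I am a bit confused about how to post things. I understand this in R :)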
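The one-liner from the comment can be sketched in base R as follows, with no extra packages. The format string is quoted here because the comment lost its quotes. The stringsAsFactors = FALSE argument is my own addition, so that PaymentDate is read as text on older R versions too.

```r
df <- read.table(text = "ID PaymentDate
1 18-01-01
1 18-02-03
2 18-04-03
2 18-05-08
2 18-06-06
3 17-12-23
3 18-01-22
3 18-02-24
4 17-11-09
4 18-12-06", header = TRUE, stringsAsFactors = FALSE)

# ave() applies min() within each ID and returns a vector aligned with the rows of df
df$ActivationDate <- ave(as.Date(df$PaymentDate, format = "%y-%m-%d"), df$ID, FUN = min)
```

Because ave() returns one value per input row, there is no separate join step: the result can be assigned straight back as a new column.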
The sqldf query:
library(sqldf)
# The inner query finds the minimum PaymentDate per ID; joining it back to df
# attaches that value to every row as ActivationDate.
# min() compares the dates as text, which works here because they are zero-padded yy-mm-dd.
sqldf("select a.ID, a.PaymentDate, b.ActivationDate
       from df as a
       join (select ID, min(PaymentDate) as ActivationDate from df group by ID) as b
       on a.ID = b.ID")