Group a table in R by ID and sequence without gaps


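I have a table of fictitious hospital data in which I need to replace the discharge date with the final discharge date whenever a (non-existent) patient was transferred between hospitals.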

rows <- sort(c(which(data$TRANSFER_NUM != 0), which(data$TRANSFER_NUM == 1)-1))
subset <- data[rows,]
This returns the wrong result for person B, whereas the correct result should be:

ID  DISCHARGE_DATE  FILE_SEQUENCE  TRANSFER_NUM  NEW_DISCHARGE_DATE
A   1992-12-04      3360           0             1993-11-25
A   1993-02-11      3361           1             1993-11-25
A   1993-03-10      3362           2             1993-11-25
A   1993-11-25      3363           3             1993-11-25
B   1987-05-15      3419           0             1987-05-19
B   1987-05-19      3420           1             1987-05-19
B   1990-02-03      3473           0             1990-02-05
B   1990-02-05      3474           1             1990-02-05

I think some additional grouping might help, such as:

ID  DISCHARGE_DATE  FILE_SEQUENCE  TRANSFER_NUM  GROUP  NEW_DISCHARGE_DATE
A   1992-12-04      3360           0             1      1993-11-25
A   1993-02-11      3361           1             1      1993-11-25
A   1993-03-10      3362           2             1      1993-11-25
A   1993-11-25      3363           3             1      1993-11-25
B   1987-05-15      3419           0             1      1987-05-19
B   1987-05-19      3420           1             1      1987-05-19
B   1990-02-03      3473           0             2      1990-02-05
B   1990-02-05      3474           1             2      1990-02-05

Any help would be greatly appreciated.
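A minimal base-R sketch of the GROUP column described above, assuming the rows are already sorted by ID and FILE_SEQUENCE and that every chain of transfers starts with TRANSFER_NUM == 0. The data frame name visits is hypothetical (chosen so it does not mask data or subset).

```r
# Hypothetical data frame holding the question's example rows
visits <- data.frame(
  ID             = c("A", "A", "A", "A", "B", "B", "B", "B"),
  DISCHARGE_DATE = as.Date(c("1992-12-04", "1993-02-11", "1993-03-10", "1993-11-25",
                             "1987-05-15", "1987-05-19", "1990-02-03", "1990-02-05")),
  FILE_SEQUENCE  = c(3360, 3361, 3362, 3363, 3419, 3420, 3473, 3474),
  TRANSFER_NUM   = c(0, 1, 2, 3, 0, 1, 0, 1)
)

# Assumption: each chain starts at TRANSFER_NUM == 0, so counting the zeros
# seen so far within an ID numbers that patient's chains 1, 2, ...
visits$GROUP <- ave(visits$TRANSFER_NUM, visits$ID,
                    FUN = function(x) cumsum(x == 0))

visits$GROUP
# [1] 1 1 1 1 1 1 2 2
```

If a chain could start at some value other than 0, this counting rule would not hold and a rule based on the numbering dropping (as in the answers) is safer.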

Try:

ddply(subset, .(ID,grp=c(0,cumsum(diff(subset$TRANSFER_NUM)-1))), mutate, max=max(DISCHARGE_DATE))
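This assumes that the transfer numbers are consecutive within each chain of transfers, i.e. 0:x.

Based on the comments, this is what I get: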

library(plyr)

subset<-read.table(text="ID     DISCHARGE_DATE   FILE_SEQUENCE   TRANSFER_NUM
A      1992-12-04       3360            0
A      1993-02-11       3361            1
A      1993-03-10       3362            2
A      1993-11-25       3363            3
B      1987-05-15       3419            0
B      1987-05-19       3420            1
B      1990-02-03       3473            0
B      1990-02-05       3474            1",header=TRUE)

subset$DISCHARGE_DATE<-as.Date(subset$DISCHARGE_DATE)

ddply(subset, .(ID,grp=c(0,cumsum(diff(subset$TRANSFER_NUM)-1))), mutate, max=max(DISCHARGE_DATE))

  grp ID DISCHARGE_DATE FILE_SEQUENCE TRANSFER_NUM        max
1   0  A     1992-12-04          3360            0 1993-11-25
2   0  A     1993-02-11          3361            1 1993-11-25
3   0  A     1993-03-10          3362            2 1993-11-25
4   0  A     1993-11-25          3363            3 1993-11-25
5  -6  B     1990-02-03          3473            0 1990-02-05
6  -6  B     1990-02-05          3474            1 1990-02-05
7  -4  B     1987-05-15          3419            0 1987-05-19
8  -4  B     1987-05-19          3420            1 1987-05-19
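To see why the grp key works, here is a minimal sketch that evaluates the same expression on the TRANSFER_NUM values from the example on their own (the vector name transfer_num is hypothetical). Within a chain each step is +1, so diff() - 1 is 0 and the running sum stays constant; where a new chain restarts at 0 the step is negative and the running sum moves to a new value.

```r
transfer_num <- c(0, 1, 2, 3, 0, 1, 0, 1)  # TRANSFER_NUM from the example data

steps <- diff(transfer_num) - 1  # 0 inside a chain, negative where a chain restarts
grp <- c(0, cumsum(steps))      # running sum only changes at a restart

grp
# [1]  0  0  0  0 -4 -4 -6 -6
```

Because ID is also a split variable in the ddply() call, the key only has to separate chains within one patient. This is also why the answer assumes consecutive numbering: a gap such as 0, 2 would change the key as well and split one chain in two.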

That's right, you need an intermediate grouping column. Here is a nested ddply:

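library(plyr)

# df: the question's data frame
ddply(
  # within each ID, start a new GROUP whenever TRANSFER_NUM drops
  ddply(df, "ID", mutate, GROUP=cumsum(c(0, diff(TRANSFER_NUM) < 0))),
  c("ID", "GROUP"),
  # ISO dates (YYYY-MM-DD) sort correctly as strings, so max() picks the last discharge
  mutate, DISCHARGE_NEW=max(as.character(DISCHARGE_DATE))
)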
#   ID DISCHARGE_DATE FILE_SEQUENCE TRANSFER_NUM GROUP DISCHARGE_NEW
# 1  A     1992-12-04          3360            0     0    1993-11-25
# 2  A     1993-02-11          3361            1     0    1993-11-25
# 3  A     1993-03-10          3362            2     0    1993-11-25
# 4  A     1993-11-25          3363            3     0    1993-11-25
# 5  B     1987-05-15          3419            0     0    1987-05-19
# 6  B     1987-05-19          3420            1     0    1987-05-19
# 7  B     1990-02-03          3473            0     1    1990-02-05
# 8  B     1990-02-05          3474            1     1    1990-02-05
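For comparison, a minimal base-R sketch of the same two steps without plyr, using ave(). It assumes the rows are sorted by ID and FILE_SEQUENCE and that DISCHARGE_DATE is a Date; visits and key are hypothetical names, not from the original answer.

```r
# Hypothetical data frame holding the question's example rows
visits <- data.frame(
  ID             = c("A", "A", "A", "A", "B", "B", "B", "B"),
  DISCHARGE_DATE = as.Date(c("1992-12-04", "1993-02-11", "1993-03-10", "1993-11-25",
                             "1987-05-15", "1987-05-19", "1990-02-03", "1990-02-05")),
  FILE_SEQUENCE  = c(3360, 3361, 3362, 3363, 3419, 3420, 3473, 3474),
  TRANSFER_NUM   = c(0, 1, 2, 3, 0, 1, 0, 1)
)

# Step 1: within each ID, start a new GROUP whenever TRANSFER_NUM drops
visits$GROUP <- ave(visits$TRANSFER_NUM, visits$ID,
                    FUN = function(x) cumsum(c(0, diff(x) < 0)))

# Step 2: latest discharge per ID/GROUP chain. A single pasted key contains
# only observed combinations, so max() never sees an empty group.
key <- paste(visits$ID, visits$GROUP)
# ave() is applied to the numeric day counts to keep the result type simple;
# the result is converted back to Date afterwards
last_day <- ave(as.numeric(visits$DISCHARGE_DATE), key, FUN = max)
visits$DISCHARGE_NEW <- as.Date(last_day, origin = "1970-01-01")

visits
```

Taking max() on the Date values (via their day counts) rather than on strings means the result does not depend on the dates being stored in ISO format.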

Just a gentle hint: data and subset are both widely used R functions, so you might consider not using them as object names.