Function for event-occurrence data in R
I have a patient dataset, and for each ID I need to remove the rows that come after the first occurrence of 1 in the Disease column. For example:
ID Date Disease
123 02-03-2012 0
123 03-03-2013 1
123 04-03-2014 0
321 03-03-2015 1
423 06-06-2016 1
423 07-06-2017 1
543 08-05-2018 1
543 09-06-2019 0
645 08-09-2019 0
and this is my expected output:
ID Date Disease
123 02-03-2012 0
123 03-03-2013 1
321 03-03-2015 1
423 06-06-2016 1
543 08-05-2018 1
One way using dplyr is to select rows up to the first occurrence of 1 for each ID.
library(dplyr)
df %>% group_by(ID) %>% filter(row_number() <= which(Disease == 1)[1])
# ID Date Disease
# <int> <fct> <int>
#1 123 02-03-2012 0
#2 123 03-03-2013 1
#3 321 03-03-2015 1
#4 423 06-06-2016 1
#5 543 08-05-2018 1
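ID 645 is missing from this result because it never has Disease == 1. The base R sketch below, using made-up toy vectors rather than the data above, illustrates what the filter condition evaluates to in that case and shows one possible fallback that would keep such IDs.

```r
# Toy Disease vectors (illustrative, not taken from the data above)
with_event <- c(0L, 1L, 0L)  # an ID that has a 1
no_event   <- c(0L)          # an ID that never has a 1, like ID 645

# which() returns integer(0) when nothing matches, so [1] yields NA
first_with <- which(with_event == 1)[1]
first_none <- which(no_event == 1)[1]

# Comparing against NA gives NA, and filter() drops rows whose condition is NA
keep_with <- seq_along(with_event) <= first_with
keep_none <- seq_along(no_event) <= first_none

# Falling back to the group length keeps every row of an event-free ID
keep_none_fallback <- seq_along(no_event) <=
  min(which(no_event == 1), length(no_event))

print(keep_with)
print(keep_none)
print(keep_none_fallback)
```

If event-free IDs should be kept, the same idea inside the pipeline would presumably read filter(row_number() <= min(which(Disease == 1), n())); treat that as an untested adaptation rather than part of the original answer.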
data
df <- structure(list(ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L,
543L, 645L), Date = structure(c(1L, 2L, 4L, 3L, 5L, 6L, 7L, 9L,
8L), .Label = c("02-03-2012", "03-03-2013", "03-03-2015", "04-03-2014",
"06-06-2016", "07-06-2017", "08-05-2018", "08-09-2019", "09-06-2019"
), class = "factor"), Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L,
0L, 0L)), class = "data.frame", row.names = c(NA, -9L))
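df
I don't know why the last row, 645 08-09-2019 0, is missing from your expected result. ID 645 has not had a first occurrence of the disease yet, so I guess you may have left it out of the expected output by mistake.
Based on that guess, maybe you can use subset + ave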
dfout <- subset(df,!!ave(Disease,ID,FUN = function(v) !duplicated(cumsum(v)>0)))
data
df <- structure(list(ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L,
543L, 645L), Date = c("02-03-2012", "03-03-2013", "04-03-2014",
"03-03-2015", "06-06-2016", "07-06-2017", "08-05-2018", "09-06-2019",
"08-09-2019"), Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -9L))
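df
Thanks dude, it works great. If there are NAs in the Disease column, what should be added to the code?
@Matrix_32 Do you want to keep the NAs or remove them? If you want to keep them, the code above stays the same; if you want to remove them, you can add na.omit at the end:
df %>% group_by(ID) %>% filter(row_number() <= which(Disease == 1)[1]) %>% na.omit()
I want to keep the patient IDs whose disease status is 0, along with their minimum recorded date. How do I do that?
Do you mean
df %>% mutate(Date = as.Date(Date, "%m-%d-%Y")) %>% group_by(ID) %>% slice(if (any(Disease == 0)) which.min(Date) else 0)
?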
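The last request can be read in more than one way. As one possible interpretation, the base R sketch below keeps, for each ID that has at least one Disease == 0 record, the earliest such record. It assumes the dates are month-day-year, as in the comment above; if they are actually day-month-year, the format string would need to be "%d-%m-%Y". The names zeros and earliest are made up for this illustration.

```r
df <- data.frame(
  ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L, 543L, 645L),
  Date = c("02-03-2012", "03-03-2013", "04-03-2014", "03-03-2015",
           "06-06-2016", "07-06-2017", "08-05-2018", "09-06-2019",
           "08-09-2019"),
  Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L)
)

# Assumption: dates are month-day-year, following the comment above
df$Date <- as.Date(df$Date, format = "%m-%d-%Y")

# Keep only the records with disease status 0
zeros <- df[df$Disease == 0, ]

# Sort by ID and date, then keep the first (earliest) record per ID
zeros <- zeros[order(zeros$ID, zeros$Date), ]
earliest <- zeros[!duplicated(zeros$ID), ]

print(earliest)
```

This differs from the slice() suggestion above, which takes the earliest row of any ID that has a 0 somewhere, even if that earliest row itself has Disease == 1.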
Output of the subset + ave approach:
> dfout
ID Date Disease
1 123 02-03-2012 0
2 123 03-03-2013 1
4 321 03-03-2015 1
5 423 06-06-2016 1
7 543 08-05-2018 1
9 645 08-09-2019 0
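One caveat about the !duplicated(cumsum(v) > 0) expression used in the subset + ave answer: it gives the desired rows for this data, but it appears to drop rows when an ID has two or more 0 records before its first 1. The sketch below shows this on a hypothetical vector and offers one possible alternative, which counts the events strictly before each row; prior_events and dfout2 are names made up for this illustration.

```r
# Hypothetical ID with two 0 records before the first 1
v <- c(0L, 0L, 1L)
keep_original    <- !duplicated(cumsum(v) > 0)  # second 0 is flagged as a duplicate
keep_alternative <- (cumsum(v) - v) == 0        # no event strictly before the row
print(keep_original)
print(keep_alternative)

# The alternative applied to the example data
df <- data.frame(
  ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L, 543L, 645L),
  Date = c("02-03-2012", "03-03-2013", "04-03-2014", "03-03-2015",
           "06-06-2016", "07-06-2017", "08-05-2018", "09-06-2019",
           "08-09-2019"),
  Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L)
)

# Number of events strictly before each row, computed within each ID
prior_events <- ave(df$Disease, df$ID, FUN = function(v) cumsum(v) - v)
dfout2 <- df[prior_events == 0, ]
print(dfout2)
```

On the example data this keeps the same rows as dfout. It assumes Disease is coded 0/1 with no NAs.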