Function for event-occurrence data in R

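I have a patient dataset, and for each ID I need to remove the rows that come after the first occurrence of disease (the first row where Disease is 1). For example: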

ID    Date    Disease
123 02-03-2012  0
123 03-03-2013  1
123 04-03-2014  0
321 03-03-2015  1
423 06-06-2016  1
423 07-06-2017  1
543 08-05-2018  1
543 09-06-2019  0
645 08-09-2019  0
And this is the expected output I want:

ID    Date     Disease
123 02-03-2012  0
123 03-03-2013  1
321 03-03-2015  1
423 06-06-2016  1
543 08-05-2018  1

One way, using dplyr, is to select the rows of each ID up to and including the first occurrence of 1:

library(dplyr)

df %>% group_by(ID) %>% filter(row_number() <= which(Disease == 1)[1])


#    ID  Date        Disease
#  <int> <fct>        <int>
#1   123 02-03-2012       0
#2   123 03-03-2013       1
#3   321 03-03-2015       1
#4   423 06-06-2016       1
#5   543 08-05-2018       1
Data

df <- structure(list(ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L, 
543L, 645L), Date = structure(c(1L, 2L, 4L, 3L, 5L, 6L, 7L, 9L, 
8L), .Label = c("02-03-2012", "03-03-2013", "03-03-2015", "04-03-2014", 
"06-06-2016", "07-06-2017", "08-05-2018", "08-09-2019", "09-06-2019"
), class = "factor"), Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 
0L, 0L)), class = "data.frame", row.names = c(NA, -9L))
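
A note on why ID 645 is missing from the dplyr result above: for an ID that never has a 1, `which(Disease == 1)[1]` is `NA`, so the condition is `NA` and `filter()` drops those rows. The following base R sketch makes that choice explicit. The helper name `keep_until_first_event` and its `keep_event_free` argument are made up for illustration and are not part of the original answer.

```r
# Toy data matching the example in the question.
df <- data.frame(
  ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L, 543L, 645L),
  Date = c("02-03-2012", "03-03-2013", "04-03-2014", "03-03-2015",
           "06-06-2016", "07-06-2017", "08-05-2018", "09-06-2019",
           "08-09-2019"),
  Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L)
)

# Hypothetical helper: keep each ID's rows up to and including the first
# Disease == 1. keep_event_free decides what happens to IDs with no 1 at all.
keep_until_first_event <- function(data, keep_event_free = FALSE) {
  keep <- ave(data$Disease, data$ID, FUN = function(v) {
    first_hit <- which(v == 1)[1]  # NA when the ID never has a 1
    if (is.na(first_hit)) {
      rep(as.integer(keep_event_free), length(v))
    } else {
      as.integer(seq_along(v) <= first_hit)
    }
  })
  data[keep == 1, , drop = FALSE]
}

res_drop <- keep_until_first_event(df)                          # IDs without a 1 are dropped
res_keep <- keep_until_first_event(df, keep_event_free = TRUE)  # IDs without a 1 are kept
```

With `keep_event_free = FALSE` this should reproduce the expected output in the question; with `TRUE` the single row for ID 645 is retained as well.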

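I'm not sure why the last row, 645 08-09-2019 0, is missing from your expected output. ID 645 has not yet had a first occurrence of disease, so I guess you may have left it out of the expected output by mistake.

Based on that guess, perhaps you can use subset + ave: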

dfout <- subset(df,!!ave(Disease,ID,FUN = function(v) !duplicated(cumsum(v)>0)))
Data

df <- structure(list(ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L, 
543L, 645L), Date = c("02-03-2012", "03-03-2013", "04-03-2014", 
"03-03-2015", "06-06-2016", "07-06-2017", "08-05-2018", "09-06-2019", 
"08-09-2019"), Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -9L))
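
One caveat that may be worth checking: `!duplicated(cumsum(v) > 0)` keeps only the first FALSE and the first TRUE of the running flag. That is enough for the sample data, but it appears to drop extra disease-free rows when an ID has more than one row before its first 1. The sketch below uses made-up toy data (`toy`) to show this, together with one possible alternative based on `cumsum(cumsum(v)) <= 1`, which assumes Disease is coded 0/1 with no NAs.

```r
# Hypothetical toy data: ID 1 has two disease-free visits before the first 1.
toy <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L),
  Disease = c(0L, 0L, 1L, 0L, 0L)
)

# Approach from the answer: keeps only the first 0 row and the first 1 row per ID.
out_dup <- subset(toy, !!ave(Disease, ID,
                             FUN = function(v) !duplicated(cumsum(v) > 0)))

# Alternative sketch: cumsum(v) counts events so far; summing that again stays
# <= 1 only up to and including the first event (assumes 0/1 coding, no NAs).
out_cum <- subset(toy, ave(Disease, ID,
                           FUN = function(v) cumsum(cumsum(v)) <= 1) == 1)
```

Here `out_dup` loses the second disease-free row of ID 1, while `out_cum` keeps every row up to and including the first 1 and still retains ID 2, which has no event.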

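Thanks dude, it works very well. If there are NAs in the Disease column, what should be added to the code?
@Matrix_32 Do you want to keep the NAs or remove them? If you want to keep them, the code above works unchanged; if you want to remove them, you can add na.omit() at the end:
df %>% group_by(ID) %>% filter(row_number() <= which(Disease == 1)[1]) %>% na.omit()
I want to keep the patient IDs with disease status 0 along with their minimum record date. How can I do that?
Do you mean:
df %>% mutate(Date = as.Date(Date, "%m-%d-%Y")) %>% group_by(ID) %>% slice(if (any(Disease == 0)) which.min(Date) else 0)
Output of the subset + ave approach: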
> dfout
   ID       Date Disease
1 123 02-03-2012       0
2 123 03-03-2013       1
4 321 03-03-2015       1
5 423 06-06-2016       1
7 543 08-05-2018       1
9 645 08-09-2019       0
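
The follow-up comment about keeping IDs that have a Disease == 0 record, together with their earliest record date, can also be sketched in base R. This mirrors the `slice()` suggestion from the comments rather than replacing it. It assumes the dates are in month-day-year order (the `"%m-%d-%Y"` format used in that comment) and that Disease has no NAs; the variable names `has_zero`, `cand`, and `earliest` are chosen here for illustration.

```r
# Toy data matching the example in the question.
df <- data.frame(
  ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L, 543L, 645L),
  Date = c("02-03-2012", "03-03-2013", "04-03-2014", "03-03-2015",
           "06-06-2016", "07-06-2017", "08-05-2018", "09-06-2019",
           "08-09-2019"),
  Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L)
)

# Parse the dates; the month-day-year order is an assumption taken from the comment.
df$Date <- as.Date(df$Date, format = "%m-%d-%Y")

# Flag every row of an ID that has at least one Disease == 0 record.
has_zero <- ave(df$Disease == 0, df$ID, FUN = any)
cand <- df[has_zero, ]

# Within those IDs, keep the row(s) with the earliest date.
is_earliest <- ave(as.numeric(cand$Date), cand$ID,
                   FUN = function(d) d == min(d)) == 1
earliest <- cand[is_earliest, ]
```

As in the comment's code, the earliest row is taken over all of an ID's records, so it can be a row where Disease is 1 (as for ID 543 here). If only disease-free rows should be considered, filter on `Disease == 0` before taking the minimum.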