R: If records match on specific columns but differ in another column, remove the rows where the differing value is NA

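I have a dataset and want to remove rows that share the same patient ID, Medication, Dosage, and Start.date, where one row has an end date and the other does not. I want to remove the rows with an NA End.date.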

ID      First.name Last.name Report.year  Medication   Dosage Start.date   End.date
1       John       Doe        2013 Modulator A Dosage 1 2013-01-01       <NA>
1       John       Doe        2013 Modulator A Dosage 2 2013-01-01       <NA>
1       John       Doe        2016 Modulator B Dosage 1 2016-01-01       <NA>****REMOVE
1       John       Doe        2018 Modulator B Dosage 1 2016-01-01 2018-12-31 
1       John       Doe        2019 Modulator C     <NA> 2019-01-01       <NA>****REMOVE
1       John       Doe        2020 Modulator C Dosage 1 2019-01-01       <NA>       
1       John       Doe        2021 Modulator C     <NA> 2019-01-01 2021-12-31

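In base R, this reorders the data so that rows with a non-NA End.date are the ones kept: order() places NA values last, and duplicated() flags every occurrence after the first. c(1,5,6,7) determines which columns are checked for duplicates.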

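# Sort by End.date; order() puts NA last, so dated rows come first
df <- df[order(df$End.date), ]
# Keep the first row for each ID/Medication/Dosage/Start.date combination
df[!duplicated(df[, c(1, 5, 6, 7)]), ]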

  ID First.name Last.name Report.year Medication  Dosage Start.date   End.date
4  1       John       Doe        2018 ModulatorB Dosage1 2016-01-01 2018-12-31
7  1       John       Doe        2021 ModulatorC    <NA> 2019-01-01 2021-12-31
1  1       John       Doe        2013 ModulatorA Dosage1 2013-01-01       <NA>
2  1       John       Doe        2013 ModulatorA Dosage2 2013-01-01       <NA>
6  1       John       Doe        2020 ModulatorC Dosage1 2019-01-01       <NA>
Sample data:

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L), First.name = c("John", 
"John", "John", "John", "John", "John", "John"), Last.name = c("Doe", 
"Doe", "Doe", "Doe", "Doe", "Doe", "Doe"), Report.year = c(2018L, 
2021L, 2013L, 2013L, 2016L, 2019L, 2020L), Medication = c("ModulatorB", 
"ModulatorC", "ModulatorA", "ModulatorA", "ModulatorB", "ModulatorC", 
"ModulatorC"), Dosage = c("Dosage1", NA, "Dosage1", "Dosage2", 
"Dosage1", NA, "Dosage1"), Start.date = c("2016-01-01", "2019-01-01", 
"2013-01-01", "2013-01-01", "2016-01-01", "2019-01-01", "2019-01-01"
), End.date = c("2018-12-31", "2021-12-31", NA, NA, NA, NA, NA
)), row.names = c(4L, 7L, 1L, 2L, 3L, 5L, 6L), class = "data.frame")
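
The sort-then-deduplicate approach above changes the row order and keeps exactly one row per key. As a possible alternative, here is a minimal base R sketch that leaves the order alone and drops an NA End.date row only when another row with the same key has an End.date. The names key_cols, key, has_end, and result are illustrative and not from the original answer, and the data frame is a trimmed copy of the question's rows without the name columns.

```r
# Sample rows in the order shown in the question (name columns omitted).
df <- data.frame(
  ID          = rep(1L, 7),
  Report.year = c(2013L, 2013L, 2016L, 2018L, 2019L, 2020L, 2021L),
  Medication  = c("ModulatorA", "ModulatorA", "ModulatorB", "ModulatorB",
                  "ModulatorC", "ModulatorC", "ModulatorC"),
  Dosage      = c("Dosage1", "Dosage2", "Dosage1", "Dosage1", NA, "Dosage1", NA),
  Start.date  = c("2013-01-01", "2013-01-01", "2016-01-01", "2016-01-01",
                  "2019-01-01", "2019-01-01", "2019-01-01"),
  End.date    = c(NA, NA, NA, "2018-12-31", NA, NA, "2021-12-31"),
  stringsAsFactors = FALSE
)

# Columns that define a "matching" record (assumed from the question).
key_cols <- c("ID", "Medication", "Dosage", "Start.date")

# Build one key string per row. paste() turns NA into the text "NA",
# so rows with a missing Dosage still fall into the same group.
key <- do.call(paste, c(df[key_cols], sep = "\r"))

# For every row: does any row with the same key have an End.date?
has_end <- ave(!is.na(df$End.date), key, FUN = any)

# Keep rows that have an End.date, or whose group has no End.date at all.
result <- df[!is.na(df$End.date) | !has_end, ]
print(result)
```

Unlike duplicated(), this sketch keeps every dated row in a group, so it behaves differently if a key ever has two rows with different End.date values.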

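I'm not sure I follow the logic, but you may want: filter(n() == 1 | !is.na(End.date)). This keeps medications that have only one row per dosage and, where there are multiple rows, keeps the rows whose End.date is not NA. Is that consistent with what you are looking for?

I tried grouping the data by patient ID, medication, dosage, and start date with group_by() and then applying a filter() condition that combines sum(is.na(End.date)), n(), and is.na(End.date), and got the result I wanted. Can you explain what the filter function does?

After grouping, some medication/dosage combinations have only one row, for example Mod A Dos 1 in the first row. Here it seems you want to keep that row even though its End.date is NA. Because there is only one row, n() == 1 in the filter keeps it, so the single-row case is handled regardless of whether there is an End.date. For the other medication/dosage combinations, the group contains 2 rows, and in those cases the filter keeps the rows with a non-NA End.date. In this example no medication/dosage combination has more than 2 rows... not sure what you would want in those cases if they do exist.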
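
The filter described in the comments can be sketched as follows. This is a minimal example that assumes the dplyr package is installed and that the grouping columns are ID, Medication, Dosage, and Start.date; the data frame is a trimmed copy of the question's rows, and the name result is illustrative.

```r
library(dplyr)

# Sample rows in the order shown in the question (name columns omitted).
df <- data.frame(
  ID          = rep(1L, 7),
  Report.year = c(2013L, 2013L, 2016L, 2018L, 2019L, 2020L, 2021L),
  Medication  = c("ModulatorA", "ModulatorA", "ModulatorB", "ModulatorB",
                  "ModulatorC", "ModulatorC", "ModulatorC"),
  Dosage      = c("Dosage1", "Dosage2", "Dosage1", "Dosage1", NA, "Dosage1", NA),
  Start.date  = c("2013-01-01", "2013-01-01", "2016-01-01", "2016-01-01",
                  "2019-01-01", "2019-01-01", "2019-01-01"),
  End.date    = c(NA, NA, NA, "2018-12-31", NA, NA, "2021-12-31"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # One group per ID/Medication/Dosage/Start.date combination;
  # an NA Dosage forms its own group value.
  group_by(ID, Medication, Dosage, Start.date) %>%
  # Keep a row if it is alone in its group, or if it has an End.date.
  filter(n() == 1 | !is.na(End.date)) %>%
  ungroup()

print(result)
```

One design point to be aware of: a group with several rows that all have an NA End.date would be dropped entirely by this condition, which may or may not be what is wanted.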