R: Merge rows by ID and date


I am new to R and have been trying to work out how to solve the following problem.

I have a df that looks like this:

id----------Date---------OB1------OB2------OB3
1------2017-01-01------1------0------0
2------2006-01-05------1------0------0
2----2007-04-19----0----1----0
3---2015-02-23----0----0----1
3---2015-02-23----1----0----0

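What I need to achieve is to combine the rows by id and date.

If a date has the value "1" for OB3 and the same date (same ID) also has the value "1" for OB1, the result must be a single row for that date with the value "1" for OB1 and the value "1" for OB3.

I have been trying to apply some of the solutions presented here, but it does not work.

Edit: OB1, OB2 and OB3 are boolean values. Thank you for your help.

Edit 2: aggregate(. ~ ID + Date, df, any) works.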
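As a minimal sketch of the call from Edit 2, the table from the question can be typed in by hand and aggregated with base R. The data frame below is an assumption: it takes the OB columns to be logical (as stated in the first edit) and uses the column names id and Date.

```r
# Sample data from the question, rebuilt by hand with logical OB columns
# (an assumption; the OP's real data frame is not shown in full).
df <- data.frame(
  id   = c(1L, 2L, 2L, 3L, 3L),
  Date = c("2017-01-01", "2006-01-05", "2007-04-19", "2015-02-23", "2015-02-23"),
  OB1  = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  OB2  = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  OB3  = c(FALSE, FALSE, FALSE, TRUE, FALSE)
)

# One row per (id, Date) pair; a flag is TRUE if it is TRUE in any row of the group.
res <- aggregate(. ~ id + Date, data = df, FUN = any)

# aggregate() does not keep the input order, so sort for readability.
res <- res[order(res$id, res$Date), ]
res
```

The two rows for id 3 on 2015-02-23 collapse into a single row with OB1 and OB3 both TRUE, which is the behaviour asked for in the question.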

Sample data

Input data

structure(list(ID = c(-1L, 1L, 1L), Date = c("2008-01-15", "2011-01-21", "2011-01-21"), `OBS1` = c(0, 0, 0), `OBS2` = c(0, 0, 0), `OBS3` = c(0, 0, 0), `OBS4` = c(0, 0, 0), `OBS5` = c(0, 0, 0), `OBS6` = c(0, 1, 0)), .Names = c("ID", "Date", "OBS1", "OBS2", "OBS3", "OBS4", "OBS5", "OBS6"), row.names = c(NA, 3L), class = "data.frame")

Output data

structure(list(ID = c(-1L, 1L), Date = c("2008-01-15", "2011-01-21"), `OBS1` = c(FALSE, FALSE), `OBS2` = c(FALSE, FALSE), `OBS3` = c(FALSE, FALSE), `OBS4` = c(FALSE, FALSE), `OBS5` = c(FALSE, FALSE), `OBS6` = c(FALSE, TRUE)), .Names = c("ID", "Date", "OBS1", "OBS2", "OBS3", "OBS4", "OBS5", "OBS6"), row.names = c(NA, -2L), class = "data.frame")

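The question has already been answered using base R's aggregate() function.

However, I felt challenged to turn the sample data set as printed in the question into a reproducible example (before the OP edited the question to include the result of dput()).

In addition, the OP has mentioned that he has a "very large df", which might make it worth trying a data.table approach.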

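Turning the sample data into a data frame

library(magrittr)
library(data.table)
# the sample data exactly as printed in the question
df <- "id----------Date---------OB1------OB2------OB3
1------2017-01-01------1------0------0
2------2006-01-05------1------0------0
2----2007-04-19----0----1----0
3---2015-02-23----0----0----1
3---2015-02-23----1----0----0" %>%
  # replace each run of two or more dashes by one space; the dates keep their single dashes
  stringr::str_replace_all("[-]{2,}", " ") %>%
  # logical01 = TRUE reads 0/1 columns as logical (recent data.table versions default to integer)
  fread(logical01 = TRUE)
df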

   id       Date   OB1   OB2   OB3
1:  1 2017-01-01  TRUE FALSE FALSE
2:  2 2006-01-05  TRUE FALSE FALSE
3:  2 2007-04-19 FALSE  TRUE FALSE
4:  3 2015-02-23 FALSE FALSE  TRUE
5:  3 2015-02-23  TRUE FALSE FALSE

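Note that fread() has recognized the boolean columns. Recent data.table versions read 0/1 columns as integer unless logical01 = TRUE is passed.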
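Whether fread() turns a 0/1 column into a logical column depends on its logical01 argument, so it can help to set it explicitly. The following is only a small sketch on made-up data (the text and the column name flag are hypothetical, not from the question):

```r
library(data.table)

# Hypothetical two-column input; "flag" contains only 0 and 1.
txt <- "id flag\n1 0\n2 1\n"

# Ask explicitly for 0/1 columns to be read as logical ...
as_lgl <- fread(text = txt, logical01 = TRUE)
# ... or to be kept as integers.
as_int <- fread(text = txt, logical01 = FALSE)

class(as_lgl$flag)
class(as_int$flag)
```

The id column is not affected in either case because it contains values other than 0 and 1.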

Aggregating

library(data.table)
setDT(df)[, lapply(.SD, any), by = .(id, Date)]

   id       Date   OB1   OB2   OB3
1:  1 2017-01-01  TRUE FALSE FALSE
2:  2 2006-01-05  TRUE FALSE FALSE
3:  2 2007-04-19 FALSE  TRUE FALSE
4:  3 2015-02-23  TRUE FALSE  TRUE

If the OP needs the integer values 0 and 1 instead of logical values, these can be created in one go:

setDT(df)[, lapply(.SD, function(x) as.integer(any(x))), by = .(id, Date)]

   id       Date OB1 OB2 OB3
1:  1 2017-01-01   1   0   0
2:  2 2006-01-05   1   0   0
3:  2 2007-04-19   0   1   0
4:  3 2015-02-23   1   0   1

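Please include in your question what does not work.

aggregate(. ~ id + Date, df, sum)

@alistaire With this I get the value "2" for OB1 and the value "0" for OB2.

Maybe just aggregate(. ~ Date, df[, -1], sum).

Replace sum with any?

Wow, thanks! I will try this code, it looks magical.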
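The exchange above comes down to the choice of the aggregation function: sum counts how often a flag is set within a group, while any only reports whether it is set at all. A small sketch on made-up 0/1 data (the values are hypothetical, chosen so that one group has the same flag set twice):

```r
# Hypothetical numeric 0/1 data: id 1 has OB1 set in both of its rows.
df <- data.frame(
  id   = c(1L, 1L, 2L),
  Date = c("2011-01-21", "2011-01-21", "2008-01-15"),
  OB1  = c(1, 1, 0),
  OB2  = c(0, 0, 0)
)

# sum adds the flags up, so a flag set twice becomes 2.
by_sum <- aggregate(. ~ id + Date, data = df, FUN = sum)

# any() collapses the group to a single yes/no; x > 0 avoids the coercion
# warning that any() gives on numeric input, as.integer() returns 0/1.
by_any <- aggregate(. ~ id + Date, data = df,
                    FUN = function(x) as.integer(any(x > 0)))

by_sum
by_any
```

With sum, OB1 for id 1 is 2; with the any-based function it stays 1, which matches the "2" reported in the comment and the suggested fix.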