Merge data.table returns empty variables - R
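I have two data.tables that I am trying to merge. However, the rows in these tables need a large number of variables to avoid duplicates. Because the data is confidential, there is no identifier variable, so I need a combination of several variables to match the two datasets.
I tried to join them, but when I look at the final dataset the variables are empty: all values are set to NULL. data1 has 17440 observations and 57 variables; old_data has 17347 observations and 12 variables. I need 11 variables to get unique observations; let's call them key_variables. Here is what I have: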
key_variables <- c("sex", "birthdate", "sint", "cons", "diag", "concelho", "Serologia", "alcohol", "end", "micro")
setkeyv(data1, key_variables)
setkeyv(old_data, key_variables)
dataFinal <- merge(data1, old_data, by = key_variables, all.x = TRUE)
As you already mentioned in the comments, the two tables use different date formats. Here is one way to align them:
library(data.table)
key_variables <-
c(
"sex",
"birthdate",
"sint",
"cons",
"diag",
"concelho",
"Serologia",
"alcohol",
"end",
"micro"
)
data1 <-
structure(
list(
sex = c("Masculino", "Masculino", "Masculino"),
birthdate = c("4/23/1952", "11/26/1964", "01/08/1965"),
sint = c("01/01/2014",
"09/01/2010", "01/01/2008"),
cons = c("02/10/2014", "12/01/2010",
"1/29/2008"),
diag = c("02/10/2014", "12/03/2010", "02/03/2008"),
concelho = c("vila velha de ródão", "vila velha de ródão",
"vila velha de ródão"),
Serologia = c("Não", "Não", "Não"),
alcohol = c("Sim", "Não", "Sim"),
end = c("11/03/2014",
"10/10/2011", "9/17/2008"),
micro = c("03/11/2008", "12/03/2010",
"02/03/2008"),
DInflamatoriaArticular = c("Não", "Não", "Não")
),
row.names = c(NA,-3L),
class = c("data.table", "data.frame")
)
old_data <-
structure(
list(
sex = c("Masculino", "Masculino", "Masculino"),
birthdate = c("23/04/1952", "26/11/1964", "08/01/1965"),
age = c(61L, 46L, 43L),
concelho = c("vila velha de ródão",
"vila velha de ródão", "vila velha de ródão"),
EstadoVital = c("Vivo",
"Vivo", "Vivo"),
sint = c("01/01/2014", "01/09/2010", "01/01/2008"),
cons = c("10/02/2014", "01/12/2010", "29/01/2008"),
alcohol = c("Sim",
"Não", "Sim"),
drugs = c("Não", "Não", "Não"),
micro = c("11/03/2008",
"03/12/2010", "03/02/2008"),
diag = c("10/02/2014", "03/12/2010",
"03/02/2008"),
Serologia = c("Não", "Não", "Não"),
end = c("03/11/2014",
"10/10/2011", "17/09/2008"),
Motivotermotratamento = c(
"Tratamento Completado",
"Tratamento Completado",
"Tratamento Completado"
),
ano = c(2014L,
2010L, 2008L),
region = c("Centro", "Centro", "Centro")
),
row.names = c(NA, -3L),
class = c("data.table", "data.frame")
)
# The tables above are rebuilt from dput() output, so re-register them with
# setDT() before modifying them by reference.
setDT(data1)
setDT(old_data)
date_cols <- c("birthdate", "sint", "cons", "diag", "end", "micro")
# data1 stores dates as month/day/year, old_data as day/month/year.
data1[, (date_cols) := lapply(.SD, as.Date, format = "%m/%d/%Y"), .SDcols = date_cols]
old_data[, (date_cols) := lapply(.SD, as.Date, format = "%d/%m/%Y"), .SDcols = date_cols]
# Set the keys after the conversion: updating key columns with := removes the key.
setkeyv(data1, key_variables)
setkeyv(old_data, key_variables)
dataFinal <- merge(data1, old_data, by = key_variables)
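To illustrate why the conversion matters, here is a minimal base R sketch using the three birthdates from the example data. The variable names are only for illustration. It compares the values as text and as parsed dates, and shows what happens when the wrong format is applied.

```r
# Same three birthdates as in the example data, in the two source formats.
us_style <- c("4/23/1952", "11/26/1964", "01/08/1965")   # month/day/year (data1)
eu_style <- c("23/04/1952", "26/11/1964", "08/01/1965")  # day/month/year (old_data)

# As character strings, the values are compared as text, so none of them match.
matches_as_text <- us_style == eu_style

# Parsed with the right format for each source, every pair is the same date.
d1 <- as.Date(us_style, format = "%m/%d/%Y")
d2 <- as.Date(eu_style, format = "%d/%m/%Y")
matches_as_date <- d1 == d2

# Parsing with the wrong format gives NA when the value is impossible
# (for example month 23), but silently yields a different date when the
# string is ambiguous ("01/08/1965" becomes 1 August instead of 8 January).
wrong <- as.Date(us_style, format = "%d/%m/%Y")
n_failed <- sum(is.na(wrong))
```

Counting the NA values after a conversion is a cheap sanity check, but it will not catch ambiguous day/month values, so it is worth knowing the source format of each table beforehand.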
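Are you sure the contents of the key variables match between the two data.tables? I would check with, for example: unique(data1[[key_variables[1]]]) %in% unique(old_data[[key_variables[1]]]). Maybe there is a small difference in the contents that prevents the join?
Yes, the output is TRUE. I compared the data.tables, and one difference is the labels: one data.table has variable labels and the other does not. I can't imagine why that would be a problem.
Is that TRUE for all the key variables? Could you create a dummy dataset that reproduces the problem?
I added the structure of the data.tables. old_data has an attribute that acts as a variable label. Now that I see the structures side by side, the problem may be a difference in factor levels, right? county has 300 levels; if the levels don't match, there is no merge?
Yes, I think you have to align the structure of the factor variables.
Thanks! That is exactly what I just did. By the way, I wasn't lying: it is TRUE when I run unique(data1[[key_variables[2]]]) %in% unique(old_data[[key_variables[2]]]), but the data structures differ. I guess in my hurry I forgot to check the date formats and whether they matched between the datasets, which was a silly mistake! Thanks for the help!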
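The check suggested in the comments looks at one key column at a time. A minimal sketch of running it over every key column, and also comparing column classes, could look like the following. compare_keys is a hypothetical helper written for this illustration, not a data.table function, and the toy tables hold only two of the key columns from the example data.

```r
# Hypothetical helper (not part of data.table): for each key column, report
# its class in both tables and how many values of x also occur in y.
compare_keys <- function(x, y, keys) {
  do.call(rbind, lapply(keys, function(k) {
    data.frame(
      key       = k,
      class_x   = class(x[[k]])[1],
      class_y   = class(y[[k]])[1],
      n_matched = sum(as.character(x[[k]]) %in% as.character(y[[k]])),
      n_rows    = length(x[[k]]),
      stringsAsFactors = FALSE
    )
  }))
}

# Toy data taken from the example above (two key columns only).
x <- data.frame(sex = c("Masculino", "Masculino"),
                birthdate = c("4/23/1952", "11/26/1964"),
                stringsAsFactors = FALSE)
y <- data.frame(sex = c("Masculino", "Masculino"),
                birthdate = c("23/04/1952", "26/11/1964"),
                stringsAsFactors = FALSE)

report <- compare_keys(x, y, c("sex", "birthdate"))
print(report)
```

Because the helper only uses [[ to extract columns, it should work on data.tables as well as plain data.frames. A key column where n_matched is far below n_rows, or where the two classes differ, is a likely reason for an empty join.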