Merge data.table returns empty variables - R

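I have two data.tables that I am trying to merge. However, the rows in these data.tables need a large number of variables to avoid duplicates. Because the data are confidential, we have no identifier variable, so I need a combination of several variables to match the two datasets.

I tried to join them, but when I look at the final dataset the variables are empty: all values are set to NULL. data1 has 17440 observations and 57 variables; old_data has 17347 observations and 12 variables. I need 11 variables to get unique observations; let's call them key_variables. Here is what I have: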

key_variables <- c("sex", "birthdate", "sint", "cons", "diag", "concelho", "Serologia", "alcohol", "end", "micro")

setkeyv(data1, key_variables)
setkeyv(old_data, key_variables)

dataFinal <- merge(data1, old_data, by = key_variables, all.x = TRUE)

As you already mentioned in the comments, the date formats of the two tables differ. Here is one possible way to align them:

library(data.table)

key_variables <-
  c(
    "sex",
    "birthdate",
    "sint",
    "cons",
    "diag",
    "concelho",
    "Serologia",
    "alcohol",
    "end",
    "micro"
  )

data1 <-
  structure(
    list(
      sex = c("Masculino", "Masculino", "Masculino"),
      birthdate = c("4/23/1952", "11/26/1964", "01/08/1965"),
      sint = c("01/01/2014",
               "09/01/2010", "01/01/2008"),
      cons = c("02/10/2014", "12/01/2010",
               "1/29/2008"),
      diag = c("02/10/2014", "12/03/2010", "02/03/2008"),
      concelho = c("vila velha de ródão", "vila velha de ródão",
                   "vila velha de ródão"),
      Serologia = c("Não", "Não", "Não"),
      alcohol = c("Sim", "Não", "Sim"),
      end = c("11/03/2014",
              "10/10/2011", "9/17/2008"),
      micro = c("03/11/2008", "12/03/2010",
                "02/03/2008"),
      DInflamatoriaArticular = c("Não", "Não", "Não")
    ),
    row.names = c(NA,-3L),
    class = c("data.table", "data.frame")
  )

old_data <-
  structure(
    list(
      sex = c("Masculino", "Masculino", "Masculino"),
      birthdate = c("23/04/1952", "26/11/1964", "08/01/1965"),
      age = c(61L, 46L, 43L),
      concelho = c("vila velha de ródão",
                   "vila velha de ródão", "vila velha de ródão"),
      EstadoVital = c("Vivo",
                      "Vivo", "Vivo"),
      sint = c("01/01/2014", "01/09/2010", "01/01/2008"),
      cons = c("10/02/2014", "01/12/2010", "29/01/2008"),
      alcohol = c("Sim",
                  "Não", "Sim"),
      drugs = c("Não", "Não", "Não"),
      micro = c("11/03/2008",
                "03/12/2010", "03/02/2008"),
      diag = c("10/02/2014", "03/12/2010",
               "03/02/2008"),
      Serologia = c("Não", "Não", "Não"),
      end = c("03/11/2014",
              "10/10/2011", "17/09/2008"),
      Motivotermotratamento = c(
        "Tratamento Completado",
        "Tratamento Completado",
        "Tratamento Completado"
      ),
      ano = c(2014L,
              2010L, 2008L),
      region = c("Centro", "Centro", "Centro")
    ),
    row.names = c(NA, -3L),
    class = c("data.table", "data.frame")
  )

# Convert the date columns first: data1 stores them as month/day/year,
# old_data as day/month/year
date_cols <- c("birthdate", "sint", "cons", "diag", "end", "micro")
data1[, (date_cols) := lapply(.SD, as.Date, format = "%m/%d/%Y"), .SDcols = date_cols]
old_data[, (date_cols) := lapply(.SD, as.Date, format = "%d/%m/%Y"), .SDcols = date_cols]

# Set the keys only after the key columns have their final type
setkeyv(data1, key_variables)
setkeyv(old_data, key_variables)

dataFinal <- merge(data1, old_data, by = key_variables)
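
To see why the original merge came back with empty columns, here is a minimal sketch on two made-up tables. The names a, b, id_date, x and y are hypothetical and not part of the question's data. It assumes, as in the example above, that one table writes dates as month/day/year and the other as day/month/year. With all.x = TRUE, every row of the left table is kept, and columns from the right table are filled with NA wherever no key matches.

```r
library(data.table)

# Hypothetical toy tables: the same two dates, written in two formats
a <- data.table(id_date = c("4/23/1952", "11/26/1964"), x = 1:2)
b <- data.table(id_date = c("23/04/1952", "26/11/1964"), y = c("u", "v"))

# Left join on the raw strings: no key matches, so y is NA in every row
raw <- merge(a, b, by = "id_date", all.x = TRUE)

# Parse each table with its own format, then join on real Date values
a[, id_date := as.Date(id_date, format = "%m/%d/%Y")]
b[, id_date := as.Date(id_date, format = "%d/%m/%Y")]
fixed <- merge(a, b, by = "id_date", all.x = TRUE)
```

Comparing raw and fixed side by side shows whether the key columns, rather than the merge call itself, are responsible for the missing values.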
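Are you sure that the contents of the key_variables match between the two data.tables? I would check, for example: unique(data1[[key_variables[1]]]) %in% unique(old_data[[key_variables[1]]]). Maybe there is a slight difference in the contents that prevents the join?

Yes, the output is TRUE. I am comparing the data.tables, and one difference is the labels. One data.table has variable labels and the other does not. I can't imagine why that would be a problem.

Is it TRUE for all key_variables? Could you create a dummy dataset that reproduces the problem?

I added the structure of the data.tables. old_data has an attribute that serves as a variable label. Now that I look at the structures side by side, the problem could be the difference in factor levels, right? county has 300 levels; if the levels don't match, there is no merge?

Yes, I think you have to align the structure of the factor variables.

Thanks! That is exactly what I just did. By the way, I wasn't lying: it is TRUE when I run unique(data1[[key_variables[2]]]) %in% unique(old_data[[key_variables[2]]]), but the data structures are different. I guess in my hurry I forgot to check the date formats and whether they matched between the datasets, which was a silly mistake! Thanks for your help!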
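
The check suggested in the comments looks at one key column at a time. A rough sketch of running it over every key column could look like the following. The helper compare_keys, the toy tables x and y and their columns are all hypothetical, made up for illustration. The sketch reports the class of each key column in both tables and how many distinct values they share, then lists the rows of x that have no partner in y.

```r
library(data.table)

# Hypothetical helper: for each key column, report both classes and
# how many distinct values of x also occur in y (compared as strings)
compare_keys <- function(x, y, keys) {
  rbindlist(lapply(keys, function(k) {
    data.table(
      column   = k,
      class_x  = class(x[[k]])[1],
      class_y  = class(y[[k]])[1],
      n_shared = sum(unique(as.character(x[[k]])) %in%
                       unique(as.character(y[[k]])))
    )
  }))
}

# Toy data: county differs in class, day differs in date format
x <- data.table(county = factor(c("A", "B")),
                day    = c("01/02/2020", "03/04/2020"))
y <- data.table(county = c("A", "B"),
                day    = c("02/01/2020", "04/03/2020"))

report <- compare_keys(x, y, c("county", "day"))

# Align the factor column to character, then list the rows of x
# that find no match in y (anti-join)
x[, county := as.character(county)]
unmatched <- x[!y, on = c("county", "day")]
```

A key column with n_shared equal to 0, like day here, is a likely culprit. In the question that turned out to be the date columns.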