R: merge when one of the columns is a list, producing a new column as a list

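I have two datasets that I want to merge. One of the columns I want to use as the merge key holds its values in a list. If any of those values appears in a column of the second dataset, I want to merge the values from another column into the first dataset. There may be several matching values, and these should be presented as a list.

This is hard to explain, but hopefully the example data makes it clearer.

Example data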

library(data.table)
mother_dt <- data.table(mother = c("Penny", "Penny", "Anya", "Sam", "Sam", "Sam"), 
                 child = c("Violet", "Prudence", "Erika", "Jake", "Wolf", "Red"))
mother_dt [, children := .(list(unique(child))), by = mother]
mother_dt [, child := NULL]
mother_dt <- unique(mother_dt , by = "mother")

child_dt <- data.table(child = c("Violet", "Prudence", "Erika", "Jake", "Wolf", "Red"), 
                             age = c(10, 8, 9, 6, 5, 2))
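But it only contains a list of all the ages, in the last row.

I understand this may be very unusual behaviour, but is there a way to do this?

Edit: the final data table should look like this: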

final_dt <- data.table(mother = c("Penny", "Anya", "Sam"), 
                      children = c(list(c("Violet", "Prudence")), list(c("Erika")), list(c("Jake", "Wolf", "Red"))),
                      age = c(list(c(10, 8)), list(c(9)), list(c(6, 5, 2))))

The simplest way I can think of is to first unlist the children, then merge, then list again:

mother1 <- mother_dt[,.(children=unlist(children)),by=mother]
mother1[child_dt,on=c(children='child')][,.(children=list(children),age=list(age)),by=mother]
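
To see what each of the three steps produces, here is a minimal sketch that rebuilds the example data and runs them one at a time. The names long_dt, joined_dt and result are illustrative only. The join is also written as child_dt[long_dt, ...] rather than mother1[child_dt, ...], which is a variation on the answer (an assumption on my part, chosen so that rows follow the order of the unlisted mother table).

```r
library(data.table)

# Example data from the question
mother_dt <- data.table(
  mother = c("Penny", "Penny", "Anya", "Sam", "Sam", "Sam"),
  child  = c("Violet", "Prudence", "Erika", "Jake", "Wolf", "Red")
)
mother_dt <- mother_dt[, .(children = list(unique(child))), by = mother]

child_dt <- data.table(
  child = c("Violet", "Prudence", "Erika", "Jake", "Wolf", "Red"),
  age   = c(10, 8, 9, 6, 5, 2)
)

# 1. Unlist: one row per (mother, child) pair
long_dt <- mother_dt[, .(children = unlist(children)), by = mother]

# 2. Join: look up each child's age; rows follow the order of long_dt
joined_dt <- child_dt[long_dt, on = c(child = "children")]

# 3. Relist: collapse back to one row per mother, with list columns
result <- joined_dt[, .(children = list(child), age = list(age)), by = mother]
```

Note that this is an outer-style lookup from the mother side: a child with no entry in child_dt would get an NA age rather than being dropped.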

You can do it like this:

library(splitstackshape)
newm <- mother_dt[, .(children = unlist(children)), by = mother]
final_dt <- merge(newm, child_dt, by.x = "children", by.y = "child")

> aggregate(. ~ mother, data = final_dt, toString)
  mother         children     age
1   Anya            Erika       9
2  Penny Prudence, Violet   8, 10
3    Sam  Jake, Red, Wolf 6, 2, 5
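
One thing to keep in mind with toString() is that it collapses each group into a single character string such as "8, 10", whereas the question asks for list columns. A possible sketch of the list-column alternative is shown below. The table merged_dt is a hand-typed stand-in for the merged long table from the example, and both merged_dt and listed_dt are hypothetical names.

```r
library(data.table)

# Stand-in for the merged long table (values taken from the example data)
merged_dt <- data.table(
  children = c("Erika", "Jake", "Prudence", "Red", "Violet", "Wolf"),
  mother   = c("Anya", "Sam", "Penny", "Sam", "Penny", "Sam"),
  age      = c(9, 6, 8, 2, 10, 5)
)

# list() keeps each group's values as a vector inside a list column,
# so the ages stay numeric instead of becoming one string per mother
listed_dt <- merged_dt[, .(children = list(children), age = list(age)), by = mother]

# Inspect the structure of the new list column
str(listed_dt$age)
```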
You can do it the following way, which has the advantage of preserving duplicates in the mother column when there are any:

mother_dt$age <- lapply(
  mother_dt$children, 
  function(x,y) y[x], 
   y = setNames(child_dt$age, child_dt$child))

mother_dt
#    mother        children   age
# 1:  Penny Violet,Prudence 10, 8
# 2:   Anya           Erika     9
# 3:    Sam   Jake,Wolf,Red 6,5,2
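
The point about duplicates can be illustrated with a small sketch. The table dup_dt below is a hypothetical input, not part of the question, in which "Penny" appears on two rows. The call to unname() is also an addition of mine so that the resulting vectors carry no names.

```r
library(data.table)

child_dt <- data.table(
  child = c("Violet", "Prudence", "Erika"),
  age   = c(10, 8, 9)
)

# Hypothetical input: the same mother appears on two separate rows
dup_dt <- data.table(
  mother   = c("Penny", "Penny", "Anya"),
  children = list("Violet", c("Violet", "Prudence"), "Erika")
)

# Named vector used as a lookup table: names are children, values are ages
age_lookup <- setNames(child_dt$age, child_dt$child)

# One lookup per row, so no rows are merged or dropped
dup_dt$age <- lapply(dup_dt$children, function(x) unname(age_lookup[x]))
```

Because the lookup runs row by row, both "Penny" rows survive, whereas an approach that groups by mother would combine them into one.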

The same lookup can be written with the tidyverse:

library(tidyverse)
mutate(mother_dt, age = map(children, ~ .y[.], deframe(child_dt)))
#   mother         children     age
# 1  Penny Violet, Prudence   10, 8
# 2   Anya            Erika       9
# 3    Sam  Jake, Wolf, Red 6, 5, 2

Could you show the desired output for the example?

Thanks. I was thinking of a solution like this, but this is more elegant than what I was trying to implement.