rbind.fill-like merge of a list of vectors (R, rbind)


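I have a list of named vectors (see below, and the dput version at the end) that I would like to "merge" together into a matrix, filling in zeros wherever a vector does not contain a name (single characters in this case). This doesn't seem that hard, but I haven't found a working base solution. I thought about using match, but that seems costly in time when I'm sure there is a fancy way to use do.call and rbind together.

List of named vectors: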

    $greg

    e i k l 
    1 2 1 1 

    $sam

    ! c e i t 
    1 1 1 2 1 

    $teacher

    ? c i k l 
    1 1 1 1 1 

Final desired output:

            ! ? c e i k l t
    greg    0 0 0 1 2 1 1 0
    sam     1 0 1 1 2 0 0 1
    teacher 0 1 1 0 1 1 1 0

This is likely the output people will give, and filling the NAs with 0 is easy:

             !  ?  c  e i  k  l  t
    greg    NA NA NA  1 2  1  1 NA
    sam      1 NA  1  1 2 NA NA  1
    teacher NA  1  1 NA 1  1  1 NA

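For completeness, here is a minimal sketch of the NA-to-zero step mentioned above. The names `na_mat` and `filled` are made up for illustration, and the matrix is typed in by hand from the NA-padded output rather than produced by any of the solutions in this thread.

```r
# Hand-typed copy of the NA-padded output (illustrative names, not from the thread)
na_mat <- rbind(
  greg    = c(NA, NA, NA,  1, 2,  1,  1, NA),
  sam     = c( 1, NA,  1,  1, 2, NA, NA,  1),
  teacher = c(NA,  1,  1, NA, 1,  1,  1, NA)
)
colnames(na_mat) <- c("!", "?", "c", "e", "i", "k", "l", "t")

# Logical-matrix indexing replaces every NA cell with 0 in one step
filled <- na_mat
filled[is.na(filled)] <- 0
filled
```

Indexing with `is.na()` keeps the row and column names intact, so the result has the same layout as the desired output.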
Sample data:

    L2 <- structure(list(
        greg = structure(c(1L, 2L, 1L, 1L), .Dim = 4L,
            .Dimnames = structure(list(c("e", "i", "k", "l")), .Names = ""),
            class = "table"),
        sam = structure(c(1L, 1L, 1L, 2L, 1L), .Dim = 5L,
            .Dimnames = structure(list(c("!", "c", "e", "i", "t")), .Names = ""),
            class = "table"),
        teacher = structure(c(1L, 1L, 1L, 1L, 1L), .Dim = 5L,
            .Dimnames = structure(list(c("?", "c", "i", "k", "l")), .Names = ""),
            class = "table")),
        .Names = c("greg", "sam", "teacher"))

While typing this up I came up with the following solution, but I wonder whether there is a more efficient one:

    chars <- sort(unique(unlist(lapply(L2, names))))
    L3 <- lapply(L2, function(x){
        # names that this vector is missing
        nots <- chars[!chars %in% names(x)]
        # pad with zeros, then reverse so the padding comes first
        new <- rev(c(x, rep(0, length(nots))))
        # seq_along() rather than 1:length(), so a vector with no missing names also works
        names(new)[seq_along(nots)] <- nots
        new[order(names(new))]
    })
    do.call(rbind, L3)

reshape2 solution. This can be done easily with the reshape2 package by melting the list into long form and then using dcast to reshape it to wide form:

    > library(reshape2)
    > m <- melt(L2)
    > m$Var.1 <- factor(as.character(m$Var.1)) # optional - if columns should be sorted
    > dcast(m, L1 ~ Var.1, fill = 0)
           L1 ! ? c e i k l t
    1    greg 0 0 0 1 2 1 1 0
    2     sam 1 0 1 1 2 0 0 1
    3 teacher 0 1 1 0 1 1 1 0

Base solution. Here is a corresponding base solution, in which the first two lines perform the melt, the next line ensures that the columns will be sorted, and the last line reshapes from long to wide:

    > m <- do.call(rbind, lapply(L2, as.data.frame))
    > m$row <- sub("[.].*", "", rownames(m))
    > m$Var1 <- factor(as.character(m$Var1))
    > xtabs(Freq ~ row + Var1, m)
             Var1
    row       ! ? c e i k l t
      greg    0 0 0 1 2 1 1 0
      sam     1 0 1 1 2 0 0 1
      teacher 0 1 1 0 1 1 1 0

EDIT: added the base solution and modified the sorting line.

Here is a very simple base solution:

    # first determine all possible column names
    cols <- sort(unique(unlist(lapply(L2,names), use.names=FALSE)))
    # initialize the output
    out <- matrix(0, length(L2), length(cols), dimnames=list(names(L2),cols))
    # loop over list and fill in the matrix
    for(i in seq_along(L2)) {
      out[names(L2)[i], names(L2[[i]])] <- L2[[i]]
    }

Something like this, I think:

    names <- sort(unique(unlist(lapply(L2, names), use.names=FALSE)))
    L3 <- t(vapply(L2, function(x) x[names], FUN.VALUE=numeric(length(names))))
    colnames(L3) <- names
    L3[is.na(L3)] <- 0

Comments:

I had thought of melt and so on, but the question did specify a base R solution...

The base solution is very cool. I'm fairly comfortable with R, but I would never have thought of using those functions the way you have. +1 for creativity.

Nice. Here's my upvote. Is converting Var1 back to a factor necessary?

No, it is not necessary to convert it back to a factor. In fact I considered both approaches, but I thought that if m is used elsewhere, it would be less surprising not to change its class.

I suspect this one may do well on speed. +1 (I will test these and report back later.)

@Tylerlinker: just added benchmarks for all the base solutions (except yours) ;-) Suspicion confirmed.

Thanks for taking the time to benchmark. I ran a benchmark as well. Joshua's neat approach is more efficient than what I was after. Thanks again.

Benchmarks:

    f1 <- function(L2) {
      cols <- sort(unique(unlist(lapply(L2,names), use.names=FALSE)))
      out <- matrix(0, length(L2), length(cols), dimnames=list(names(L2),cols))
      for(i in seq_along(L2)) out[names(L2)[i], names(L2[[i]])] <- L2[[i]]
      out
    }
    f2 <- function(L2) {
      L.names <- sort(unique(unlist(sapply(L2, names))))
      L3 <- t(sapply(L2, function(x) x[L.names]))
      colnames(L3) <- L.names
      L3[is.na(L3)] <- 0
      L3
    }
    f3 <- function(L2) {
      m <- do.call(rbind, lapply(L2, as.data.frame))
      m$row <- sub("[.].*", "", rownames(m))
      m$Var1 <- factor(as.character(m$Var1))
      xtabs(Freq ~ row + Var1, m)
    }
    library(rbenchmark)
    benchmark(f1(L2), f2(L2), f3(L2), order="relative")[,1:5]
    #     test replications elapsed relative user.self
    # 1 f1(L2)          100   0.022    1.000     0.020
    # 2 f2(L2)          100   0.051    2.318     0.052
    # 3 f3(L2)          100   0.788   35.818     0.760

    # a larger example: 676 named vectors, each with 1 to 10 letter names
    set.seed(21)
    L <- replicate(676, {n=sample(10,1); l=sample(26,n);
      setNames(sample(6,n,TRUE), letters[l])}, simplify=FALSE)
    names(L) <- levels(interaction(letters,LETTERS))
    benchmark(f1(L), f2(L), order="relative")[,1:5]
    #    test replications elapsed relative user.self
    # 1 f1(L)          100    1.84    1.000     1.828
    # 2 f2(L)          100    4.24    2.304     4.220