Algorithm: Best way to iterate over a list to maximize unique output
I have a list whose contents are character vectors. For example:
yoda <- list(a=list(c("A","B","C"), c("B","C","D")), b=list(c("D","C"), c("B","C","D","E","F")))
At the moment I just have a huge set of nested loops, and it seems like it will never finish. Here is the awful code:
res <- list()
for (a in seq_along(extra.pats[[1]])) {
  for (b in seq_along(extra.pats[[2]])) {
    for (c in seq_along(extra.pats[[3]])) {
      for (d in seq_along(extra.pats[[4]])) {
        for (e in seq_along(extra.pats[[5]])) {
          for (f in seq_along(extra.pats[[6]])) {
            for (g in seq_along(extra.pats[[7]])) {
              for (h in seq_along(extra.pats[[8]])) {
                for (i in seq_along(extra.pats[[9]])) {
                  for (j in seq_along(extra.pats[[10]])) {
                    for (k in seq_along(extra.pats[[11]])) {
                      # unique() takes a single vector, so the picks are combined with c() first
                      res[[paste(a, b, c, d, e, f, g, h, i, j, k, sep = "_")]] <- unique(c(extra.pats[[1]][[a]], extra.pats[[2]][[b]], extra.pats[[3]][[c]], extra.pats[[4]][[d]], extra.pats[[5]][[e]], extra.pats[[6]][[f]], extra.pats[[7]][[g]], extra.pats[[8]][[h]], extra.pats[[9]][[i]], extra.pats[[10]][[j]], extra.pats[[11]][[k]]))
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
res
Here is a suggestion:
# create all possible combinations
comb <- expand.grid(yoda)
# find unique values for each combination
uni <- lapply(seq(nrow(comb)), function(x) unique(unlist(comb[x, ])))
# count the unique values
len <- lengths(uni)
# extract longest combination
uni[which.max(len)]
[[1]]
[1] "A" "B" "C" "D" "E" "F"
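Before trying this on a larger list, it may help to count how many rows expand.grid would have to create, since that number is the product of the sublist lengths. The sketch below uses a hypothetical shape of 11 sublists with 12 candidate vectors each (an assumption for illustration, not the asker's actual data), plus the small yoda example from the question.

```r
# Hypothetical shape (an assumption): 11 sublists, 12 candidate vectors each.
shape <- rep(12, 11)

# Rows expand.grid() would need: the product of the sublist lengths.
# For a real nested list `x`, this would be prod(lengths(x)).
n.comb <- prod(shape)
format(n.comb, big.mark = ",")   # "743,008,370,688"

# The small example from the question only has 2 * 2 = 4 combinations.
yoda <- list(a = list(c("A", "B", "C"), c("B", "C", "D")),
             b = list(c("D", "C"), c("B", "C", "D", "E", "F")))
prod(lengths(yoda))              # 4
```

With counts in the hundreds of billions, materializing every combination is unlikely to be practical, whatever the loop construct.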
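Your current problem dimensions prohibit an exhaustive search. Here is an example of a suboptimal algorithm. It is simple, but you may find that it gives you "good enough" results.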
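The algorithm is as follows:
1. Look at the first list: pick the item with the most unique values.
2. Look at the second list: pick the item that adds the most new unique values on top of what was already selected in step 1.
3. Repeat until you reach the end of the list.
The code: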
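good.cover <- function(top.list) {
  # one selected item per sublist, filled in order
  selection <- vector("list", length(top.list))
  # number of values in y that are not already in x
  num.new.unique <- function(x, y) length(setdiff(y, x))
  for (i in seq_along(top.list)) {
    # score each candidate by how many new values it adds to the selection so far
    score <- sapply(top.list[[i]], num.new.unique, x = unlist(selection))
    # keep the best candidate (the first one in case of ties)
    selection[[i]] <- top.list[[i]][which.max(score)]
  }
  selection
}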
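My guess is that, if you do a little research, this will turn out to be a well-known problem in graph theory (I would search for it with keywords like "coverage"). It is probably an NP-complete problem. Meaning: either your problem dimensions are small enough for an exhaustive search, or you have to rely on suboptimal algorithms to get a "good enough" solution. If you do go down the exhaustive search route, you may be able to tune the naive for-loop approach somewhat, but you will probably end up wanting to look into something like Rcpp.
How many unique values do you have globally, i.e. what is length(unique(unlist(lapply(extra.pats, function(x) unique(unlist(x))))))?
@flodel That's true. The actual list may be too large for this solution. I have tried this approach, but expand.grid cannot finish because of the size!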
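The global count asked about in the comment above is also a useful upper bound: no selection can cover more unique values than exist anywhere in the list. A minimal sketch on the small yoda example from the question (the variable names all.values and upper.bound are illustrative, not from the thread):

```r
yoda <- list(a = list(c("A", "B", "C"), c("B", "C", "D")),
             b = list(c("D", "C"), c("B", "C", "D", "E", "F")))

# Every distinct value that appears anywhere in the nested list.
all.values <- unique(unlist(yoda))

# No choice of one vector per sublist can cover more values than this.
upper.bound <- length(all.values)
upper.bound   # 6
```

If a heuristic selection reaches this bound, it is optimal and no further search is needed. If it falls short, the bound alone does not say whether a better selection exists.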
# Timing good.cover() on random data: 11 sublists, each holding 12 vectors
# of 3 to 6 items drawn from a universe of 260 labels ("A0", ..., "Z9").
items.universe <- apply(expand.grid(list(LETTERS, 0:9)), 1, paste, collapse = "")
random.length <- function() sample(3:6, 1)
random.sample <- function(i) sample(items.universe, random.length())
random.list <- function(i) lapply(letters[1:12], random.sample)
initial.list <- lapply(1:11, random.list)
system.time(final.list <- good.cover(initial.list))
#  user  system elapsed
# 0.004   0.000   0.004
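To get a feel for what "good enough" means, the greedy result can be compared with a brute-force optimum on inputs small enough to enumerate. The sketch below copies good.cover so that it runs on its own. The helpers cover.size and best.cover.size and the list tricky are hypothetical additions for illustration, not part of the original answer.

```r
good.cover <- function(top.list) {   # copied from the answer above
  selection <- vector("list", length(top.list))
  num.new.unique <- function(x, y) length(setdiff(y, x))
  for (i in seq_along(top.list)) {
    score <- sapply(top.list[[i]], num.new.unique, x = unlist(selection))
    selection[[i]] <- top.list[[i]][which.max(score)]
  }
  selection
}

# Number of distinct values covered by a selection.
cover.size <- function(selection) length(unique(unlist(selection)))

# Brute force: try every combination of one item per sublist.
# Only feasible when prod(lengths(top.list)) is small.
best.cover.size <- function(top.list) {
  idx <- expand.grid(lapply(top.list, seq_along))
  sizes <- apply(idx, 1, function(r) {
    picked <- Map(function(items, k) items[[k]], top.list, r)
    cover.size(picked)
  })
  max(sizes)
}

yoda <- list(a = list(c("A", "B", "C"), c("B", "C", "D")),
             b = list(c("D", "C"), c("B", "C", "D", "E", "F")))
cover.size(good.cover(yoda))   # 6, greedy matches the optimum here
best.cover.size(yoda)          # 6

# A made-up case where the greedy choice in the first sublist hurts later.
tricky <- list(list(c("A", "B", "C"), c("X", "Y")),
               list(c("A", "B", "C", "D")))
cover.size(good.cover(tricky)) # 4
best.cover.size(tricky)        # 6
```

The second case shows why the answer calls the algorithm suboptimal: it commits to the locally best item in each sublist and never revisits that choice.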