R: find the maximum-length "chain" of increasing numbers in number pairs
I have a two-column data frame of number pairs:
ODD <- c(1,1,1,3,3,3,5,7,7,9,9)
EVEN <- c(10,8,2,2,6,4,2,6,8,4,8)
dfPairs <- data.frame(ODD, EVEN)
> dfPairs
ODD EVEN
1 1 10
2 1 8
3 1 2
4 3 2
5 3 6
6 3 4
7 5 2
8 7 6
9 7 8
10 9 4
11 9 8
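Edited to handle a data frame that does not start at 1, and to return the maximum chains rather than the chain lengths.
Using igraph
Your data, dfPairs: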
ODD <- c(1,1,1,3,3,3,5,7,7,9,9)
EVEN <- c(10,8,2,2,6,4,2,6,8,4,8)
dfPairs <- data.frame(ODD, EVEN)
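Make a graph of your data. The key to my solution is to rbind the reversed data frame (rev(dfPairs)) to the original data frame. This allows directed edges to be built from odd numbers to even numbers as well as the other way round. Graphs make it easy to construct directed paths.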
library(igraph)
library(dplyr)
GPairs <- graph_from_data_frame(dplyr::arrange(rbind(setNames(dfPairs, c("X1", "X2")), setNames(rev(dfPairs), c("X1", "X2"))), X1))
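# dfTest is not defined in this excerpt; assumed here to be dfPairs without its
# first three rows, so that the values no longer start at 1
dfTest <- dfPairs[-(1:3), ]
GTest <- graph_from_data_frame(dplyr::arrange(rbind(setNames(dfTest, c("X1", "X2")), setNames(rev(dfTest), c("X1", "X2"))), X1))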
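To see what the rbind trick produces before it goes into igraph, here is a minimal base-R sketch. The names forward, backward and edges are mine, and order() stands in for dplyr::arrange(). As far as I can tell, the sorting matters because graph_from_data_frame numbers vertices in order of first appearance, which the later all_simple_paths(GPairs, i) call relies on.

```r
ODD  <- c(1,1,1,3,3,3,5,7,7,9,9)
EVEN <- c(10,8,2,2,6,4,2,6,8,4,8)
dfPairs <- data.frame(ODD, EVEN)

forward  <- setNames(dfPairs, c("X1", "X2"))       # ODD -> EVEN
backward <- setNames(rev(dfPairs), c("X1", "X2"))  # EVEN -> ODD (rev() swaps the columns)
edges <- rbind(forward, backward)
edges <- edges[order(edges$X1), ]                  # base-R stand-in for arrange(..., X1)

nrow(edges)        # every pair appears twice, once per direction
unique(edges$X1)   # every value occurs as a source, in increasing order
```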
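I made a function that 1) converts all simple paths to a list of numeric vectors, 2) filters each numeric vector down to the elements that are increasing left->right, and 3) returns the maximum chain(s) among the left->right increasing numeric vectors.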
max_chain_only_increasing <- function(gpath) {
list_vec <- lapply(gpath, function(v) as.numeric(names(unclass(v)))) # convert to list of numeric vector
only_increasing <- lapply(list_vec, function(v) v[1:min(which(v >= dplyr::lead(v, default=tail(v, 1))))]) # subset vector for only elements that are left->right increasing
return(unique(only_increasing[lengths(only_increasing) == max(lengths(only_increasing))])) # return the longest chain(s), not their length
}
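Step 2 is the least obvious part, so here is a small base-R sketch of the same idea applied to a single vector. increasing_prefix is a hypothetical helper, not part of the answer's code. It keeps the leading run of strictly increasing values and drops everything from the first non-increase onward.

```r
increasing_prefix <- function(v) {
  drop_at <- which(diff(v) <= 0)        # positions whose next value is not larger
  if (length(drop_at) == 0) return(v)   # already strictly increasing throughout
  v[seq_len(drop_at[1])]                # keep everything up to the first non-increase
}

increasing_prefix(c(1, 2, 3, 6, 7, 8, 9))  # unchanged
increasing_prefix(c(4, 9, 8))              # 4 9
increasing_prefix(c(5, 2, 3))              # 5
```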
Now I'll output (the head of) the maximum chains starting from each unique element in dfPairs, the original data:
start_vals <- sort(unique(unlist(dfPairs)))
# [1] 1 2 3 4 5 6 7 8 9 10
max_chains <- sapply(seq_len(length(start_vals)), function(i) max_chain_only_increasing(all_simple_paths(GPairs, i)))
names(max_chains) <- start_vals
# $`1`
# [1] 1 2 3 6 7 8 9
# $`2`
# [1] 2 3 6 7 8 9
# $`3`
# [1] 3 6 7 8 9
# $`4`
# [1] 4 9
# $`5`
# [1] 5
# etc
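If you only want the overall longest chain rather than one per starting value, something like the following sketch should do. It assumes each element of max_chains is a single numeric vector (with ties an element may itself be a list, which this does not handle), and the list is hand-copied from the output above rather than computed.

```r
# Hand-copied subset of the per-start results shown above (not recomputed here)
max_chains <- list(`1` = c(1, 2, 3, 6, 7, 8, 9),
                   `2` = c(2, 3, 6, 7, 8, 9),
                   `4` = c(4, 9),
                   `5` = 5)

lens <- lengths(max_chains)               # number of values in each chain
longest <- max_chains[lens == max(lens)]  # keep the chain(s) with the most values
longest
```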
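Despite CPak's efforts, I ended up writing my own function to solve this. Essentially, I realised I could turn right-to-left chain links into left-to-right ones by using this piece of code from CPak's answer:
output <- arrange(rbind(setNames(dfPairs, c("X1", "X2")), setNames(rev(dfPairs), c("X1", "X2"))), X1)
Overall I'm happy with this. I imagine it could be a bit more elegant, but it works on anything, and it works on some fairly large and complex data. It produces about 241,700 solutions from a dataset of 700 pairs.
I also used the moveme function that I found on Stack Overflow (see below). I use it to move the NA values around to achieve the shift aspect of the join_shift function.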
moveme <- function (invec, movecommand) {
movecommand <- lapply(strsplit(strsplit(movecommand, ";")[[1]],
",|\\s+"), function(x) x[x != ""])
movelist <- lapply(movecommand, function(x) {
Where <- x[which(x %in% c("before", "after", "first",
"last")):length(x)]
ToMove <- setdiff(x, Where)
list(ToMove, Where)
})
myVec <- invec
for (i in seq_along(movelist)) {
temp <- setdiff(myVec, movelist[[i]][[1]])
A <- movelist[[i]][[2]][1]
if (A %in% c("before", "after")) {
ba <- movelist[[i]][[2]][2]
if (A == "before") {
after <- match(ba, temp) - 1
}
else if (A == "after") {
after <- match(ba, temp)
}
}
else if (A == "first") {
after <- 0
}
else if (A == "last") {
after <- length(myVec)
}
myVec <- append(temp, values = movelist[[i]][[1]], after = after)
}
myVec
}
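The only moveme command used in this answer is "end first". For that one case the effect can be sketched in base R as below; move_first is a hypothetical stand-in, not a replacement for the full moveme. Moving the empty "end" column to the front and then restoring the original names shifts a chain one column to the right.

```r
# Put one name at the front and keep the rest in their original order
move_first <- function(nms, target) c(target, setdiff(nms, target))

move_first(c("X1", "X2", "end"), "end")

# A chain 1 -> 10 that found no continuation (end is 0)
df <- data.frame(X1 = 1, X2 = 10, end = 0)
shifted <- df[move_first(names(df), "end")]  # columns are now end, X1, X2
names(shifted) <- names(df)                  # relabel, so the values sit one column to the right
shifted
```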
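Comments:
Wouldn't the "chain" 1->8 (row 2) […]?
@42-, technically yes, but that's not what the question is about. I want as many links (i.e. pairs) in a chain as possible.
As the rest of my comment implied, it is difficult to determine what the question is.
@42-, sorry about that. I'm looking for the longest chain as measured by the greatest number of links, not the longest chain as measured by the longest link.
Where is the final data frame containing the links of the longest chain stored?
I commented out the line beginning with "return" in the max_chainlength_only_increasing function, which then returns the list of vectors containing the links, called max_chainlength. I can work with this. Brilliant. Thanks very much.
A correction to my last comment: it's actually a list of lists. The first list is the list of starting points. Each starting point has a list of the possible vectors that can be created from that point.
Another very pertinent point is that the max_chainlength_only_increasing function only works when starting from 1. If you remove the first three pairs from the example data, you get an "invalid start vector" error.
Make sure you modify your data accordingly.
Despite my efforts…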
# The same procedure on dfTest, whose values do not start at 1
start_vals <- sort(unique(unlist(dfTest)))
max_chains <- sapply(seq_len(length(start_vals)), function(i) max_chain_only_increasing(all_simple_paths(GTest, i)))
names(max_chains) <- start_vals
# $`2`
# [1] 2 3 6 7 8 9
# $`3`
# [1] 3 6 7 8 9
# $`4`
# [1] 4 9
# $`5`
# [1] 5
# $`6`
# [1] 6 7 8 9
library(dplyr)
# join_shift must be defined before Reduce() is called, and it needs moveme (defined above)
join_shift <- function(dtf1, dtf2) {
  # match the last column of the chains built so far against the first column of the next links
  abcd <- full_join(dtf1, dtf2, by = setNames(colnames(dtf2)[1], colnames(dtf1)[ncol(dtf1)]))
  abcd[is.na(abcd)] <- 0
  colnames(abcd)[ncol(abcd)] <- "end"
  abcd_na <- filter(abcd, end == 0)   # rows that found no continuation
  abcd <- filter(abcd, end != 0)
  # move the empty "end" column to the front, shifting those rows one column to the right
  abcd_na <- abcd_na[moveme(names(abcd_na), "end first")]
  names(abcd_na) <- names(abcd)
  abcd <- rbind(abcd, abcd_na)
  colnames(abcd) <- paste0("X", seq_len(ncol(abcd)))
  return(abcd)
}
# links in both directions, keeping only the increasing ones
output <- arrange(rbind(setNames(dfPairs, c("X1", "X2")), setNames(rev(dfPairs), c("X1", "X2"))), X1)
output <- filter(output, X2 > X1)
out_split <- split(output, output$X1)
df_final <- Reduce(join_shift, out_split)
df_final_trim <- df_final[, colSums(df_final) > 0]   # drop columns that are entirely zero
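As a cross-check on the results, here is a sketch of a different technique from either answer above: a small dynamic-programming pass over the "increasing only" links. Because every link points from a smaller value to a larger one, the values can be processed from largest to smallest, with each one reusing the best chain of its larger partners. The names lo, hi, succ and best are mine, and the sketch keeps only one longest chain per starting value, so ties are not enumerated.

```r
ODD  <- c(1,1,1,3,3,3,5,7,7,9,9)
EVEN <- c(10,8,2,2,6,4,2,6,8,4,8)

# Treat each pair as an undirected link and keep only its "upward" direction
lo <- pmin(ODD, EVEN)
hi <- pmax(ODD, EVEN)
succ <- split(hi, lo)              # succ[["3"]] holds the larger partners of 3

vals <- sort(unique(c(ODD, EVEN)), decreasing = TRUE)
best <- list()                     # best[["v"]]: one longest increasing chain starting at v
for (v in vals) {                  # largest first, so all successors are already solved
  chain <- v
  for (s in succ[[as.character(v)]]) {        # NULL (no iterations) if v has no larger partner
    cand <- c(v, best[[as.character(s)]])
    if (length(cand) > length(chain)) chain <- cand
  }
  best[[as.character(v)]] <- chain
}

best[["1"]]   # 1 2 3 6 7 8 9, matching the igraph result above
```

This visits each link once, so it should scale much better than enumerating all simple paths, at the cost of not listing every tied solution.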