Avoiding duplicates when using a for loop in R


r for-loop


I have a basic data frame with 4 letters in one column:

a
b
c
d

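I want to use a nested for loop to bind each letter to every other letter in a new data frame, without binding a letter to itself and without duplicates. So far I can avoid the former, but I'm having trouble with the latter. My code looks like this: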

d <- data.frame(c("a", "b", "c", "d"))
e <- data.frame()

for (j in d[,1]) {
  
  for (i in d[,1]) {
    
    if (j != i) {
      e <- rbind(e, c(j, i))
    }
  }
}

Using the nested for loop, I want to produce:

a b
a c
a d
b c
b d
c d

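I know that running the for loop by moving down one row each time (in data frame d) might work, but I'm not sure how to code it. Thanks for any suggestions.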
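The row-offset idea can be written by looping over positions rather than values. The inner loop starts one row below the outer one, so a letter is never paired with itself and each unordered pair comes up exactly once. This is a minimal sketch, assuming the letters sit in the first column of d. The output column names col1 and col2 are illustrative:

```r
d <- data.frame(c("a", "b", "c", "d"))
x <- as.character(d[[1]])  # letters as a plain character vector
n <- length(x)

# Collect one-row data frames in a list and bind them once at the end;
# calling rbind() inside the loop copies the growing data frame every time
pairs <- list()
for (j in seq_len(n - 1)) {
  # Inner loop starts at row j + 1: no self-pairs, no reversed duplicates
  for (i in (j + 1):n) {
    pairs[[length(pairs) + 1]] <- data.frame(col1 = x[j], col2 = x[i],
                                             stringsAsFactors = FALSE)
  }
}
e <- do.call(rbind, pairs)
e
```

The rows come out in the order asked for above: a b, a c, a d, b c, b d, c d.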

This is a case for combn, which does it easily without a loop:

t(combn(d[[1]], 2))

-output

#     [,1] [,2]
#[1,] "a"  "b" 
#[2,] "a"  "c" 
#[3,] "a"  "d" 
#[4,] "b"  "c" 
#[5,] "b"  "d" 
#[6,] "c"  "d"
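combn() returns a matrix with one combination per column, which is why the answer transposes it. If the result is needed as a data frame like the one the loop builds, here is a minimal sketch. The column names col1 and col2 are illustrative:

```r
d <- data.frame(c("a", "b", "c", "d"))
x <- as.character(d[[1]])

# combn() returns a 2 x choose(n, 2) matrix, one combination per column;
# t() turns it into one pair per row
m <- t(combn(x, 2))

# Optional: store the pairs in a data frame with named columns
e <- data.frame(col1 = m[, 1], col2 = m[, 2], stringsAsFactors = FALSE)
e
```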

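If the OP wants to use a loop, add some conditions:

e <- data.frame(col1 = "", col2 = "")  # dummy first row, removed at the end

for (j in d[,1]) {
  for (i in d[,1]) {
    if (j != i) {
      # i1: the reversed pair (i, j) is not already a row of e
      i1 <- !any(i == e[[1]] & j == e[[2]])
      # i2: it is not the case that j already appears in col1 and i already appears in col2
      i2 <- !(j %in% e[[1]] && i %in% e[[2]])
      if (i1 && i2) {
        e <- rbind(e, c(j, i))
      }
    }
  }
}

-output

e[-1,]
  col1 col2
2    a    b
3    a    c
4    a    d
5    b    c
6    c    d
7    d    b

This gives all six unordered pairs, but the last one comes out as d b rather than b d. The reason is that i2 checks whether each letter appears anywhere in its column, not whether the two letters appear together in one row.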
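A variant of this loop gives the order asked for in the question. It tests each candidate as a pair and skips (j, i) only when the reversed pair (i, j) is already stored in the same row. This is a sketch under that assumption, accumulating the two columns as character vectors. The names col1 and col2 are illustrative:

```r
d <- data.frame(c("a", "b", "c", "d"))
x <- as.character(d[[1]])

col1 <- character(0)
col2 <- character(0)

for (j in x) {
  for (i in x) {
    if (j == i) next  # never pair a letter with itself
    # Skip (j, i) if the reversed pair (i, j) is already stored in one row;
    # both columns are compared element-wise, so this tests the pair as a whole
    if (!any(col1 == i & col2 == j)) {
      col1 <- c(col1, j)
      col2 <- c(col2, i)
    }
  }
}

e <- data.frame(col1 = col1, col2 = col2, stringsAsFactors = FALSE)
e
```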


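Agree with @akrun's suggestion. As a rule of thumb, you rarely need a loop in R to process string data (or, for that matter, data of almost any kind).

See this speed comparison: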

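d <- data.frame(c(letters))

# The OP's nested loop: keeps every ordered pair with j != i
# (26 * 25 = 650 rows), growing the data frame one rbind() at a time
solutionCustom <- function(x) {
  e <- data.frame()
  for (j in x) {
    for (i in x) {
      if (j != i) {
        e <- rbind(e, c(j, i))
      }
    }
  }
  e
}

# All unordered pairs, one per row (choose(26, 2) = 325 rows)
solutionCombn <- function(x) t(combn(x, 2))

library(microbenchmark)

microbenchmark(solutionCustom = solutionCustom(d[, 1]),
               solutionCombn  = solutionCombn(d[, 1]))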

Unit: microseconds
           expr       min        lq       mean     median         uq       max neval
 solutionCustom 44769.620 48898.410 54423.5789 54018.3875 57949.8755 76922.178   100
  solutionCombn   238.311   267.486   294.4763   286.2005   305.8805   605.728   100
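Base R's expand.grid() is another loop-free route. It is a different technique from combn(): build every combination of row positions, then keep only those where the first position comes before the second. That is the same "start one row lower" rule the question describes. This is a minimal sketch reusing the 26-letter d from the benchmark. The column names col1 and col2 are illustrative:

```r
d <- data.frame(c(letters))
x <- as.character(d[[1]])

# All (i1, i2) combinations of row positions; expand.grid() varies its first
# argument fastest, so listing i2 first orders the grid by i1, then i2
idx <- expand.grid(i2 = seq_along(x), i1 = seq_along(x))

# Keep only pairs where the first position comes before the second:
# this drops self-pairs and reversed duplicates in one vectorised step
idx <- idx[idx$i1 < idx$i2, ]

e <- data.frame(col1 = x[idx$i1], col2 = x[idx$i2], stringsAsFactors = FALSE)
nrow(e)  # 325, i.e. choose(26, 2)
```

Comparing positions rather than the letters themselves keeps the result in the same row order as t(combn(x, 2)), whatever order the values appear in.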


I appreciate your answer, akrun, but I'm using this nested for loop as a proxy for a larger dataset for statistical analysis.
@MatinBozorg Do you really want to change your original script, or can you use duplicated, i.e. e[!duplicated(cbind(pmin(e[[1]], e[[2]]), pmax(e[[1]], e[[2]]))), ]
I can change the original script, as long as it works with the nested for loop.
@LyricalStats9 I can say that, performance-wise, duplicated would be much faster.
@LyricalStats9 You can check the update with the for loop. If it works, please check it.
For future consideration when coding combinations/permutations, you may also want to consider the very nice combinations and permutations functions in the gtools package.
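The pmin()/pmax() deduplication suggested in the comments can be run end to end on the output of the question's loop. This is a sketch assuming the same four-letter d. Rows are added as one-row data frames with the illustrative column names col1 and col2:

```r
d <- data.frame(c("a", "b", "c", "d"))
x <- as.character(d[[1]])
e <- data.frame()

# The question's loop: every ordered pair with j != i (12 rows for 4 letters).
# Each row is added as a one-row data frame so the column names are predictable.
for (j in x) {
  for (i in x) {
    if (j != i) {
      e <- rbind(e, data.frame(col1 = j, col2 = i, stringsAsFactors = FALSE))
    }
  }
}

# Sort each pair so that (b, a) and (a, b) give the same key, then keep only
# the first occurrence of each key; duplicated() on a matrix compares rows
key <- cbind(pmin(e$col1, e$col2), pmax(e$col1, e$col2))
e_unique <- e[!duplicated(key), ]
e_unique
```

duplicated() keeps the first occurrence of each key, so the surviving rows stay in the loop's original order: a b, a c, a d, b c, b d, c d.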