Optimizing the performance of for loops in R

r for-loop


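I have a character vector and want to build a matrix that holds a distance measure for every pair of its values (using the stringdist package). At the moment I have an implementation with nested for loops: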

library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")
m <- matrix(nrow = length(strings), ncol = length(strings))
colnames(m) <- strings
rownames(m) <- strings

for (i in 1:nrow(m)) {
  for (j in 1:ncol(m)) {
    m[i,j] <- stringdist::stringdist(tolower(rownames(m)[i]), tolower(colnames(m)[j]), method = "lv")
  }
}

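However, if I have a vector of length 1000 with many non-unique values, the matrix is quite large (say, 800 rows by 800 columns) and the loop is very slow. I would like to optimize the performance, for example by using the apply functions, but I do not know how to convert the code above to apply syntax. Can anyone help?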

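When you are using nested loops, it is always worth checking whether outer() would do the job. outer() is the vectorized solution to nested loops; it applies a vectorized function to every possible combination of the elements in its first two arguments. Since stringdist() operates on vectors, you can simply do the following: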

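library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")

# Note: stringdist() defaults to method = "osa"; pass method = "lv"
# to get the same Levenshtein distance as the loop in the question.
outer(strings, strings, function(i, j) {
  stringdist(tolower(i), tolower(j))
})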
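To make the equivalence with the nested loops concrete, here is a minimal sketch that computes the matrix both ways and compares the results. It assumes the stringdist package is installed; the names lower, m_outer and m_loop are only illustrative, and method = "lv" is passed explicitly so that both versions use the Levenshtein distance from the question.

```r
library(stringdist)

strings <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")
lower <- tolower(strings)

# outer() expands both vectors to all pairs and calls the function once on
# two long vectors, so the function must be vectorized (stringdist() is).
m_outer <- outer(lower, lower, function(a, b) stringdist(a, b, method = "lv"))
dimnames(m_outer) <- list(strings, strings)

# Reference result from the nested loops, for comparison.
m_loop <- matrix(NA_real_, nrow = length(strings), ncol = length(strings),
                 dimnames = list(strings, strings))
for (i in seq_along(strings)) {
  for (j in seq_along(strings)) {
    m_loop[i, j] <- stringdist(lower[i], lower[j], method = "lv")
  }
}

# Both approaches should give the same matrix.
stopifnot(isTRUE(all.equal(m_outer, m_loop)))
```

Lower-casing once up front, rather than inside the function, avoids repeating tolower() for every pair.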

Here is a simple start: the matrix is symmetric, so there is no need to compute the entries below the diagonal, because m[j, i] = m[i, j]. The diagonal elements are obviously all zero, so there is no need to compute those either. Something like this:

n <- nrow(m)
diag(m) <- 0  # the distance from a string to itself is zero
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    m[i, j] <- stringdist::stringdist(tolower(rownames(m)[i]), tolower(colnames(m)[j]), method = "lv")
    m[j, i] <- m[i, j]  # mirror the value into the lower triangle
  }
}

Bioconductor has a stringDist function that does this for you:

# On current Bioconductor releases, install with BiocManager::install("Biostrings") instead.
source("http://bioconductor.org/biocLite.R")
biocLite("Biostrings")
library(Biostrings)

stringDist(c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic"), upper=TRUE)

##   1 2 3 4 5 6 7 8 9
## 1   1 3 4 5 4 4 6 7
## 2 1   2 4 4 3 3 6 7
## 3 3 2   3 3 4 3 5 7
## 4 4 4 3   2 5 4 5 7
## 5 5 4 3 2   3 3 5 7
## 6 4 3 4 5 3   3 5 7
## 7 4 3 3 4 3 3   6 8
## 8 6 6 5 5 5 5 6   2
## 9 7 7 7 7 7 7 8 2

Thanks to @hrbrmstr's hint, I found that the stringdist package itself provides a function called stringdistmatrix that does exactly what I asked for. The call is simple: stringdistmatrix(strings, strings)

apply is also a loop and is not necessarily faster than a for loop. Also, code optimization questions should be asked on CodeReview rather than on StackOverflow.

Thanks a lot, and shame on me: the stringdist package also has such a function: stringdistmatrix

You can/should post that as an answer, then unaccept my answer and accept yours (points!). I have had "bioconductor" on my mind these days (building something similar for infosec), and it is overkill for this answer.

I did not know about the outer function before, but this works too!
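Since the question mentions many non-unique values, the stringdistmatrix call can be combined with de-duplication. The following is only a sketch under a few assumptions: the stringdist package is installed, the input vector (with deliberately added duplicates) is made up for illustration, and the names uniq, idx and full are hypothetical. Each distinct lower-cased value is compared only once, and the result is then expanded back to the original vector by indexing.

```r
library(stringdist)

# Example input with case-insensitive duplicates (illustrative data).
strings <- c("Hello", "hello", "Helo", "Hole", "Apple", "Ape", "APE",
             "New", "Old", "System", "Systemic")

# The distance depends only on the lower-cased value, so keep each value once.
uniq <- unique(tolower(strings))

# Passing two vectors returns a full matrix of pairwise distances.
d <- stringdistmatrix(uniq, uniq, method = "lv")
dimnames(d) <- list(uniq, uniq)

# Expand back to the full vector: look up each original string's position
# among the unique values and index rows and columns with it.
idx <- match(tolower(strings), uniq)
full <- d[idx, idx]
dimnames(full) <- list(strings, strings)
```

With 1000 strings of which 800 are distinct, this computes an 800 x 800 matrix rather than a 1000 x 1000 one; the saving grows with the share of duplicates.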