Implementing radix sort in R

r sorting vector radix-sort

How can I implement radix sort in base R, for example for the following vector?

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

I know that data.table implements radix sort out of the box, so you can use that package and, for example, sort the data simply by setting a key:

library(data.table)

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

f1 <- function(vec) {
  DT <- data.table(vec)
  # setkey() sorts the table by the key column, by reference
  setkey(DT, vec)
  DT
}

f1(vec)

    vec
1:    6
2:   22
3:   25
4:   34
5:  478
6:  504
7: 9021
8: 9947

To compare the speed:

library(microbenchmark)

microbenchmark(f1(vec), radix(vec))

Unit: microseconds
      expr    min     lq mean median     uq     max neval
   f1(vec)  290.6  314.8  335    327  349.1   524.1   100
radix(vec) 1062.8 1121.7 1458   1163 1250.5 24407.9   100

A speed comparison on a larger input:

set.seed(200)
more<-sample(10000,5000)
microbenchmark(f1(more), radix(more))

       expr     min      lq  mean  median      uq     max neval
   f1(more)   539.3   565.5   623   622.2   664.8   769.7   100
radix(more) 10457.8 10668.0 11683 11133.7 12298.3 25010.6   100

Here is my own solution:
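f_radixSort <- function(x) {
    # number of decimal digits of the largest value; format() avoids
    # scientific notation such as "1e+05", which nchar(max(x)) would miscount
    mx <- nchar(format(max(x), scientific = FALSE))
    # order by the lowest i digits, from least to most significant;
    # order() is stable, so ties keep the order of the previous pass
    for (i in seq_len(mx))
        x <- x[order(x %% (10^i))]
    return(x)
}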
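One detail worth checking in the function above is how the number of passes is derived from the largest value. The sketch below is only an illustration, assuming non-negative whole numbers stored as doubles; `count_digits` is a hypothetical helper name that does not appear in the original answer.

```r
# Hypothetical helper: number of decimal digits of the largest value.
# format(..., scientific = FALSE) forces fixed notation before counting.
count_digits <- function(x) {
  nchar(format(max(x), scientific = FALSE))
}

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
count_digits(vec)

# For large doubles, nchar() converts the number to text first, and that
# text may be in scientific notation, so the count can come out too small.
big <- c(100000, 5)
nchar(max(big))
count_digits(big)
```

If the pass count is too small, the most significant digit is never processed and the result can be left unsorted.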

And a short benchmark (I did not include the data.table sort, because I do not know which principle it follows and, besides, I asked for a base R answer):

library(microbenchmark)
vec

My solution looks like this (please bear with me, I am a beginner ;-)), but the result is correct:
radixSort <- function(sortvec) {
  mx <- nchar(format(max(sortvec), scientific = FALSE))  # avoids miscounting "1e+05"
  ## for all digits up to the number of digits in the longest number:  
  for (i in 1:mx){
    ## empty the 10 buckets
    bucket <- list()
    ## for all 10 buckets:
    for (bucketnumber in 1:10){
      ## fill each bucket with the appropriate numbers
      bucket[[bucketnumber]] <- sortvec[dig(sortvec, i)==(bucketnumber-1)]
    }
    ## empty the sorted vector
    sortvec <- c()
    ## fill the sorted vector with the contents of buckets 1-10
    for (k in 1:10){
      sortvec <- c(sortvec, bucket[[k]])
    }
  }
  return(sortvec)
}

dig <- function(x, st) {
  ## returns the value of digit #st in number x, e.g. dig(3456, 2) returns 5
  remainder <- x%%(10^st)
  divisor <- 10^(st-1)
  return(trunc(remainder/divisor))
}
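The two inner loops above fill ten buckets and then concatenate them again. As a rough sketch of the same bucket idea, `split()` can do the distribution in one call. `radix_sort_split` is a hypothetical name, not part of the original answer, and the code assumes non-negative whole numbers.

```r
# Sketch of a least-significant-digit radix sort using split() as the buckets.
# Assumption: x contains non-negative whole numbers.
radix_sort_split <- function(x) {
  if (length(x) == 0) return(x)
  n_digits <- nchar(format(max(x), scientific = FALSE))
  for (i in seq_len(n_digits)) {
    # i-th decimal digit of every element (1 = ones, 2 = tens, ...)
    digit <- (x %/% 10^(i - 1)) %% 10
    # split() keeps the current order inside each bucket, and the factor
    # levels 0:9 fix the order in which the buckets are concatenated
    buckets <- split(x, factor(digit, levels = 0:9))
    x <- unlist(buckets, use.names = FALSE)
  }
  x
}

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
radix_sort_split(vec)
```

Because each bucket preserves the order produced by the previous pass, the result after the last digit should match `sort(vec)`.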

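Thanks for your solution, I upvoted it anyway. But do both the f1 and radix functions sort the data according to the principle of radix sort? To verify that, is it possible to print each step of the sort?

I think the function does a radix sort by least significant digit, i.e. it starts with the 1s, then the 10s, and so on. You can add a print call at the end of each loop iteration to watch it work. It will sort from right to left.

The first for loop in the function is not needed, since the digits can be converted to a data frame or matrix in a vectorized way. Speed of the algorithm aside, the space it needs is also considerable: a matrix with nrow = length of vec and ncol = the number of digits of the largest number in vec.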
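To make the two suggestions in these comments concrete, here is a minimal sketch that builds the digit matrix in a vectorized way and prints the vector after every pass. It is not the `radix` function discussed in the comments; `digit_matrix`, `radix_sort_trace` and the `verbose` argument are hypothetical names, and non-negative whole numbers are assumed.

```r
# Digit matrix: one row per element, one column per decimal digit
# (column 1 = ones, column 2 = tens, ...), built without an explicit loop.
digit_matrix <- function(x) {
  n_digits <- nchar(format(max(x), scientific = FALSE))
  powers <- 10^(seq_len(n_digits) - 1)
  outer(x, powers, function(v, p) (v %/% p) %% 10)
}

radix_sort_trace <- function(x, verbose = TRUE) {
  digits <- digit_matrix(x)
  idx <- seq_along(x)
  for (i in seq_len(ncol(digits))) {
    # reorder by the i-th digit only; order() leaves ties in their
    # current order, so the work of earlier passes is preserved
    idx <- idx[order(digits[idx, i])]
    if (verbose) cat("after digit", i, ":", x[idx], "\n")
  }
  x[idx]
}

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
sorted <- radix_sort_trace(vec)
```

The matrix has `length(vec)` rows and as many columns as the largest number has digits, which is the memory cost mentioned above.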
Very nice solution, but does it still count as a true radix sort when you call order? As far as I understand, radix sort does not sort in the traditional way. It distributes the numbers into "buckets" digit by digit, keeps their original order within each bucket, and then combines the buckets in the right order. Hence my loop implementation.

Actually, I think it does work. order does not sort anything, it only returns the ordering... However, you get the same result by jumping straight to the last iteration of the loop: all(vec[order(vec %% 10^nchar(max(vec)))] == f_radixSort(vec))

@BryanGoggin Even vec[order(vec)] simply gives the sorted numbers, but it does not follow the principle of the algorithm.
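The point raised in this exchange can be checked directly. The sketch below compares two sort keys: the remainder modulo 10^i, which carries all of the lowest i digits at once, and the single i-th digit that a textbook least-significant-digit pass would use. `digit_key` is a hypothetical helper name, and `method = "radix"` assumes R 3.3.0 or later.

```r
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
n_digits <- nchar(format(max(vec), scientific = FALSE))

# Remainder key for the last pass: for i = n_digits the remainder is the
# number itself, so one call to order() already does all the work.
remainder_key <- vec %% 10^n_digits
one_pass <- vec[order(remainder_key)]

# Single-digit key: each pass looks at one digit only and relies on
# the stability of order() to keep the result of earlier passes.
digit_key <- function(x, i) (x %/% 10^(i - 1)) %% 10
lsd <- vec
for (i in seq_len(n_digits)) {
  lsd <- lsd[order(digit_key(lsd, i), method = "radix")]
}

# Jumping straight to the last digit does not sort the vector in this variant.
last_digit_only <- vec[order(digit_key(vec, n_digits))]
is.unsorted(last_digit_only)
```

With the single-digit key every pass is needed, which is closer to the bucket description given in the comment above.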
Unless you specifically want to implement radix sort in R: radix sort (from "data.table") was incorporated into R 3.3.0, see sort(vec, method = "radix")

@alexis_laz Yes, I am specifically curious about implementing the algorithm. Its principle is algorithmically beautiful.

As a simpler, more counting-sort style approach, see rep(seq_len(max(vec)), tabulate(vec)) (it will need a lot of memory for large integers), which basically just puts the integers into buckets and picks the non-zero elements.

@alexis_laz That is a nice option too, but it fails for zeros.

You are right, yes. With zeros you could use a cheap trick like +/- 1 (rep(seq_len(max(vec) + 1), tabulate(vec + 1)) - 1), though there may be more pitfalls (besides memory).
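The counting-sort trick from the last comments can be wrapped in a small function and compared against the built-in radix method. This is only a sketch under the assumption of non-negative whole numbers; `counting_sort` is a hypothetical name.

```r
# Counting sort via tabulate(), shifted by one so that zeros are counted too.
# Assumption: non-negative whole numbers; memory grows with max(x).
counting_sort <- function(x) {
  if (length(x) == 0) return(x)
  shifted <- x + 1                 # tabulate() only counts positive integers
  counts <- tabulate(shifted)      # counts[k] = how often the value k - 1 occurs
  rep(seq_along(counts), counts) - 1
}

with_zero <- c(3, 0, 2, 0, 7, 3)
counting_sort(with_zero)

# Built-in alternative available since R 3.3.0
sort(with_zero, method = "radix")
```

The count vector has `max(x) + 1` entries regardless of how many values are present, which is why this approach becomes expensive for large integers.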