R: How do I generate long strings of digits?
I want to generate numeric strings with a large number of digits; in this case, ID values for a synthetic dataset. For short numeric strings, I would use sample:
sprintf("%05.f", sample(0:(1e5-1), 18))
## [1] "54783" "80354" "53607" "99668" "63621" "07121" "15944" "27436" "96837"
## [10] "28751" "95315" "63326" "00981" "15300" "18448" "09885" "63360" "04539"
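This does not work for longer strings. First the memory requirements become too large, and then you cannot make the numbers big enough. For example, this fails: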
sprintf("%020.f", sample(0:(1e20-1), 18))
## Error in 0:(1e+20 - 1) : result would be too long a vector
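A minimal sketch of the two limits described above, assuming R's usual double-precision numbers. The 9-digit example is only an illustration: sample.int avoids building the full 0:(n-1) vector, but the number of digits is still capped by what a number can hold.

```r
# Doubles have a 53-bit mantissa, so above 2^53 not every integer is representable.
big <- 2^53
print(big + 1 == big)          # the added 1 is lost to rounding

# Integers are capped at .Machine$integer.max (10 digits).
print(.Machine$integer.max)

# sample.int() draws from 1..n without materialising the whole range,
# which removes the memory problem but not the digit limit.
ids <- sprintf("%09d", sample.int(1e9, 18) - 1L)
print(ids)
```

So for 20 or more digits the strings have to be assembled from individual digits rather than formatted from a single number.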
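How can I generate numeric strings with a large number of digits?

Generate the individual digits, split them into one group per number, then collapse each group into a single string.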
library(magrittr)
generateNumberStrings <- function(nNumbers, nCharsPerNumber)
{
  sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
    split(gl(nNumbers, nCharsPerNumber)) %>%
    vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}
generateNumberStrings(18, 20)
## [1] "06985095513359117867" "95278964413245221928" "75398392571928201881"
## [4] "00722065797044523279" "24475619649735183646" "29165493966488037145"
## [7] "34289922968745727406" "82354362380114534171" "84293845597888728670"
## [10] "97570546918892201649" "41421884356741221760" "99306177663904189401"
## [13] "25668966612346726451" "94949806854834288664" "43664073601604613019"
## [16] "25848242347176214032" "80736828777283687373" "83763855757083999312"
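To see what each stage of that pipeline does, here is the same idea written step by step in base R on a tiny input. The sizes (3 strings of 4 digits) and the variable names are illustrative assumptions, not part of the answer above.

```r
nNumbers <- 3
nCharsPerNumber <- 4

# Step 1: one flat vector holding every digit that will be needed.
digits <- sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE)

# Step 2: gl() builds a grouping factor 1 1 1 1 2 2 2 2 3 3 3 3,
# so split() hands back one chunk of digits per output string.
groups <- gl(nNumbers, nCharsPerNumber)
pieces <- split(digits, groups)

# Step 3: collapse each chunk into a single character string.
ids <- vapply(pieces, paste0, character(1), collapse = "", USE.NAMES = FALSE)
print(ids)
```

Because the digits are pasted as characters, leading zeros are preserved and the length is never limited by numeric precision.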
You can use the stringi package:
require(stringi)
stri_rand_strings(10,50,pattern="[0-9]")
#[1] "33163217620361477538822791082750025522246331345665"
#[2] "85105858270154002408385176647161448078668054193081"
#[3] "62417899981033664011261714060242781925235001978704"
#[4] "17731152361720663463691231461493607438220463345863"
#[5] "06316044683426574113640145569673845269595104465896"
#[6] "17058300286927387520323781399768150137786864069558"
#[7] "86204984977415277470013113957915963393339586096213"
#[8] "56382530391794208466245591896055134584746907393458"
#[9] "61740570216902905237145952608961548203505061535222"
#[10] "28713530448562268345804947527043822080897315821103"
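The first argument is the length of the result vector, the second is the number of characters in each string, and the third says that only digits are wanted.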
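As a small sketch of those three arguments by name: stri_rand_strings also accepts a vector of lengths, so IDs of different widths can be drawn in one call. The guard around the call is an assumption added here so the snippet still runs where stringi is not installed.

```r
if (requireNamespace("stringi", quietly = TRUE)) {
  # n: how many strings; length: characters per string (may be a vector);
  # pattern: the character class to draw from.
  ids <- stringi::stri_rand_strings(n = 3, length = c(5, 10, 20), pattern = "[0-9]")
  print(ids)
  print(nchar(ids))
}
```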
Sticking with base R, you could try the following, which generates 1000 strings of 50 digits each:
apply(matrix(sample(charToRaw("0123456789"),50*1000,replace=TRUE),nrow=1000),1,rawToChar)
A base R alternative:
set.seed(123)
paste0(sample(0:9,50,replace=TRUE),collapse="")
#[1] "27489058549465182039866967552199670472321443112428"
Edit: as @docendodiscimus suggested, this can be combined with replicate() to get any number of such strings:
replicate(10,paste0(sample(0:9,50,replace=TRUE),collapse=""))
# [1] "27489058549465182039866967552199670472321443112428" "04715217836032848874767042363126471498811636317045"
# [3] "53494896419309715954633239101668675687943401822027" "84321352425363357242618766358583725425992396944615"
# [5] "29654832114226073489297603456964502318185616373997" "22525714489869553305800177940671320302062108789107"
# [7] "70776410443470388238821710903962783466694152439326" "19516964381183371044438459723957375912029277122119"
# [9] "91953470363824219340565386331895392614012571877136" "53202887119441522628084764602728369116489047092067"
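Since the IDs are for a synthetic dataset, reproducibility may matter. A minimal sketch, assuming a hypothetical wrapper named makeIds around the replicate() call above: resetting the seed before each call gives the same strings back.

```r
# Hypothetical helper wrapping the replicate() approach.
makeIds <- function(n, nDigits) {
  replicate(n, paste0(sample(0:9, nDigits, replace = TRUE), collapse = ""))
}

set.seed(123)   # the seed value itself is arbitrary
a <- makeIds(5, 50)
set.seed(123)
b <- makeIds(5, 50)

print(identical(a, b))   # same seed, same IDs
```

The exact digits produced for a given seed can differ between R versions, so only the repeatability within one session is relied on here.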
And the obligatory benchmark:
library(magrittr)
library(stringi)
library(microbenchmark)

# split/gl approach
GNS <- function(nNumbers, nCharsPerNumber)
{
  sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
    split(gl(nNumbers, nCharsPerNumber)) %>%
    vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}
# replicate/paste0 approach
GNP <- function(nNumbers, nCharsPerNumber) {
  replicate(nNumbers, paste0(sample(0:9, nCharsPerNumber, replace = TRUE), collapse = ""))
}
# stringi approach
GST <- function(nNumbers, nCharsPerNumber) {
  stri_rand_strings(nNumbers, nCharsPerNumber, pattern = "[0-9]")
}
# times must be named; a bare 10 would be benchmarked as a fourth expression
microbenchmark(GNS(1000, 100), GNP(1000, 100), GST(1000, 100), times = 10)
We have a clear winner.

Edit: adding another base R option, which is faster.
GSAP <- function(nNumbers, nCharsPerNumber) {
  # One row per string, so nrow must be nNumbers. As first posted this was
  # nrow = nCharsPerNumber, which swaps the dimensions (see the comments below).
  apply(matrix(sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber, replace = TRUE),
               nrow = nNumbers), 1, rawToChar)
}

Timings as first posted (the comments below question the GSAP row):

Unit: microseconds
            expr       min        lq      mean     median       uq       max
 GSAP(1000, 100)   724.584   739.637   821.435   766.8345   899.06  1030.086
  GNS(1000, 100) 36189.180 38316.406 39739.471 39141.5695 39965.02 44478.450
  GNP(1000, 100) 35777.282 36331.839 38448.665 38575.8945 39725.21 43016.281
  GST(1000, 100)  1863.803  1898.013  1944.472  1918.7110  1975.33  2122.094
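So GST wins by a small margin (see the final timings at the end of the thread).

Comments:

If you want to stick with base R, an interesting option might be: apply(matrix(sample(charToRaw("0123456789"), 50*1000, replace=TRUE), nrow=1000), 1, rawToChar). It should be faster (it generates 1000 strings of 50 digits each).

Nice comparison. Could you also add my base R solution? I made an edit in my answer.

Thanks, but I think the arguments in GSAP are reversed. It should be nrow=nNumbers. As written, GSAP(1000,100) generates 100 strings of 1000 characters each. After the correction, I guess the top two positions will be flipped.

@nicola please see my third take. The inputs are "square", so it's all copacetic.

You also have a typo: the parameter name and the name used in the body do not match (nCharsPerNumber versus NcharsPerNumber). Those timings cannot be right. I guess you had a much smaller NcharsPerNumber cached.

@nicola, sorry. I will post corrected timings. I should have asked here before rolling out my own solution. :)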
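The dimension mix-up discussed in these comments can be checked directly. In this sketch the helper digitMatrixStrings and its explicit nrow argument are hypothetical, introduced only to compare the two layouts side by side.

```r
# Hypothetical helper: same body as GSAP, but with nrow exposed as an argument.
digitMatrixStrings <- function(nNumbers, nCharsPerNumber, nrow) {
  m <- matrix(sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber, replace = TRUE),
              nrow = nrow)
  apply(m, 1, rawToChar)   # one string per matrix row
}

reversed  <- digitMatrixStrings(1000, 100, nrow = 100)    # nrow = nCharsPerNumber
corrected <- digitMatrixStrings(1000, 100, nrow = 1000)   # nrow = nNumbers

print(c(length(reversed), nchar(reversed[1])))     # 100 strings of 1000 characters
print(c(length(corrected), nchar(corrected[1])))   # 1000 strings of 100 characters
```

With the reversed layout apply() makes only 100 calls instead of 1000, which is one plausible reason the first GSAP timing looked so fast.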
Corrected timings:

       expr       min        lq      mean    median        uq       max neval
 GSAP(x, y)  3.906626  3.975160  4.069103  4.049784  4.163262  4.329284    10
  GNS(x, y) 33.645200 33.972587 34.513555 34.406009 35.141313 35.328662    10
  GNP(x, y) 30.833180 31.136971 33.037422 32.193070 33.010896 41.713811    10
  GST(x, y)  1.697303  1.706599  1.731205  1.735127  1.756961  1.763861    10