Generating substrings and random strings in R


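Please bear with me: I come from a Python background and I'm still learning string manipulation in R.

OK, suppose I have a string of length 100 made up of random A, B, C, or D letters: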

> df<-c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
> df
[1] "ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD"
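... and so on.

2) Take the resulting list and produce another list with the same substrings, the only difference being that one or two of the A, B, C, or D letters are changed to another A, B, C, or D (any of these four letters, and only these).

So this,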

NAME1
ABCBDBDBCBABABDBCBCB
would become this:

NAME1.1
ABBBDBDBCBDBABDBCBCB
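As you can see, the "C" in the third position became a "B" and the "A" in the eleventh position became a "D". There is no implied relationship between the changed letters. Purely random.

I know this is a complicated question, but as I said, I'm still learning basic text and string manipulation in R.


Thanks in advance.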

One way, albeit a slow one:

Rgames> foo<-paste(sample(c('a','b','c','d'),20,replace=TRUE),sep='',collapse='')
Rgames> bar<-matrix(unlist(strsplit(foo,'')),ncol=5)
Rgames> bar
     [,1] [,2] [,3] [,4] [,5]
[1,] "c"  "c"  "a"  "c"  "a" 
[2,] "c"  "c"  "b"  "a"  "b" 
[3,] "b"  "b"  "a"  "c"  "d" 
[4,] "c"  "b"  "a"  "c"  "c"

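Then, if necessary, use paste to recombine each row.

I've tried to break this down into several simple steps, in the hope that you can pick up a few techniques along the way: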

# Random data
df<-c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
n<-10 # Number of cuts
set.seed(1)
# Pick n random start positions; the last valid start is nchar(df)-20+1
nums<-sample(1:(nchar(df)-20+1),n,replace=TRUE)
# Make your cuts
cuts<-sapply(nums,function(x) substring(df,x,x+20-1))
# Generate some names
nams<-paste0('NAME',1:n)
# Make it into a matrix, transpose, and then recast into a vector to get alternating names and cuts.
names.and.cuts<-c(t(matrix(c(nams,cuts),ncol=2)))
# Drop a file.
write.table(names.and.cuts,'file.txt',quote=FALSE,row.names=FALSE,col.names = FALSE)

# Pick how many changes are going to be made to each cut.
changes<-sample(1:2,n,replace=TRUE)
# Pick that number of positions to change
pos.changes<-lapply(changes,function(x) sample(1:20,x))
# Find the letter at each position within the corresponding cut (not within df).
letter.at.change.pos<-mapply(function(cut,pos) substring(cut,pos,pos),cuts,pos.changes,SIMPLIFY=FALSE,USE.NAMES=FALSE)
# Make a function that takes any letter, and outputs any other letter from c(A-D)                             
letter.map<-function(x){
    # Make a list of alternate letters.
    alternates<-lapply(x,setdiff,x=c('A','B','C','D'))
    # Pick one of each
    sapply(alternates,sample,size=1)
}
# Find another letter for each
letter.changes<-lapply(letter.at.change.pos,letter.map)
# Make a function to replace character by position
# Inefficient, but who cares.
rep.by.char<-function(str,pos,chars){
  for (i in 1:length(pos)) substr(str,pos[i],pos[i])<-chars[i]
  str
}

# Change every letter at pos.changes to letter.changes
mod.cuts<-mapply(rep.by.char,cuts,pos.changes,letter.changes,USE.NAMES=FALSE)
# Generate names
nams<-paste0(nams,'.1')
# Use the matrix trick to alternate names and cuts, then drop a file.
names.and.mod.cuts<-c(t(matrix(c(nams,mod.cuts),ncol=2)))
write.table(names.and.mod.cuts,'file2.txt',quote=FALSE,row.names=FALSE,col.names = FALSE)
  • Create a text file of the substrings

    n <- 20 # length of substrings
    
    starts <- seq(nchar(df) - n + 1) # every valid start position
    
    v1 <- mapply(substr, starts, starts + n - 1, MoreArgs = list(x = df))
    
    names(v1) <- paste0("NAME", seq_along(v1), "\n")
    
    write.table(v1, file = "filename.txt", quote = FALSE, sep = "",
                col.names = FALSE)
    

    Regarding the first part of the question:

    df <- c("ABCBDBDBCBABABDBCBCBDBDBCBDBACDBCCADCDBCDACDDCDACBCDACABACDACABBBCCCBDBDDCACDDACADDDDACCADACBCBDCACD")
    
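    nstrchars <- 20
    # number of valid start positions for a substring of length nstrchars
    count <- nchar(df) - nstrchars + 1
    
    # substr() includes both end points, so the last index is x + nstrchars - 1
    length20substrings <- data.frame(length20substrings = sapply(1:count, function(x) substr(df, x, x + nstrchars - 1)))
    
    # Save to a text file. I chose not to include row names or a column name in the .txt file.
    write.table(length20substrings, "length20substrings.txt", row.names = FALSE, col.names = FALSE)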
    

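    Can one or two of the letters be replaced with the same letter, i.e., is it allowed to replace A with A? Is it truly random?

    Nice. I should have thought of the strsplit and replace combination. I don't think you can guarantee that the replacement letter will be different, though: A could be replaced with A.

    @nograps Unfortunately, the OP didn't answer my comment. From the question: "There is no implied relationship between the changed letters. Purely random."

    Thank you so much! Users like you are exactly why Stack is so essential for learning.

    There's a lot of really great stuff in this code. Thanks for posting!

    I like @SvenHohenstein's use of strsplit and replace, so here I show how you could do it that way:
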
    mod.cuts<-mapply(function(x,y,z) paste(replace(x,y,z),collapse=''),
       strsplit(cuts,''),pos.changes,letter.changes,USE.NAMES=FALSE)
    
    myfun <- function() {
      idx <- sample(seq(n), sample(1:2, 1))
      rep <- sample(LETTERS[1:4], length(idx), replace = TRUE)
      return(list(idx = idx, rep = rep))
    }
    
    new <- replicate(length(v1), myfun(), simplify = FALSE)
    
    v2 <- mapply(function(x, y, z) paste(replace(x, y, z), collapse = ""),  
                 strsplit(v1, ""),
                 lapply(new, "[[", "idx"),
                 lapply(new, "[[", "rep"))
    
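    # Build the names from scratch: names(v1) end in "\n", so appending ".1" to them would misplace the suffix
    names(v2) <- paste0("NAME", seq_along(v2), ".1")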
    
    write.table(v2, file = "filename2.txt", quote = FALSE, sep = "\n", 
                col.names = FALSE)
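
The comments point out that sampling the replacement from all four letters can put the same letter back, leaving the string unchanged. If a visible change is required at every chosen position, one option is to draw the replacement from the other three letters only. The following is a minimal sketch under that assumption; `mutate_string` and its `alphabet` argument are hypothetical names for illustration, not part of any answer.

```r
# Hypothetical helper (not from the answers above): change one or two
# positions of `s` to a *different* letter taken from `alphabet`.
# Assumes `s` has at least two characters.
mutate_string <- function(s, alphabet = c("A", "B", "C", "D")) {
  chars <- strsplit(s, "")[[1]]
  # Pick one or two distinct positions to change
  idx <- sample(seq_along(chars), sample(1:2, 1))
  for (i in idx) {
    # Exclude the current letter so the position is guaranteed to change
    candidates <- setdiff(alphabet, chars[i])
    chars[i] <- candidates[sample.int(length(candidates), 1)]
  }
  paste(chars, collapse = "")
}

set.seed(42)  # only to make the example reproducible
mutate_string("ABCBDBDBCBABABDBCBCB")
```

Applied to a whole vector of substrings, this could be called as `vapply(v1, mutate_string, character(1))`.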
    
    # create a function that will randomly pick one or two spots in a string and replace
    # those spots with one of the other characters present in the string:
    
    changefxn <- function(x){
      x <- as.character(x)
      splitstr <- strsplit(x, "")[[1]]
      charspresent <- unique(splitstr)
      # pick one or two distinct positions to change
      numchanges <- sample(1:2, 1)
      ids <- sample(seq_along(splitstr), numchanges)
      # replace each chosen position (ids, not the first positions of the string)
      # with a different character that occurs in the string
      for (i in ids) {
        splitstr[i] <- sample(setdiff(charspresent, splitstr[i]), 1)
      }
      paste(splitstr, collapse = "")
    }
    
    # try it out
    
    changefxn("asbbad")
    changefxn("12lkjaf38gs")
    
    # apply changefxn to all the substrings from part 1
    
    length20substrings<-length20substrings[seq_along(length20substrings[,1]),]
    newstrings <- lapply(length20substrings, function(ii)changefxn(ii))
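
A quick way to sanity-check any of the approaches above is to count the positions at which an original and a modified string differ; the count should be 1 or 2 when every replacement is forced to be a different letter. This is a small sketch, and `count_diffs` is a hypothetical helper name, not something defined in the answers.

```r
# Hypothetical helper: number of positions at which two
# equal-length strings differ.
count_diffs <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# The NAME1 / NAME1.1 pair from the question differs at positions 3 and 11
count_diffs("ABCBDBDBCBABABDBCBCB", "ABBBDBDBCBDBABDBCBCB")  # 2
```

For whole vectors of originals and modified strings, something like `mapply(count_diffs, originals, modified)` would give one count per pair, assuming both vectors hold strings of matching lengths.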