
Splitting a string in R based on characters


I have a column in my dataset that contains a string I want to split:

df = data.frame(col = c("BrBkRY","BBkRBr","YBRG","RBBk"))
This is the vector of separators I want to split on:

sep = c("Br","Bk","R","Y","B","G")
This is what the result should look like. I built it by hand:

df2 = data.frame(col = c("BrBkRY","BBkRBr","YBRG","RBBk"), 
                 col1 = c("Br","B","Y","R"),
                 col2 = c("Bk","Bk","B","B"),
                 col3 = c("R","R","R","Bk"),
                 col4 = c("Y","Br","G",""))
df2 
     col col1 col2 col3 col4
1 BrBkRY   Br   Bk    R    Y
2 BBkRBr    B   Bk    R   Br
3   YBRG    Y    B    R    G
4   RBBk    R    B   Bk     

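I was thinking of using a regular expression, but that usually needs a delimiter character such as -. I don't know how to do this for a string that has to be split on its own characters. Also, I don't want BkB split into B, k and B; I want it split into Bk and B. Is there a package that can do this?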

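We can use str_extract_all to extract the components into a list, pad the list elements with NA so that they all have the same length, rbind them, and then cbind the result with the original dataset.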

library(stringr)
lst <- str_extract_all(df$col, paste(sep, collapse="|"))
dfN <- cbind(df[1], do.call(rbind, lapply(lst, `length<-`, max(lengths(lst)))))
colnames(dfN)[-1] <- paste0("col", colnames(dfN)[-1])
dfN
#     col col1 col2 col3 col4
#1 BrBkRY   Br   Bk    R    Y
#2 BBkRBr    B   Bk    R   Br
#3   YBRG    Y    B    R    G
#4   RBBk    R    B   Bk <NA>
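
The pattern above works because Br and Bk come before B in sep: with an ordered alternation, the first alternative that matches wins. A minimal sketch, assuming you cannot rely on that order, sorts the separators by decreasing length before building the pattern. It uses base R's gregexpr and regmatches rather than stringr, the helper name split_by_tokens is hypothetical, and it assumes the separators contain no regex metacharacters.

```r
df  <- data.frame(col = c("BrBkRY", "BBkRBr", "YBRG", "RBBk"))
sep <- c("B", "R", "Y", "G", "Br", "Bk")  # deliberately in an unhelpful order

# Hypothetical helper: extract all tokens from each string in x
split_by_tokens <- function(x, tokens) {
  # Longest tokens first, so "Bk" is tried before "B"
  tokens  <- tokens[order(nchar(tokens), decreasing = TRUE)]
  pattern <- paste(tokens, collapse = "|")
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

lst <- split_by_tokens(as.character(df$col), sep)

# Pad every element with NA up to the longest length, then bind row-wise
mat <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
colnames(mat) <- paste0("col", seq_len(ncol(mat)))
dfN <- cbind(df, mat)
dfN
```

Sorting by length is only one way to make the alternation safe; keeping sep in a hand-checked order, as in the question, works just as well.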

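You can split on a regular expression using lookbehind and lookahead. The expression (?<=.)(?=[A-Z]) matches the empty position between any character and an uppercase letter, so strsplit cuts the string there without consuming any characters. Note that this relies on every separator starting with an uppercase letter; it does not use sep.

lst <- strsplit(as.character(df$col), '(?<=.)(?=[A-Z])', perl=TRUE)
lst
# [[1]]
# [1] "Br" "Bk" "R"  "Y" 
#
# [[2]]
# [1] "B"  "Bk" "R"  "Br"
#
# [[3]]
# [1] "Y" "B" "R" "G"
#
# [[4]]
# [1] "R"  "B"  "Bk"

dfN <- cbind(df[1], do.call(rbind, lapply(lst, `length<-`, max(lengths(lst)))))
colnames(dfN)[-1] <- paste0("col", colnames(dfN)[-1])

Alternatively, use a lookahead built from sep to insert a comma in front of each component, strip the leading comma, and read the result with read.csv:

cbind(df[1], read.csv(text=sub("^,", "", gsub(paste0("(?=(",
    paste(sep, collapse="|"), "))"), ",", df$col, perl = TRUE)),  
     header=FALSE, col.names = paste0("col", 1:4), fill = TRUE))
#     col col1 col2 col3 col4
#1 BrBkRY   Br   Bk    R    Y
#2 BBkRBr    B   Bk    R   Br
#3   YBRG    Y    B    R    G
#4   RBBk    R    B   Bk     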
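
Extracting matches, as str_extract_all does, silently discards any text that none of the separators cover. A minimal sketch of a sanity check, assuming you want to detect such rows: paste the extracted pieces back together and compare them with the original string. The extra value "BxR" is invented here purely for illustration, and base R's regmatches stands in for stringr.

```r
# "BxR" contains a stray "x" that no separator covers (illustrative value)
col <- c("BrBkRY", "BBkRBr", "YBRG", "RBBk", "BxR")
sep <- c("Br", "Bk", "R", "Y", "B", "G")

# Extract the pieces with an ordered alternation (Br and Bk precede B)
pieces <- regmatches(col, gregexpr(paste(sep, collapse = "|"), col, perl = TRUE))

# Rebuild each string from its pieces and compare with the input
rebuilt <- vapply(pieces, paste, character(1), collapse = "")
ok <- rebuilt == col
ok
```

Rows where ok is FALSE contain characters outside sep and probably deserve a closer look before the split columns are used.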