How do I separate strings of numbers and letters of varying lengths into different columns in R?

I have a column named "WFBS" that contains more than a million rows of strings of varying lengths, like this:

WFBS <- c("M010203", "S01020304", "N104509")


This may be a useful starting point:

library(tidyr)
df <- data.frame(WFBS = c("M010203", "S01020304", "N104509"),
                 stringsAsFactors = FALSE)
df %>% separate(col = WFBS,
                into = c("WFBS1","WFBS2","WFBS3","WFBS4"),
                sep = c(3,5,7))
#   WFBS1 WFBS2 WFBS3 WFBS4
# 1   M01    02    03      
# 2   S01    02    03    04
# 3   N10    45    09      

This leaves empty strings, rather than NAs, in the unfilled positions, so you will have to convert them yourself.
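
One way to do that conversion in base R is sketched below. To keep the example self-contained, the data frame `res` is a hand-built stand-in for the separate() result shown above (the name `res` is an assumption, not part of the original answer).

```r
# Hypothetical stand-in for the output of separate() above
res <- data.frame(WFBS1 = c("M01", "S01", "N10"),
                  WFBS2 = c("02", "02", "45"),
                  WFBS3 = c("03", "03", "09"),
                  WFBS4 = c("", "04", ""),
                  stringsAsFactors = FALSE)

# Replace every empty string with a character NA; columns stay character
res[res == ""] <- NA_character_
```

After this step, is.na(res$WFBS4) is TRUE for the rows whose original string was too short to fill the fourth column.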

An option with base R: create a delimiter with sub, then read the result with read.csv to create a 4-column data.frame.

read.csv(text = sub("^(...)(..)(..)(.*)", "\\1,\\2,\\3,\\4", WFBS), 
  header = FALSE, colClasses = rep("character", 4), na.strings = "",
        col.names =paste0("WFBS", 1:4), stringsAsFactors = FALSE)
#    WFBS1 WFBS2 WFBS3 WFBS4
#1   M01    02    03  <NA>
#2   S01    02    03    04
#3   N10    45    09  <NA>
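
If the regular expression feels opaque, a similar split can probably be done with base R's substring(), assuming the same fixed cut points (after characters 3, 5 and 7) as in the answers above. This is only a sketch: the names `starts`, `ends`, `parts` and `out` are illustrative and do not come from the original answers.

```r
WFBS <- c("M010203", "S01020304", "N104509")

# Start and end positions of each piece; the last piece runs to the longest string
starts <- c(1, 4, 6, 8)
ends   <- c(3, 5, 7, max(nchar(WFBS)))

# One column per (start, end) pair; substring() is vectorised over WFBS
parts <- mapply(function(s, e) substring(WFBS, s, e), starts, ends)

# Strings too short for a piece yield "", which is turned into NA here
parts[parts == ""] <- NA

out <- as.data.frame(parts, stringsAsFactors = FALSE)
names(out) <- paste0("WFBS", 1:4)
```

Like the read.csv approach, this yields NA rather than an empty string in WFBS4 for the shorter inputs.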