Splitting strings in R and rebuilding them column by column
I have a tricky string-splitting problem in R. My data frame has a column containing strings of different lengths:
Site Class
A1 D2.13
A2 E1.4
A3 FA.1
A4 H2.14
A5 F
AR G1
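Now I want to add new columns that rebuild the string character by character, where the dot should be "ignored" as a character of its own: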
Site Class1 Class2 Class3 Class4
A1 D D2 D2.1 D2.13
A2 E E1 E1.4 NA
A3 F FA FA.1 NA
A4 H H2 H2.1 H2.14
A5 F NA NA NA
AR G G1 NA NA
Test data:
df <- structure(list(Site = c("A1", "A2", "A3", "A4", "A5", "AR"),
Class = c("D2.13", "E1.4", "FA.1", "H2.14", "F","G1")),
class = "data.frame", row.names = c(NA, -6L))
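This is easy with dplyr, as long as the strings follow the fixed layout of the test data:
library(dplyr)
df %>% rowwise() %>% mutate(
  Class1 = substr(Class, 1, 1),
  Class2 = ifelse(nchar(strsplit(Class, "\\.")[[1]][1]) == 2, substr(Class, 1, 2), NA),  # two characters before the dot
  Class3 = ifelse(nchar(strsplit(Class, "\\.")[[1]][2]) > 0, substr(Class, 1, 4), NA),   # something follows the dot
  Class4 = ifelse(nchar(Class) > 4, Class, NA)                                            # full five-character string
)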
Source: local data frame [6 x 6]
Groups: <by row>
# A tibble: 6 x 6
Site Class Class1 Class2 Class3 Class4
<chr> <chr> <chr> <chr> <chr> <chr>
1 A1 D2.13 D D2 D2.1 D2.13
2 A2 E1.4 E E1 E1.4 NA
3 A3 FA.1 F FA FA.1 NA
4 A4 H2.14 H H2 H2.1 H2.14
5 A5 F F NA NA NA
6 AR G1 G G1 NA NA
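One approach is to split each Class value into single characters and then use Reduce with accumulate = TRUE to paste them back together one at a time, dropping any intermediate result that ends in a dot. We then set the length of every result to the maximum length (which pads with NA), rbind the results, and cbind them back onto the original data frame, i.e.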
l1 <- lapply(strsplit(as.character(df$Class), ''), function(i){i1 <- Reduce(paste0, i, accumulate = TRUE);
i1 <- i1[!grepl('\\.$', i1)];
i1})
final_list <- lapply(l1, `length<-`, max(lengths(l1)))
cbind.data.frame(df$Site, do.call(rbind, final_list))
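To make the Reduce step easier to follow, here is a small sketch of what it does to a single value from the test data. The variable names (chars, prefixes, kept, padded) are only illustrative and are not part of the answer above.

```r
# Split one Class value into single characters
chars <- strsplit("D2.13", "")[[1]]

# Paste the characters back together cumulatively: every prefix of the string
prefixes <- Reduce(paste0, chars, accumulate = TRUE)
# prefixes is c("D", "D2", "D2.", "D2.1", "D2.13")

# Drop the prefix that ends in a dot, so the dot never closes a column
kept <- prefixes[!grepl("\\.$", prefixes)]
# kept is c("D", "D2", "D2.1", "D2.13")

# Shorter values are padded with NA when their length is increased
padded <- c("F")
length(padded) <- 4
# padded is c("F", NA, NA, NA)
```

Padding every vector to the same length is what allows rbind to stack them into a rectangular matrix.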
Similar to @Sotos's idea (the key parts are Reduce and strsplit), but arranged differently:
library(data.table)
# Long format: one row per cumulative prefix of Class, grouped by Site
df <- setDT(df)[, .(Class = Reduce(paste0, unlist(strsplit(as.character(Class), split = "")), accumulate = TRUE)),
by = Site][
!grepl("\\.$", Class)][, nr := paste0("Class", rleid(Class)), by = Site]
# Wide format: one column per prefix position
dcast(df, Site ~ nr, value.var = "Class")
Site Class1 Class2 Class3 Class4
1: A1 D D2 D2.1 D2.13
2: A2 E E1 E1.4 <NA>
3: A3 F FA FA.1 <NA>
4: A4 H H2 H2.1 H2.14
5: A5 F <NA> <NA> <NA>
6: AR G G1 <NA> <NA>
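For comparison, the same table can also be built in base R without Reduce, by taking prefixes with substring. This is only a sketch of an alternative technique, not one of the answers above: the helper class_prefixes and the names prefix_list, width, wide and result are hypothetical, and it assumes Class contains no missing values.

```r
df <- data.frame(
  Site  = c("A1", "A2", "A3", "A4", "A5", "AR"),
  Class = c("D2.13", "E1.4", "FA.1", "H2.14", "F", "G1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all prefixes of one string, skipping those ending in "."
class_prefixes <- function(x) {
  p <- substring(x, 1, seq_len(nchar(x)))
  p[!endsWith(p, ".")]
}

prefix_list <- lapply(df$Class, class_prefixes)
width <- max(lengths(prefix_list))

# Indexing past the end of a vector yields NA, which pads each row to `width`
wide <- do.call(rbind, lapply(prefix_list, function(p) p[seq_len(width)]))
colnames(wide) <- paste0("Class", seq_len(width))

result <- data.frame(Site = df$Site, wide, stringsAsFactors = FALSE)
print(result)
```

Because the number of columns comes from the longest value rather than from fixed positions, this sketch should also handle strings with more characters than those in the test data.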