R: Creating a new list based on a pattern in the list structure (tags: r, stringi)


I have some data that looks like this:

   dat <- c("Sales","Jim","Halpert","","",
            "Reception","Pam","Beasley","","",
            "Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica","","",
            "Manager","Michael","Scott","","")

I want to turn it into this:

   iwant <- c(
              c("Sales","Jim","Halpert"),
              c("Reception","Pam","Beasley"),
              c("Not.Manager","Dwight","Schrute","Bears","Beets","BattlestarGalactica"),
              c("Manager","Michael","Scott")
              )

I would suggest the following approach. You will get a data frame whose variables are in a format similar to what you want:
#Split chains
L1 <- strsplit(paste0(dat,collapse = " "),split = "  ")
#Split vectors from each chain
L2 <- lapply(L1[[1]],function(x) strsplit(trimws(x),split = " "))
#Format
L2 <- lapply(L2,as.data.frame)
#Remove zero dim data
L2[which(lapply(L2,nrow)==0)]<-NULL
#Format names
L2 <- lapply(L2,function(x) {names(x)<-'v';return(x)})
#Transform to dataframe
D1 <- as.data.frame(do.call(cbind,L2))
#Rename
names(D1) <- paste0('V',1:dim(D1)[2])
#Remove recycled values
D1 <- as.data.frame(apply(D1,2,function(x) {x[duplicated(x)]<-NA;return(x)}))
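If a list is preferred over a data frame, the same paste-and-split idea can be sketched without the recycling step. This is only a variation on the approach above, not part of the original answer; the names joined, chunks and groups are made up for illustration, and it assumes that no element of dat contains a space itself.

```r
dat <- c("Sales", "Jim", "Halpert", "", "",
         "Reception", "Pam", "Beasley", "", "",
         "Not.Manager", "Dwight", "Schrute", "Bears", "Beets",
         "BattlestarGalactica", "", "",
         "Manager", "Michael", "Scott", "", "")

# Join everything with single spaces; every "" in dat then shows up
# as an extra space, so blank separators become runs of 2+ spaces.
joined <- paste(dat, collapse = " ")

# Split on runs of two or more spaces: one string per group.
chunks <- strsplit(joined, " {2,}")[[1]]

# Split each group back into its individual elements.
groups <- strsplit(trimws(chunks), " ", fixed = TRUE)

# Drop any empty groups (e.g. when dat starts with several blanks).
groups <- groups[lengths(groups) > 0]

groups
```

Because the groups are kept as list elements of different lengths, nothing is recycled and no NA padding is needed. The trade-off is the assumption about spaces: an element such as "Assistant Manager" would be split in two.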

You can use rle, split and lapply:
lapply(split(dat, with(rle(dat != ''), 
             rep(cumsum(values), lengths))), function(x) x[x!= ''])

#$`1`
#[1] "Sales"   "Jim"     "Halpert"

#$`2`
#[1] "Reception" "Pam"       "Beasley"  

#$`3`
#[1] "Not.Manager"         "Dwight"    "Schrute"     "Bears"   "Beets"            
#[6] "BattlestarGalactica"

#$`4`
#[1] "Manager" "Michael" "Scott"  

The rle part creates the groups to split on:
with(rle(dat != ''), rep(cumsum(values), lengths))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4

After split, we use lapply to remove any empty elements from each list.
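The steps above can also be spelled out one at a time and wrapped in a small helper. This is a sketch only: the function name split_on_blanks and the intermediate variable names are hypothetical, and it assumes x is a character vector without NA values. Unlike the one-liner, it drops the blanks before calling split, so leading blanks do not produce an empty group.

```r
split_on_blanks <- function(x) {
  # TRUE for real values, FALSE for the "" separators.
  nonblank <- x != ""

  # Run-length encode the TRUE/FALSE pattern.
  runs <- rle(nonblank)

  # cumsum(runs$values) goes up by one at the start of each non-blank
  # run; repeating it by run length gives one group id per element.
  group_id <- rep(cumsum(runs$values), runs$lengths)

  # Keep only the non-blank elements and split them by group id.
  unname(split(x[nonblank], group_id[nonblank]))
}

dat <- c("Sales", "Jim", "Halpert", "", "",
         "Reception", "Pam", "Beasley", "", "",
         "Not.Manager", "Dwight", "Schrute", "Bears", "Beets",
         "BattlestarGalactica", "", "",
         "Manager", "Michael", "Scott", "", "")

res <- split_on_blanks(dat)
res
```

Since the grouping works on the positions of the blanks rather than on the text, elements that contain spaces or other non-alphanumeric characters are left untouched.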
Both dat and iwant are vectors.

Your answer is very robust to spaces and other non-alphanumeric characters in the strings. This is what I didn't know I was looking for!

I got D1