R: Creating multiple columns from a string extraction


So I have some text that I am trying to extract from. Here is my text:

Charge: Larceny; Charge: Stealing a motor vehicle;

I am trying to create this:

Charge1     Charge2                        Charge3
Larceny     Stealing a motor vehicle       NA

Any ideas? Right now, my code looks like this:

data$charge <- str_extract_all(data, "(?=Charge:)(\\D){4,100}")

data$charge

If your text is all in the same format, this is very easy with the tidyverse:

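library(tidyverse)
df <- data.frame(text = c("Charge: Larceny; Charge: Stealing a motor vehicle;", 
                       "Charge: some_charge; Charge: another_charge; Charge: something_else"))

# Split on the delimiter between charges, then strip the leading "Charge: " from the first piece.
# fill = "right" pads rows that have fewer than three charges with NA without a warning.
df %>% separate(text, c("Charge1", "Charge2", "Charge3"), sep = "; Charge: ", fill = "right") %>%
        mutate(Charge1 = gsub("Charge: ", "", Charge1))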
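
With the first sample row, splitting on "; Charge: " leaves the final semicolon attached to the last piece ("Stealing a motor vehicle;"). A minimal sketch of one way to strip it in base R; the vector `pieces` is a made-up stand-in for a column such as Charge2, not something from the original answer:

```r
# Hypothetical values, as they might look after splitting on "; Charge: ".
pieces <- c("Stealing a motor vehicle;", "something_else", NA)

# Remove a single trailing semicolon and any whitespace around it.
# Values without a trailing semicolon are returned unchanged; NA stays NA.
cleaned <- sub("\\s*;\\s*$", "", pieces)
cleaned
```

The same `sub()` call could be applied to each Charge column inside `mutate()`.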

You may need to clean up some hanging semicolons, though.

We can use tidyverse to do this:

library(tidyverse)
tibble(str1) %>%
     separate_rows(str1, sep= ";\\s*") %>%
     separate(str1, into = c("col1", "col2"), sep=":\\s*") %>% 
     mutate(col1 = na_if(col1, "")) %>% 
     fill(col1) %>%
     mutate(col1 = paste0(col1, row_number())) %>%
     spread(col1, col2)
# A tibble: 1 x 3
# Charge1 Charge2                  Charge3
#  <chr>   <chr>                    <chr>  
#1 Larceny Stealing a motor vehicle NA     
Data: str1 holds the text from the question.

Using base R (strng holds the same text):

read.table(text = gsub("\\s*Charge:\\s*", "", strng), sep = ";", fill = TRUE,
           col.names = paste0("Charge", 1:3))

  Charge1                  Charge2 Charge3
1 Larceny Stealing a motor vehicle      NA
You can also use strcapture, but it is not as flexible as gsub:

 strcapture(paste0(rep("\\s*Charge:\\s*([^;]+);",2),collapse=""),strng,data.frame(charge1=character(),charge2=character()))
  charge1                  charge2
1 Larceny Stealing a motor vehicle

A slight modification of your solution. Note the difference between ?= (lookahead) and ?<= (lookbehind):
str_extract_all(data, "(?<=Charge: )[^;]+")
[[1]]
[1] "Larceny"                  "Stealing a motor vehicle"
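
To get from that list to the Charge1, Charge2, Charge3 columns asked for in the question, the matches still have to be padded to a common length and bound together. A minimal sketch in base R, assuming every row follows the "Charge: ...;" format shown above; the variable names and the two sample rows are made up for illustration, and `gregexpr(..., perl = TRUE)` with `regmatches()` stands in for `str_extract_all`:

```r
# Hypothetical input: two rows in the format shown in the question.
text <- c("Charge: Larceny; Charge: Stealing a motor vehicle;",
          "Charge: some_charge; Charge: another_charge; Charge: something_else")

# Lookbehind requires perl = TRUE in base R regex functions.
matches <- regmatches(text, gregexpr("(?<=Charge: )[^;]+", text, perl = TRUE))

# Pad each row with NA so all rows have the same number of pieces.
n_cols <- max(lengths(matches))
padded <- lapply(matches, function(m) c(m, rep(NA_character_, n_cols - length(m))))

# Bind the padded rows into a data frame and name the columns Charge1..ChargeN.
charges <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(charges) <- paste0("Charge", seq_len(n_cols))
charges
```

Here the number of columns is taken from the row with the most charges, so rows with fewer charges end in NA rather than causing an error.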