R: Create multiple columns from a string extraction
So I have some text that I'm trying to extract from. Here is my text:
Charge: Larceny; Charge: Stealing a motor vehicle;
I'm trying to create this:
Charge1  Charge2                   Charge3
Larceny  Stealing a motor vehicle  NA
Any ideas? Right now, my code looks like this:
data$charge <- str_extract_all(data, "(?=Charge:)(\\D){4,100}")
data$charge
If your text is always in the same format, this is very easy with the tidyverse:
require(tidyverse)
df <- data.frame(text = c("Charge: Larceny; Charge: Stealing a motor vehicle;",
"Charge: some_charge; Charge: another_charge; Charge: something_else"))
df %>% separate(text, c("Charge1", "Charge2", "Charge3"), sep = "; Charge: ") %>%
mutate(Charge1 = gsub("Charge: ", "", Charge1))
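One way to avoid the leftover semicolon is to strip the leading "Charge: " and any trailing ";" before splitting, and to let separate() pad short rows with NA via fill = "right" instead of warning. This is a minimal sketch, assuming the dplyr and tidyr packages (both part of the tidyverse) are installed; it redefines the same df so that it runs on its own:

```r
library(dplyr)
library(tidyr)

df <- data.frame(text = c("Charge: Larceny; Charge: Stealing a motor vehicle;",
                          "Charge: some_charge; Charge: another_charge; Charge: something_else"))

res <- df %>%
  # drop the leading "Charge: " and any trailing ";" before splitting
  mutate(text = sub(";\\s*$", "", sub("^Charge:\\s*", "", text))) %>%
  # split on the separator between charges; pad short rows with NA on the right
  separate(text, c("Charge1", "Charge2", "Charge3"),
           sep = ";\\s*Charge:\\s*", fill = "right")
res
```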
You may need to clean up some leftover semicolons, though.
We can use the tidyverse to do this:
library(tidyverse)
tibble(str1) %>%
separate_rows(str1, sep= ";\\s*") %>%
separate(str1, into = c("col1", "col2"), sep=":\\s*") %>%
mutate(col1 = na_if(col1, "")) %>%
fill(col1) %>%
mutate(col1 = paste0(col1, row_number())) %>%
spread(col1, col2)
# A tibble: 1 x 3
# Charge1 Charge2 Charge3
# <chr> <chr> <chr>
#1 Larceny Stealing a motor vehicle NA
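In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The sketch below swaps it in and, as an assumption going beyond the original answer, numbers the charges within each record through a hypothetical id column so that several strings can be processed at once. The second input string is made-up example data, and dplyr and tidyr are assumed to be installed:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: one string per record, with an id column added for grouping
df <- tibble(
  id   = 1:2,
  text = c("Charge: Larceny; Charge: Stealing a motor vehicle;",
           "Charge: Burglary;")
)

res <- df %>%
  separate_rows(text, sep = ";\\s*") %>%            # one row per "Charge: ..." piece
  filter(text != "") %>%                              # drop the empty piece after the final ";"
  mutate(text = sub("^Charge:\\s*", "", text)) %>%    # keep only the charge itself
  group_by(id) %>%
  mutate(col = paste0("Charge", row_number())) %>%    # number the charges within each record
  ungroup() %>%
  pivot_wider(names_from = col, values_from = text)   # records with fewer charges get NA
res
```

Because the empty piece is filtered out rather than filled, the number of Charge columns follows the record with the most charges instead of always including an extra NA column.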
Data:
str1 <- "Charge: Larceny; Charge: Stealing a motor vehicle;"
Using base R:
read.table(text = gsub("\\s*Charge:\\s*", "", strng), sep = ";", fill = TRUE, col.names = paste0("Charge", 1:3))
  Charge1                  Charge2 Charge3
1 Larceny Stealing a motor vehicle      NA
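If there are several strings with different numbers of charges, a base R sketch can pull the values with regmatches() and a perl = TRUE lookbehind, then pad each record with NA up to the longest one. The variable names and the second input string are illustrative assumptions, not part of the original answer:

```r
# Hypothetical input: the question's string plus a made-up second record
strng <- c("Charge: Larceny; Charge: Stealing a motor vehicle;",
           "Charge: Burglary;")

# Pull every value that follows "Charge: "; the lookbehind needs perl = TRUE
charges <- regmatches(strng, gregexpr("(?<=Charge: )[^;]+", strng, perl = TRUE))

# Pad each record with NA up to the longest one (indexing past the end gives NA)
n <- max(lengths(charges))
padded <- lapply(charges, function(x) x[seq_len(n)])

# Bind the records as rows and name the columns Charge1, Charge2, ...
out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(out) <- paste0("Charge", seq_len(n))
out
```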
You can also use strcapture, although it is not as flexible as gsub:
strcapture(paste0(rep("\\s*Charge:\\s*([^;]+);",2),collapse=""),strng,data.frame(charge1=character(),charge2=character()))
  charge1                  charge2
1 Larceny Stealing a motor vehicle
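A slight modification of your solution. Note the difference between ?= and ?<=: (?=Charge:) is a lookahead, so the match still begins with "Charge:", while (?<=Charge: ) is a lookbehind, so only the text after the label is returned. Also, [^;]+ stops at the next semicolon, whereas \\D{4,100} runs straight across it. This answer may also help you.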
str_extract_all(data, "(?<=Charge: )[^;]+")
[[1]]
[1] "Larceny" "Stealing a motor vehicle"
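To go from that list to the Charge1, Charge2, ... columns asked for in the question, str_extract_all() also accepts simplify = TRUE, which returns a character matrix padded with empty strings. A minimal sketch, assuming stringr is installed; the second input string is hypothetical example data:

```r
library(stringr)

# Hypothetical input: the question's string plus a made-up second record
data <- c("Charge: Larceny; Charge: Stealing a motor vehicle;",
          "Charge: Burglary; Charge: Arson; Charge: Fraud;")

# simplify = TRUE returns a character matrix; short rows are padded with ""
m <- str_extract_all(data, "(?<=Charge: )[^;]+", simplify = TRUE)
m[m == ""] <- NA                                   # turn the padding into NA
colnames(m) <- paste0("Charge", seq_len(ncol(m)))  # Charge1, Charge2, ...

charges <- as.data.frame(m, stringsAsFactors = FALSE)
charges
```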