Split a string by words and semicolons in R
I have a data frame (or data table) with strings in a column named text, like this:
text
name: john; surname: smith; age: 35; gender:male
name: mark; age:50
name: jack; surname: brown
name: tom; surname: travis; gender: male
How can I extract each part of the string into separate columns of the same data frame? I would like to have the following columns:
name.text    surname.text     age.text  gender.text
name: john   surname: smith   age: 35   gender:male
name: mark   (empty)          age: 50   (empty)
name: jack   surname: brown   (empty)   (empty)
name: tom    surname: travis  (empty)   gender:male
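Thanks, everyone!
Here is an option where we split the elements at ;, then separate each piece into two columns at :, and reshape the result from 'long' to 'wide' format.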
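library(dplyr)
library(tidyr)
library(stringr)
library(tibble)
library(purrr)   # provides imap_dfr()

df1 %>%
  # keep a row id so the pieces can be put back together
  rownames_to_column('rn') %>%
  # one row per "key: value" piece
  separate_rows(text, sep = ';\\s*') %>%
  # split each piece into key and value
  separate(text, into = c('key', 'val'), sep = ":\\s*") %>%
  # long -> wide, marking missing keys
  pivot_wider(names_from = key, values_from = val,
              values_fill = list(val = "(empty)")) %>%
  select(-rn) %>%
  # put the key back in front of each non-missing value
  imap_dfr(~ case_when(.x != "(empty)" ~ str_c(.y, .x, sep = ":"), TRUE ~ .x)) %>%
  rename_all(~ str_c(., ".text"))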
# A tibble: 4 x 4
# name.text surname.text age.text gender.text
# <chr> <chr> <chr> <chr>
#1 name:john surname:smith age:35 gender:male
#2 name:mark (empty) age:50 (empty)
#3 name:jack surname:brown (empty) (empty)
#4 name:tom surname:travis (empty) gender:male
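Thank you very much for your reply. I need to use almost exclusively base R (because of other project dependencies, don't ask why :). Could you help? How can this be solved using only base R?
@Makaroni. In the post, you specified data.table (i.e. that you are using that package).
@Makaroni. Updated with base R: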
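# expected keys, in the desired column order
nm1 <- c("name", "surname", "age", "gender")
lst1 <- lapply(strsplit(df1$text, ";\\s*"), function(x) {
  prfx <- sub(":.*", "", x)        # key of each piece
  x1 <- x[match(nm1, prfx)]        # align pieces to nm1; missing keys give NA
  replace(x1, is.na(x1), "(empty)")
})
# one row per input string
out <- do.call(rbind.data.frame, lst1)
names(out) <- paste0(nm1, ".text")
out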
# name.text surname.text age.text gender.text
#1 name: john surname: smith age: 35 gender:male
#2 name: mark (empty) age:50 (empty)
#3 name: jack surname: brown (empty) (empty)
#4 name: tom surname: travis (empty) gender: male
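The base R answer above hard-codes the keys in nm1. As a rough sketch only, the same idea can be wrapped in a function that discovers the keys from the data. The name split_fields and its fill argument are hypothetical and not part of the original answer, and the sketch assumes every piece has the form "key: value" or "key:value".

```r
# Hypothetical helper (not from the original answer): one column per key,
# with the keys taken from the data in order of first appearance.
split_fields <- function(text, fill = "(empty)") {
  parts <- strsplit(text, ";\\s*")                    # pieces per input string
  keys  <- unique(sub(":.*", "", unlist(parts)))      # all keys seen anywhere
  rows  <- lapply(parts, function(x) {
    hit <- x[match(keys, sub(":.*", "", x))]          # align pieces to keys
    replace(hit, is.na(hit), fill)                    # mark missing keys
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- paste0(keys, ".text")
  out
}

# Same example data as in the question
df1 <- data.frame(
  text = c("name: john; surname: smith; age: 35; gender:male",
           "name: mark; age:50",
           "name: jack; surname: brown",
           "name: tom; surname: travis; gender: male"),
  stringsAsFactors = FALSE
)

res <- split_fields(df1$text)
# Attach the new columns to the original data frame
df2 <- cbind(df1, res)
```

Because the question asks for the columns in the same data frame, the last line binds the result back onto df1 with cbind().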
# data
df1 <- structure(list(text = c("name: john; surname: smith; age: 35; gender:male",
    "name: mark; age:50", "name: jack; surname: brown",
    "name: tom; surname: travis; gender: male"
)), class = "data.frame", row.names = c(NA, -4L))
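# another base R approach: parse every string into a key/value data frame
d <- lapply(strsplit(df1$text, "; ?"), function(x) {
  data.frame(do.call(rbind, strsplit(x, ": ?")), stringsAsFactors = FALSE)
})
# all keys that occur anywhere, in order of first appearance
fields <- unique(unlist(lapply(d, function(x) x$X1)))
# for each input string, look up the value of every key (NA if absent)
d2 <- do.call(rbind, lapply(d, function(x)
  data.frame(fields, val = x$X2[match(fields, x$X1)])))
# long result, grouped by key
d2[order(match(d2$fields, fields)), ]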
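The snippet above leaves the result in long form (one row per key and value). If the wide layout from the question is wanted, one possible route in base R is reshape(). The following is only a sketch under that assumption: the column names row, part and key are made up for illustration, and each piece is kept whole (including its key) so the cells look like the requested output.

```r
# Same example data as in the question
df1 <- data.frame(
  text = c("name: john; surname: smith; age: 35; gender:male",
           "name: mark; age:50",
           "name: jack; surname: brown",
           "name: tom; surname: travis; gender: male"),
  stringsAsFactors = FALSE
)

# Long form: one row per "key: value" piece, tagged with its source row
parts <- strsplit(df1$text, ";\\s*")
long <- data.frame(
  row  = rep(seq_along(parts), lengths(parts)),
  part = unlist(parts),
  stringsAsFactors = FALSE
)
long$key <- sub(":.*", "", long$part)

# Long -> wide: one column per key, NA where a key is absent
wide <- reshape(long, idvar = "row", timevar = "key", direction = "wide")

# Tidy up: fill the gaps, rename part.<key> to <key>.text, drop the row id
wide[is.na(wide)] <- "(empty)"
names(wide) <- sub("^part\\.(.*)$", "\\1.text", names(wide))
wide$row <- NULL
rownames(wide) <- NULL
wide
```

reshape() orders the new columns by the first appearance of each key, which matches the order used in the question for this data.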