R: problem separating a string column into long format
From this data frame:
dftest <- data.frame(id = c(1), text = c("java-ee?jsf?omnifaces?jpa"), stringsAsFactors = F)
I use the following commands to split it into long format and then reshape it to wide:
s2 <- strsplit(dftest$text, split = "?")
dftest2 <- data.frame(id = rep(dftest$id, sapply(s2, length)), text = unlist(s2))
dflike_final <- reshape(dftest2, idvar = "id", timevar = "text", direction = "wide")
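How can I fix it so that I get the whole strings?
We can split the text column into separate rows, create a dummy column (n), and use pivot_wider to get the data in wide format: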
library(dplyr)
library(tidyr)
dftest %>%
  separate_rows(text, sep = "\\?") %>%
  mutate(n = 1) %>%
  pivot_wider(values_from = n, names_from = text)
# A tibble: 1 x 5
# id `java-ee` jsf omnifaces jpa
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1 1 1
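With more than one id, the same pipeline leaves NA wherever an id lacks a tag. The sketch below assumes a hypothetical two-row input (df2 and wide2 are names invented for illustration) and uses the values_fill argument of pivot_wider to turn those gaps into 0. It is one possible extension, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: the second row shares only some tags with the first
df2 <- data.frame(id = c(1, 2),
                  text = c("java-ee?jsf", "jsf?jpa"),
                  stringsAsFactors = FALSE)

wide2 <- df2 %>%
  separate_rows(text, sep = "\\?") %>%   # one row per tag
  mutate(n = 1) %>%                      # dummy indicator column
  # values_fill replaces the NA of absent id/tag combinations with 0
  pivot_wider(values_from = n, names_from = text, values_fill = list(n = 0))

print(wide2)
```

The result can then be read as a 0/1 indicator table with one row per id.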
The ? character is a special symbol in regular expressions. You need to escape it or use strsplit(dftest$text, split = "?", fixed = TRUE).
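The two fixes can be compared on a plain string. This is a minimal sketch; the names x, escaped and literal are made up for illustration.

```r
x <- "java-ee?jsf?omnifaces?jpa"

# Option 1: escape the question mark so the regex engine treats it literally
escaped <- strsplit(x, split = "\\?")

# Option 2: skip regex interpretation altogether
literal <- strsplit(x, split = "?", fixed = TRUE)

# Both return a list holding one character vector of four tags
print(escaped[[1]])
print(identical(escaped, literal))
```

fixed = TRUE is arguably the safer choice when the separator is always a literal character, since no escaping rules need to be remembered.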
Result of the original code, where the unescaped ? splits the string into single characters:
id text
1 1 j
2 1 a
3 1 v
4 1 a
5 1 -
6 1 e
7 1 e
8 1 ?
9 1 j
10 1 s
11 1 f
12 1 ?
13 1 o
14 1 m
15 1 n
16 1 i
17 1 f
18 1 a
19 1 c
20 1 e
21 1 s
22 1 ?
23 1 j
24 1 p
25 1 a
# Base R: escape the question mark, add a dummy column n, then reshape to wide
s2 <- strsplit(dftest$text, split = "\\?")
dftest2 <- data.frame(id = rep(dftest$id, lengths(s2)), text = unlist(s2), n = 1)
dflike_final <- reshape(dftest2, idvar = "id", timevar = "text", direction = "wide")
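reshape() names the new columns by pasting the value column name and the tag together (n.java-ee, n.jsf, and so on). If plain tag names are preferred, the prefix can be stripped afterwards. The sketch below repeats the base R steps so that it runs on its own; the variable wide is a name chosen for illustration.

```r
dftest <- data.frame(id = 1, text = "java-ee?jsf?omnifaces?jpa",
                     stringsAsFactors = FALSE)

# Split on a literal question mark and build the long table with a dummy column
s2 <- strsplit(dftest$text, split = "?", fixed = TRUE)
dftest2 <- data.frame(id = rep(dftest$id, lengths(s2)), text = unlist(s2), n = 1,
                      stringsAsFactors = FALSE)

wide <- reshape(dftest2, idvar = "id", timevar = "text", direction = "wide")

# Drop the "n." prefix that reshape() adds to every new column
names(wide) <- sub("^n\\.", "", names(wide))
print(names(wide))
```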