Splitting a column into multiple fields using R


My CSV has a column called "features". Its values are in this format:

{""Air conditioning"",""Elevator"",""Smoke detector""}
{""Air conditioning"",""Railing Lights"",""Smoke detector""}
{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}
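There are 20,000 records, and the "features" field of each one contains strings like these, in no particular order.

How can I split them into separate columns, so that every "Air conditioning" goes into the first column, every "Elevator" into the second, and so on?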

a                 b          c               d
air conditioning  elevators  smokedetectors
air conditioning  elevators  smokedetectors  washer
air conditioning  elevators  smokedetectors  washer

Use separate from tidyr combined with mutate from dplyr (with a gsub thrown in):
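A minimal sketch of what this approach might look like, using the three sample records from the question. The column names a, b and c are hypothetical, and extra = "merge" is an assumption inferred from the remark about merged fields; it is not necessarily the exact code the answer used.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question (three of the 20,000 records).
dfr <- data.frame(
  features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'),
  stringsAsFactors = FALSE
)

res <- dfr %>%
  # Strip the braces and the doubled quotes, leaving a comma-separated string.
  mutate(features = gsub('[{}"]', "", features)) %>%
  # Split on commas into three columns (hypothetical names a, b, c);
  # with extra = "merge" any surplus pieces stay together in the last column.
  separate(features, into = c("a", "b", "c"), sep = ",", extra = "merge")

print(res)
```

This splits by position only, so column b holds whatever feature happens to come second in each record.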


Note that the extra fields are merged (as shown in the third record); see ?separate for more options.

As mentioned, you could look at the "splitstackshape" package, specifically the cSplit_e function. With it, you can try:

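library(data.table)  # provides as.data.table() and :=
library(splitstackshape)
# Strip the braces and quotes, then expand each distinct feature into its own column
cSplit_e(as.data.table(dfr)[, features := (gsub("[{}\"]", "", features))], 
         "features", ",", mode = "value", type = "character", drop = TRUE)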
##    features_Air conditioning features_Dryer features_Elevator features_Railing Lights features_Smoke detector features_Washer
## 1:          Air conditioning             NA          Elevator                      NA          Smoke detector              NA
## 2:          Air conditioning             NA                NA          Railing Lights          Smoke detector              NA
## 3:          Air conditioning          Dryer                NA                      NA          Smoke detector          Washer
where "dfr" is as defined in @Remko's answer:

dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
                               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
                               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'))
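For comparison, the same one-column-per-feature layout can be sketched in base R without extra packages. This is only an illustrative alternative, not something either answer posted; the names parts, all_features and wide are made up for this example.

```r
# Sample data copied from the question.
dfr <- data.frame(
  features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'),
  stringsAsFactors = FALSE
)

# Strip braces and quotes, then split each record into its individual features.
parts <- strsplit(gsub('[{}"]', "", dfr$features), ",", fixed = TRUE)

# One column per distinct feature, in alphabetical order.
all_features <- sort(unique(unlist(parts)))

# For each record, keep the feature name where it is present and NA otherwise.
wide <- do.call(rbind, lapply(parts, function(p) {
  ifelse(all_features %in% p, all_features, NA_character_)
}))
colnames(wide) <- all_features

# check.names = FALSE keeps column names that contain spaces unchanged.
wide <- data.frame(wide, check.names = FALSE, stringsAsFactors = FALSE)
print(wide)
```

Because the columns are built from the set of distinct features, the order of features within a record does not matter.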

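Check ?cSplit from the splitstackshape package.

You could use read.csv(text = gsub("[{}]", "", txt), header = FALSE, quote = "") where txt is the text above as a single string.

Thanks. If you look at your output, column b has Elevator in the first row and Washer in the third. How can I put all the washers under one column and all the elevators under another?

Your original question didn't really make that clear! I think we have to rethink the solution.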