Split a column into multiple fields using R
My CSV has a column named "features". Its values have this format:
{""Air conditioning"",""Elevator"",""Smoke detector""}
{""Air conditioning"",""Railing Lights"",""Smoke detector""}
{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}
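There are 20,000 records, and the "features" field contains these strings in no particular order.
How can I split them into separate columns, so that every "Air conditioning" goes into the first column, every "Elevator" into the second, and so on?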
a b c d
air conditioning elevators smokedetectors
air conditioning elevators smokedetectors washer
air conditioning elevators smokedetectors washer
Combine separate from tidyr with mutate_ from dplyr (wrapping a gsub):
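A minimal sketch of that combination, assuming the dfr data frame defined in the other answer. The column names a, b and c are hypothetical, mutate stands in for the now-deprecated mutate_, and extra = "merge" is one choice among the options listed in ?separate.

```r
library(dplyr)
library(tidyr)

# Sample data in the same format as the question
dfr <- data.frame(
  features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'),
  stringsAsFactors = FALSE
)

res <- dfr %>%
  # Strip the braces and double quotes so only comma-separated names remain
  mutate(features = gsub('[{}"]', "", features)) %>%
  # Split on commas into three columns; extra pieces are merged into the last one
  separate(features, into = c("a", "b", "c"), sep = ",", extra = "merge")

res
```

Columns are filled by position here, so the same feature can land in different columns from one record to the next.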
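Note that extra fields are merged (as the third record shows); see ?separate for more options.

As mentioned, you can look at the "splitstackshape" package, in particular the cSplit_e function. With it, you can try: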
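library(data.table)       # provides as.data.table() and :=
library(splitstackshape)
# Strip braces and quotes, then spread each distinct feature into its own column
cSplit_e(as.data.table(dfr)[, features := gsub("[{}\"]", "", features)],
         "features", ",", mode = "value", type = "character", drop = TRUE)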
## features_Air conditioning features_Dryer features_Elevator features_Railing Lights features_Smoke detector features_Washer
## 1: Air conditioning NA Elevator NA Smoke detector NA
## 2: Air conditioning NA NA Railing Lights Smoke detector NA
## 3: Air conditioning Dryer NA NA Smoke detector Washer
where "dfr" is defined as in @Remko's answer:
dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
'{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
'{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'))
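Check ?cSplit from the splitstackshape package.
You can use read.csv(text = gsub("[{}]", "", txt), header = FALSE, quote = "") where txt is the text above as a single string.
Thanks. Notice that in your output, column b holds Elevator in the first row and Washer in the third. How can I put all the washers under one column and all the elevators under another?
Your original question didn't really indicate that! I think we have to rethink the solution.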
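One way to get one column per feature without extra packages is sketched below in base R. It assumes the same dfr data frame used in the answers; the names items, feats and wide are illustrative and do not come from the thread.

```r
# Sample data in the same format as the question
dfr <- data.frame(
  features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'),
  stringsAsFactors = FALSE
)

# Strip braces and quotes, then split each record on commas
items <- strsplit(gsub('[{}"]', "", dfr$features), ",", fixed = TRUE)

# One column per distinct feature, in sorted order
feats <- sort(unique(unlist(items)))

# For each record, keep the feature name where it is present and NA otherwise
wide <- do.call(rbind, lapply(items, function(x) {
  ifelse(feats %in% x, feats, NA_character_)
}))
colnames(wide) <- feats
wide <- as.data.frame(wide, stringsAsFactors = FALSE)

wide
```

Because the columns are built from the set of distinct feature names, the order of the strings within a record no longer matters.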