How to convert character data to matrix format in R?
I'd like to convert the following data frame. It currently looks like this:
     ID  Items                                               items.split
1  2729  Bicycle                                             Bicycle
2  3979  TV, Mobile Phone, Bicycle, Water Tank               c("TV", "Mobile Phone", "Bicycle", "Water Tank")
3  3860  Mobile Phone, Bicycle, Fan                          c("Mobile Phone", "Bicycle", "Fan")
4  2357  Mobile Phone, Motorbike                             c("Mobile Phone", "Motorbike")
5  2278  TV, Mobile Phone, Wagon/Cart, Motorbike, Plow       c("TV", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
6  3277  TV, Mobile Phone, Bicycle, Motorbike, Fan           c("TV", "Mobile Phone", "Bicycle", "Motorbike", "Fan")
7  3501  Mobile Phone, Bicycle, Water Tank                   c("Mobile Phone", "Bicycle", "Water Tank")
8  3880  Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow  c("Tractor", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
9  3207  DVD Player, Bicycle, Plow                           c("DVD Player", "Bicycle", "Plow")
10 3928  TV, Mobile Phone, Bicycle, Fan                      c("TV", "Mobile Phone", "Bicycle", "Fan")
I would like to convert the data frame above into the following format:
      Bicycle  TV  Mobile Phone  Water Tank  [etc...]
2729        1   0             0           0
3979        1   1             1           1
3860        1   0             1           0
[etc...]
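I don't work with strings or characters very often, so I'm having trouble figuring out how to manipulate Items, and especially the items.split variable. I've looked at questions like this one, but I don't want a frequency count of the words; I want the counts attached to each ID. So I think I'm struggling to combine something like FreqMat with a simple set of dplyr commands that links the counts to each ID.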
Many thanks for your help. The data are below:
df1 <- structure(list(ID = c(2729L, 3979L, 3860L, 2357L, 2278L, 3277L,
3501L, 3880L, 3207L, 3928L), Items = c("Bicycle", "TV, Mobile Phone, Bicycle, Water Tank",
"Mobile Phone, Bicycle, Fan", "Mobile Phone, Motorbike", "TV, Mobile Phone, Wagon/Cart, Motorbike, Plow",
"TV, Mobile Phone, Bicycle, Motorbike, Fan", "Mobile Phone, Bicycle, Water Tank",
"Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow", "DVD Player, Bicycle, Plow",
"TV, Mobile Phone, Bicycle, Fan"), items.split = list("Bicycle",
c("TV", "Mobile Phone", "Bicycle", "Water Tank"), c("Mobile Phone",
"Bicycle", "Fan"), c("Mobile Phone", "Motorbike"), c("TV",
"Mobile Phone", "Wagon/Cart", "Motorbike", "Plow"), c("TV",
"Mobile Phone", "Bicycle", "Motorbike", "Fan"), c("Mobile Phone",
"Bicycle", "Water Tank"), c("Tractor", "Mobile Phone", "Wagon/Cart",
"Motorbike", "Plow"), c("DVD Player", "Bicycle", "Plow"),
c("TV", "Mobile Phone", "Bicycle", "Fan"))), row.names = c(NA,
10L), class = "data.frame")
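For reference, the items.split column can be rebuilt from Items with base R's strsplit(). This is a minimal sketch that assumes items are always separated by a comma followed by one space, as in the sample above. The second variant splits on the comma alone and then trims whitespace, which is more forgiving if the spacing is inconsistent.

```r
# A few strings in the same shape as the Items column.
items <- c("Bicycle",
           "TV, Mobile Phone, Bicycle, Water Tank",
           "DVD Player, Bicycle, Plow")

# Split on the literal ", " separator (assumed to be consistent).
items.split <- strsplit(items, ", ", fixed = TRUE)

# More forgiving: split on commas only, then strip surrounding spaces.
items.split2 <- lapply(strsplit(items, ","), trimws)

str(items.split)
```

Applied to the question's data, df1$items.split <- strsplit(df1$Items, ", ", fixed = TRUE) should produce the same kind of list column.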
After unnesting the list column, we can use table:
library(dplyr)
library(tidyr)
df1 %>%
select(-Items) %>%
unnest(items.split) %>%
table
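Since the title asks for a matrix, note that table() returns a contingency table of counts (class "table"). A minimal sketch, on a small made-up sample shaped like df1 (df_small is a hypothetical name), of dropping that class with unclass() so the result is an ordinary integer matrix with IDs as row names and items as column names:

```r
library(dplyr)
library(tidyr)

# Hypothetical three-row sample: an ID column plus a list column of items.
df_small <- data.frame(ID = c(2729L, 3979L, 2357L))
df_small$items.split <- list("Bicycle",
                             c("TV", "Mobile Phone", "Bicycle", "Water Tank"),
                             c("Mobile Phone", "Motorbike"))

tab <- df_small %>%
  unnest(items.split) %>%  # one row per (ID, item) pair
  table()                  # ID x item table of counts

# Remove the "table" class; what remains is a 2-d integer matrix
# whose dimnames are the IDs (rows) and the items (columns).
m <- unclass(tab)
m
```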
Or, in base R, after stacking into a two-column data.frame:
table(stack(setNames(df1$items.split, df1$ID))[2:1])
#        values
# ind    Bicycle DVD Player Fan Mobile Phone Motorbike Plow Tractor TV Wagon/Cart Water Tank
#   2729       1          0   0            0         0    0       0  0          0          0
#   3979       1          0   0            1         0    0       0  1          0          1
#   3860       1          0   1            1         0    0       0  0          0          0
#   2357       0          0   0            1         1    0       0  0          0          0
#   2278       0          0   0            1         1    1       0  1          1          0
#   3277       1          0   1            1         1    0       0  1          0          0
#   3501       1          0   0            1         0    0       0  0          0          1
#   3880       0          0   0            1         1    1       1  0          1          0
#   3207       1          1   0            0         0    1       0  0          0          0
#   3928       1          0   1            1         0    0       0  1          0          0
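If a strict 0/1 presence matrix is wanted regardless of repeated items, one base R alternative (not part of the answer above) is to test each ID's vector against the full set of items with %in%. A sketch on a three-row sample; ids, items and all_items are illustrative names:

```r
# Illustrative sample: IDs and their item vectors.
ids   <- c(2729L, 3979L, 3207L)
items <- list("Bicycle",
              c("TV", "Mobile Phone", "Bicycle", "Water Tank"),
              c("DVD Player", "Bicycle", "Plow"))

# Every distinct item, sorted (table() also orders columns this way).
all_items <- sort(unique(unlist(items)))

# For each ID, flag which of all_items occur in its vector. vapply gives
# one column per ID, so transpose to put IDs in rows.
presence <- t(vapply(items,
                     function(x) as.integer(all_items %in% x),
                     integer(length(all_items))))
dimnames(presence) <- list(ID = ids, item = all_items)
presence
```

Because %in% only asks whether an item occurs, an item listed twice for the same ID still yields 1.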
You can use cSplit_e from splitstackshape:
splitstackshape::cSplit_e(df1, "Items", type = "character", fill = 0, drop = TRUE)
#     ID items.split                                        Items_Bicycle Items_DVD Player Items_Fan
#1  2729 Bicycle                                                        1                0         0
#2  3979 TV, Mobile Phone, Bicycle, Water Tank                          1                0         0
#3  3860 Mobile Phone, Bicycle, Fan                                     1                0         1
#4  2357 Mobile Phone, Motorbike                                        0                0         0
#5  2278 TV, Mobile Phone, Wagon/Cart, Motorbike, Plow                  0                0         0
#6  3277 TV, Mobile Phone, Bicycle, Motorbike, Fan                      1                0         1
#7  3501 Mobile Phone, Bicycle, Water Tank                              1                0         0
#8  3880 Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow             0                0         0
#9  3207 DVD Player, Bicycle, Plow                                      1                1         0
#10 3928 TV, Mobile Phone, Bicycle, Fan                                 1                0         1
#   Items_Mobile Phone Items_Motorbike Items_Plow Items_Tractor Items_TV Items_Wagon/Cart Items_Water Tank
#1                   0               0          0             0        0                0                0
#2                   1               0          0             0        1                0                1
#3                   1               0          0             0        0                0                0
#4                   1               1          0             0        0                0                0
#5                   1               1          1             0        1                1                0
#6                   1               1          0             0        1                0                0
#7                   1               0          0             0        0                0                1
#8                   1               1          1             1        0                1                0
#9                   0               0          1             0        0                0                0
#10                  1               0          0             0        1                0                0
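Sorry @akrun, I've done that now.
This is perfect, akrun, thanks. Much simpler than I thought!
Having looked at the full dataset, I found this more accurate than akrun's answer, since yours gives only 0 or 1 to indicate presence. Many thanks.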
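The difference raised in that last comment can be seen on a small, hypothetical example (the data are made up, not from the question): table() counts repeats, so an ID that lists the same item twice gets a 2, whereas comparing with > 0 and multiplying by 1L collapses the counts to 0/1 presence.

```r
# Hypothetical long-format data: ID 1 lists "Bicycle" twice.
long <- data.frame(ID   = c(1L, 1L, 1L, 2L),
                   item = c("Bicycle", "Bicycle", "TV", "TV"))

counts   <- table(long)          # ID 1 / Bicycle is counted as 2
presence <- (counts > 0) * 1L    # logical test, then back to 0/1 integers

counts
presence
```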