How do I convert character data to matrix format in R?


I would like to reshape the following data frame. At the moment it looks like this:

    ID      Items                                               items.split
1   2729    Bicycle                                             Bicycle
2   3979    TV, Mobile Phone, Bicycle, Water Tank               c("TV", "Mobile Phone", "Bicycle", "Water Tank")
3   3860    Mobile Phone, Bicycle, Fan                          c("Mobile Phone", "Bicycle", "Fan")
4   2357    Mobile Phone, Motorbike                             c("Mobile Phone", "Motorbike")
5   2278    TV, Mobile Phone, Wagon/Cart, Motorbike, Plow       c("TV", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
6   3277    TV, Mobile Phone, Bicycle, Motorbike, Fan           c("TV", "Mobile Phone", "Bicycle", "Motorbike", "Fan")
7   3501    Mobile Phone, Bicycle, Water Tank                   c("Mobile Phone", "Bicycle", "Water Tank")
8   3880    Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow  c("Tractor", "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow")
9   3207    DVD Player, Bicycle, Plow                           c("DVD Player", "Bicycle", "Plow")
10  3928    TV, Mobile Phone, Bicycle, Fan                      c("TV", "Mobile Phone", "Bicycle", "Fan")

I would like to convert the data frame above into the following format:

       Bicycle    TV      Mobile Phone    Water Tank [etc...]
2729     1         0       0                 0
3979     1         1       1                 1
3860     1         0       1                 0
[etc...]

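I don't work with strings or characters very often, so I am having trouble figuring out how to manipulate the Items variable, and especially the items.split variable. I have looked at questions like this one, but I don't want an overall frequency count of the words; I want the frequency count attached to each ID. So I think what I am struggling with is combining something like FreqMat with a simple dplyr command that links the frequencies to each ID.

Many thanks for your help. The data are below: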

structure(list(ID = c(2729L, 3979L, 3860L, 2357L, 2278L, 3277L, 
3501L, 3880L, 3207L, 3928L), Items = c("Bicycle", "TV, Mobile Phone, Bicycle, Water Tank", 
"Mobile Phone, Bicycle, Fan", "Mobile Phone, Motorbike", "TV, Mobile Phone, Wagon/Cart, Motorbike, Plow", 
"TV, Mobile Phone, Bicycle, Motorbike, Fan", "Mobile Phone, Bicycle, Water Tank", 
"Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow", "DVD Player, Bicycle, Plow", 
"TV, Mobile Phone, Bicycle, Fan"), items.split = list("Bicycle", 
    c("TV", "Mobile Phone", "Bicycle", "Water Tank"), c("Mobile Phone", 
    "Bicycle", "Fan"), c("Mobile Phone", "Motorbike"), c("TV", 
    "Mobile Phone", "Wagon/Cart", "Motorbike", "Plow"), c("TV", 
    "Mobile Phone", "Bicycle", "Motorbike", "Fan"), c("Mobile Phone", 
    "Bicycle", "Water Tank"), c("Tractor", "Mobile Phone", "Wagon/Cart", 
    "Motorbike", "Plow"), c("DVD Player", "Bicycle", "Plow"), 
    c("TV", "Mobile Phone", "Bicycle", "Fan"))), row.names = c(NA, 
10L), class = "data.frame")

After unnesting the list column, we can use table:

library(dplyr)
library(tidyr)    
df1 %>% 
   select(-Items) %>%
   unnest(items.split) %>% 
   table
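
If the items.split list column is not available, a similar wide layout can probably be built straight from the comma-separated Items column. The following is only a minimal sketch: it assumes tidyr 1.1.0 or later (so that values_fill = 0 is accepted), uses a three-row subset of the question's data, and the object name wide is purely illustrative.

```r
library(dplyr)
library(tidyr)

# Three-row subset of the question's data (Items column only)
df1 <- data.frame(
  ID = c(2729L, 3979L, 3860L),
  Items = c("Bicycle",
            "TV, Mobile Phone, Bicycle, Water Tank",
            "Mobile Phone, Bicycle, Fan"),
  stringsAsFactors = FALSE
)

wide <- df1 %>%
  separate_rows(Items, sep = ", ") %>%   # one row per ID/item pair
  count(ID, Items) %>%                   # occurrences of each item per ID
  pivot_wider(names_from = Items,        # one column per distinct item
              values_from = n,
              values_fill = 0)           # items an ID lacks become 0

wide
```

Unlike table, this returns a tibble with ID as an ordinary column, which may be more convenient if further dplyr steps follow.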

Or in base R, after using stack to turn the list into a two-column data.frame:

table(stack(setNames(df1$items.split, df1$ID))[2:1])
# values
#ind    Bicycle DVD Player Fan Mobile Phone Motorbike Plow Tractor TV Wagon/Cart Water Tank
#  2729       1          0   0            0         0    0       0  0          0          0
#  3979       1          0   0            1         0    0       0  1          0          1
#  3860       1          0   1            1         0    0       0  0          0          0
#  2357       0          0   0            1         1    0       0  0          0          0
#  2278       0          0   0            1         1    1       0  1          1          0
#  3277       1          0   1            1         1    0       0  1          0          0
#  3501       1          0   0            1         0    0       0  0          0          1
#  3880       0          0   0            1         1    1       1  0          1          0
#  3207       1          1   0            0         0    1       0  0          0          0
#  3928       1          0   1            1         0    0       0  1          0          0
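
Since the question asks for a matrix, it may help to see the base R route end to end. The sketch below is an illustration under some assumptions: it rebuilds the list with strsplit from the Items column instead of relying on items.split, works on a three-row subset of the data, and uses unclass to drop the table class so that a plain integer matrix remains. The names items, long, tab and mat are illustrative.

```r
# Three-row subset of the question's data (Items column only)
df1 <- data.frame(
  ID = c(2729L, 3979L, 3860L),
  Items = c("Bicycle",
            "TV, Mobile Phone, Bicycle, Water Tank",
            "Mobile Phone, Bicycle, Fan"),
  stringsAsFactors = FALSE
)

# Split each string on ", " to get a list of character vectors
items <- strsplit(df1$Items, ", ", fixed = TRUE)

# Name the list by ID, then stack it into a two-column data.frame
# with columns 'values' (the item) and 'ind' (the ID)
long <- stack(setNames(items, df1$ID))

# Cross-tabulate ID (rows) against item (columns)
tab <- table(long[2:1])

# Drop the "table" class to leave a plain integer matrix
mat <- unclass(tab)
mat
```

If a data.frame is preferred over a matrix, as.data.frame.matrix(tab) is another option.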

You can use cSplit_e from splitstackshape:

splitstackshape::cSplit_e(df, "Items", type = "character", fill = 0, drop = TRUE)


#     ID                                        items.split Items_Bicycle Items_DVD Player Items_Fan
#1  2729                                            Bicycle             1                0         0
#2  3979              TV, Mobile Phone, Bicycle, Water Tank             1                0         0
#3  3860                         Mobile Phone, Bicycle, Fan             1                0         1
#4  2357                            Mobile Phone, Motorbike             0                0         0
#5  2278      TV, Mobile Phone, Wagon/Cart, Motorbike, Plow             0                0         0
#6  3277          TV, Mobile Phone, Bicycle, Motorbike, Fan             1                0         1
#7  3501                  Mobile Phone, Bicycle, Water Tank             1                0         0
#8  3880 Tractor, Mobile Phone, Wagon/Cart, Motorbike, Plow             0                0         0
#9  3207                          DVD Player, Bicycle, Plow             1                1         0
#10 3928                     TV, Mobile Phone, Bicycle, Fan             1                0         1

#   Items_Mobile Phone Items_Motorbike Items_Plow Items_Tractor Items_TV Items_Wagon/Cart Items_Water Tank
#1                   0               0          0             0        0                0                0
#2                   1               0          0             0        1                0                1
#3                   1               0          0             0        0                0                0
#4                   1               1          0             0        0                0                0
#5                   1               1          1             0        1                1                0
#6                   1               1          0             0        1                0                0
#7                   1               0          0             0        0                0                1
#8                   1               1          1             1        0                1                0
#9                   0               0          1             0        0                0                0
#10                  1               0          0             0        1                0                0

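Sorry @akrun, I have done it now.
This is perfect akrun, thank you. Much simpler than I thought!
When I looked at the full dataset, I found this to be more accurate than akrun's answer, because your answer only gives 0 or 1 to indicate presence. Many thanks.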
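
The difference between a frequency count and a 0/1 presence flag raised in the last comment only shows up when an item appears more than once for the same ID. The sketch below uses made-up data (the IDs "1" and "2" and the repeated "Bicycle" are hypothetical, not from the question) to show how a count matrix from table could be reduced to presence flags.

```r
# Hypothetical data: ID "1" lists "Bicycle" twice
items <- list(`1` = c("Bicycle", "Bicycle", "TV"),
              `2` = "TV")

# Counts per ID and item, as in the table()-based answers
counts <- unclass(table(stack(items)[2:1]))

# Presence flags: 1 where the count is positive, 0 otherwise
presence <- (counts > 0) * 1L

counts
presence
```

Which of the two is wanted depends on whether repeated items within one ID are meaningful in the full dataset.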