How do I create a binary matrix of the inventory in each row? (R)


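I have a data frame with 9 columns that holds an inventory of factors. Each row can have all 9 columns filled (i.e., the row holds 9 "things"), but most don't (most have 3-4). The columns aren't position-specific either: item 200 means the same thing whether it shows up in column 1 or column 3. I'd like to create a binary matrix with one row per original row and one column per factor.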

Example (shortened to 4 columns to get the point across):

r1   3  4  5  8
r2   4  6  7 NA
r3   1  5 NA NA
r4   2  6  8  9

should become

     1  2  3  4  5  6  7  8  9 
r1   0  0  1  1  1  0  0  1  0
r2   0  0  0  1  0  1  1  0  0
r3   1  0  0  0  1  0  0  0  0
r4   0  1  0  0  0  1  0  1  1

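I've looked into writeBin/readBin, k-means clustering (which is what I ultimately want to do, but I need to get rid of the NAs first), fuzzy clustering, and tag clustering. I'm just a bit lost.

I tried writing two for loops that pull values from the matrix by row/column and then store the 0s and 1s in a new matrix, but I think I ran into a scoping problem.

You guys are the best. Thanks!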

This should do it:

# The Incantation
options(stringsAsFactors = FALSE)

library(reshape2)

# Your example data
dat <- data.frame(id = c("R1", "R2", "R3", "R4"),
                  col1 = c(3, 4, 1, 2),
                  col2 = c(4, 6, 5, 6),
                  col3 = c(5, 7, NA, 8),
                  col4 = c(8, NA, NA, 9)
)

# Melt it down, dropping the empty (NA) cells so that they do not
# turn into an "NA" column later on
dat.melt <- melt(dat, id.vars = "id", na.rm = TRUE)

# Cast it back out, with the row IDs remaining the row IDs
# and the values of the columns becoming the columns themselves.
# Aggregating with length means that the values in this data.frame
# are a count of how many times each value occurs in each row's
# columns (which, based on this data, seems to be capped at just once).
dat.cast <- dcast(dat.melt, id ~ value, fun.aggregate = length)
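
For comparison, the same long-then-wide idea can be sketched in base R with table(), without reshape2. This is only an illustrative alternative, not the method of the answer above; the names long, tab and ind are made up for this sketch, and the data is the question's example.

```r
# The question's example data (NA marks an empty slot)
dat <- data.frame(id = c("R1", "R2", "R3", "R4"),
                  col1 = c(3, 4, 1, 2),
                  col2 = c(4, 6, 5, 6),
                  col3 = c(5, 7, NA, 8),
                  col4 = c(8, NA, NA, 9))

# Long form: one (id, value) pair per cell, then drop the empty cells.
# unlist() walks the columns one after another, so the ids are repeated
# once per value column.
long <- data.frame(id = rep(dat$id, times = ncol(dat) - 1),
                   value = unlist(dat[-1], use.names = FALSE))
long <- long[!is.na(long$value), ]

# Cross-tabulate ids against values, then clamp the counts to 0/1
tab <- table(long$id, long$value)
ind <- (unclass(tab) > 0) * 1L
ind
```

Clamping with `> 0` only matters if an item can appear twice in the same row; with the example data every count is already 0 or 1.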

Here's a base R solution:

# Read in the data, and convert to matrix form
df <- read.table(text = "
3  4   5   8
4  6   7   NA
1  5  NA   NA
2  6   8   9", header = FALSE)
m <- as.matrix(df)

# Create a two column matrix containing row/column indices of cells to be filled 
# with 'one's
id <- cbind(rowid = as.vector(t(row(m))), 
            colid = as.vector(t(m)))
id <- id[complete.cases(id), ]

# Create output matrix
out <-  matrix(0, nrow = nrow(m), ncol = max(m, na.rm = TRUE))
out[id] <- 1
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,]    0    0    1    1    1    0    0    1    0
# [2,]    0    0    0    1    0    1    1    0    0
# [3,]    1    0    0    0    1    0    0    0    0
# [4,]    0    1    0    0    0    1    0    1    1
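
The code above uses the item values directly as column positions, which works when the IDs run from 1 to some small maximum. The question also mentions IDs such as 200, so here is a hedged sketch of one way to handle arbitrary, non-contiguous IDs: map each distinct ID to a column with match(). The data and the names items, keep and idx are hypothetical, chosen only for illustration.

```r
# Hypothetical data with non-contiguous item IDs (NA = empty slot)
m <- matrix(c(200,  15,  NA,
               15, 731, 200,
              731,  NA,  NA),
            nrow = 3, byrow = TRUE)

# Every distinct item ID gets its own column; sort() drops the NAs
items <- sort(unique(as.vector(m)))
keep  <- !is.na(m)

# Two-column (row, column) index matrix of the cells to set to 1
idx <- cbind(row(m)[keep], match(m[keep], items))

out <- matrix(0L, nrow = nrow(m), ncol = length(items),
              dimnames = list(paste0("r", seq_len(nrow(m))),
                              as.character(items)))
out[idx] <- 1L
out
#    15 200 731
# r1  1   1   0
# r2  1   1   1
# r3  0   0   1
```

If the number of distinct items is large, the same (row, column) pairs could presumably be passed to a sparse constructor such as Matrix::sparseMatrix() instead of filling a dense matrix; that variation is not shown here.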

These are all great answers. I thought I'd contribute the original solution I wrote, which a friend of mine modified so that it actually works:
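# x: matrix of item IDs (NA = empty slot); y: all-zero output matrix
y <- matrix(0, nrow = nrow(x), ncol = max(x, na.rm = TRUE))
for (i in seq_len(nrow(x)))
  for (j in seq_len(ncol(x)))
    if (!is.na(x[i, j])) y[i, x[i, j]] <- 1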
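
As a rough check that the double loop and a vectorised index-matrix fill give the same result, the sketch below runs both on made-up random data and compares them. The function names loop_fill and index_fill, and the data sizes, are assumptions for illustration only; timings will vary by machine, so none are quoted.

```r
set.seed(1)
# Hypothetical input: 2000 rows, 9 slots per row, item IDs drawn from 1..50
x <- t(replicate(2000, {
  items <- sample(50, 9)          # 9 distinct item IDs
  items[sample(9, 5)] <- NA       # blank out 5 of the 9 slots
  items
}))

# Double-loop version
loop_fill <- function(x) {
  y <- matrix(0, nrow = nrow(x), ncol = max(x, na.rm = TRUE))
  for (i in seq_len(nrow(x)))
    for (j in seq_len(ncol(x)))
      if (!is.na(x[i, j])) y[i, x[i, j]] <- 1
  y
}

# Vectorised version: index with a two-column (row, column) matrix
index_fill <- function(x) {
  y <- matrix(0, nrow = nrow(x), ncol = max(x, na.rm = TRUE))
  keep <- !is.na(x)
  y[cbind(row(x)[keep], x[keep])] <- 1
  y
}

# Both approaches should produce exactly the same matrix
stopifnot(identical(loop_fill(x), index_fill(x)))

# Compare run times on your own machine
system.time(loop_fill(x))
system.time(index_fill(x))
```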

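The two for loops work once a few things are set up beforehand, but they are very slow. It looks like the other solutions run much faster.

Josh, this is really impressive. Is there a term for all this? I was thinking inventory matrix, item matrix, or binary matrix, but those all seem to be tied to other ideas. Thanks.

I sort of think of the result as an indicator matrix of presence/absence matrix (the latter because it encodes, for each item, whether it is present in a given row). Not sure whether there's a generally accepted name for it, though.

That should have read "indicator matrix ... or presence/absence matrix" (not "of"). Too late to edit the comment itself.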