R: Converting pairwise data into an adjacency dataset in R


Suppose I have the following dataset:

set.seed(42)
test <- data.frame(event_id = stringi::stri_rand_strings(1000, 2, '[A-Z]'), person_id = floor(runif(1000, min=0, max=500)))

> head(test)
  event_id person_id
1       EP       438
2       IX       227
3       AV       212
4       GX       469
5       QF       193
6       MM       222

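What is the most efficient way to convert this into an adjacency dataset? Note: the real dataset contains 2 million observations.

I tried the technique suggested in the comments and got the following error on my actual dataset: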

adjacency_df <- crossprod(table(test))
Error in table(adjacency_df) : 
  attempt to make a table with >= 2^31 elements

So I need a better approach.
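The error arises because `table()` allocates one cell for every combination of distinct values, whether or not that combination occurs in the data. As a rough sketch, the cell count can be checked before tabulating. `dense_cells` is a hypothetical helper, and the toy data and the 50,000 figure are made up for illustration; none of them come from the original post.

```r
# Hypothetical helper (not from the original post): number of cells that
# table(df) would allocate, i.e. the product of distinct values per column.
dense_cells <- function(df) {
  prod(vapply(df, function(col) as.numeric(length(unique(col))), numeric(1)))
}

# Made-up toy data: 3 distinct events, 10 distinct persons
toy <- data.frame(
  event_id  = c("AB", "LM", "YZ", "AB", "LM", "YZ", "YZ", "AB", "LM", "YZ"),
  person_id = 1:10
)

n_cells <- dense_cells(toy)               # 3 events x 10 persons
too_big <- n_cells > .Machine$integer.max # .Machine$integer.max is 2^31 - 1

# Made-up sizes for illustration: 50,000 events x 50,000 persons
big_case <- 50000 * 50000 > .Machine$integer.max
```

If the check comes out `TRUE`, a dense `table()` cannot be built and a sparse representation is the more promising route.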

Since the matrix size seems to be the problem, you could use the Matrix package's version of crossprod to do this, like so:

library(Matrix)

mat <- with(
  test,
  sparseMatrix(
    i = as.numeric(factor(event_id)),
    j = as.numeric(factor(person_id)),
    dimnames = list(levels(factor(event_id)), levels(factor(person_id)))
  )
)

crossprod(mat)

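The Matrix package creates sparse matrices, which store only the non-zero entries, so it should be able to handle many more cells.

I am not sure whether this gets around the crossprod error, but you could try it this way. With the data above: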
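As a quick sanity check, the sparse route can be compared against the dense `crossprod(table(...))` on data small enough for both to run. This is only a sketch under a few assumptions: the toy data is made up, and passing `x` explicitly is my own addition so that the incidence matrix stores numeric counts rather than a pattern, which keeps the result directly comparable to the `table()` version.

```r
library(Matrix)

# Made-up toy data (not from the original post): 3 events, 5 persons
toy <- data.frame(
  event_id  = c("AB", "AB", "LM", "LM", "LM", "YZ"),
  person_id = c(1, 2, 2, 3, 4, 5)
)

# Build each factor once and reuse it for both indices and dimnames
ev <- factor(toy$event_id)
pe <- factor(toy$person_id)

# Sparse event-by-person incidence matrix; repeated (event, person)
# pairs are summed, which matches the counts table() would produce
inc <- sparseMatrix(
  i = as.integer(ev),
  j = as.integer(pe),
  x = rep(1, nrow(toy)),
  dims = c(nlevels(ev), nlevels(pe)),
  dimnames = list(levels(ev), levels(pe))
)

adj_sparse <- crossprod(inc)          # person-by-person, stays sparse
adj_dense  <- crossprod(table(toy))   # dense reference, small data only

same <- all(as.matrix(adj_sparse) == adj_dense)
```

On the real data only the sparse half is meant to be run; the dense comparison is there purely to show that the two agree.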

library(dplyr)

set.seed(42)
test <-
  data.frame(
    event_id = stringi::stri_rand_strings(1000, 2, '[A-Z]'),
    person_id = floor(runif(1000, min = 0, max = 500))
  )

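Pass the grouped output to crossprod, i.e. out <- test %>% group_by(event_id) %>% table() followed by x <- crossprod(out).

Is this close to your expected output? It is hard to tell whether it worked, so it may help to look at this smaller example dataset: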

{
  set.seed(42)
  test <-
    data.frame(
      event_id = sample(c("AB", "LM", "YZ"), size = 10, replace = TRUE),
      person_id = 1:10
    )
  out <- test %>%
    group_by(event_id) %>%
    table() 
  x <- crossprod(out)
  print(out)
  x
}

        person_id
event_id 1 2 3 4 5 6 7 8 9 10
      AB 0 0 1 0 0 0 0 1 0  0
      LM 0 0 0 0 1 1 0 0 1  0
      YZ 1 1 0 1 0 0 1 0 0  1
         person_id
person_id 1 2 3 4 5 6 7 8 9 10
       1  1 1 0 1 0 0 1 0 0  1
       2  1 1 0 1 0 0 1 0 0  1
       3  0 0 1 0 0 0 0 1 0  0
       4  1 1 0 1 0 0 1 0 0  1
       5  0 0 0 0 1 1 0 0 1  0
       6  0 0 0 0 1 1 0 0 1  0
       7  1 1 0 1 0 0 1 0 0  1
       8  0 0 1 0 0 0 0 1 0  0
       9  0 0 0 0 1 1 0 0 1  0
       10 1 1 0 1 0 0 1 0 0  1
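To make the printed result easier to read, here is a small sketch that rebuilds the incidence table from the output above and queries the cross-product. The values are copied from that printout; the particular lookups are only illustrative.

```r
# Incidence table copied from the printed output above
# (rows = events, columns = persons)
out <- rbind(
  AB = c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0),
  LM = c(0, 0, 0, 0, 1, 1, 0, 0, 1, 0),
  YZ = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 1)
)
colnames(out) <- 1:10

x <- crossprod(out)

# Off-diagonal entries: number of events two persons share
shared_1_2 <- x["1", "2"]   # persons 1 and 2 both appear under YZ
shared_1_3 <- x["1", "3"]   # persons 1 and 3 have no event in common

# Diagonal entries: number of events each person appears in
events_per_person <- diag(x)

# Everyone who shares at least one event with person 5
partners_of_5 <- setdiff(names(which(x["5", ] > 0)), "5")
```

In other words, a non-zero off-diagonal cell marks a pair of persons who are adjacent, and its value is the number of events they have in common.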

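Have a look at this related question; the answer by A5C1D2H2I1M1N2O1R2T1 mentions crossprod with table.

See the edit. I tried the crossprod approach, but it did not work.

Would the igraph library do what you need? That is, something like library(igraph); g ...

Thanks for the idea. Unfortunately, I get the same memory error: > out <- ... %>% select('meeting_id', 'investee_id') %>% group_by('meeting_id') %>% table() Error in table: attempt to make a table with >= 2^31 elements

@Parseltongue - fair enough, not really surprising! Hopefully Jake's answer gets you all the way there.

Thank you very much for taking the time, though!

Thank you so much. I have been working on this problem for a week, and both this and the comment you submitted work for me. The igraph solution is nice because I can easily generate an edge list from it. Is there a simple way to generate an edge list from the sparse matrix?

Glad it was useful. It looks like summary(mat) will do that.

How would you convert the resulting adjacency matrix into a graph object that the igraph library can interpret? I have been trying for an hour but cannot work out how to read in the adjacency matrix generated this way. @满意的fisher

@Parseltongue To do that, I would stick with the approach I mentioned in my comment on your question. In the code there, g is the igraph graph object. If you want the bipartite projection, you could do something like bp ...

Thank you very much!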
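The comments point to `summary()` as a way to get an edge list out of the sparse matrix. A minimal sketch of that idea follows, with a few assumptions: the toy data is made up, `x` is passed explicitly so the entries are numeric counts, and the pairs are normalised with `pmin`/`pmax` so that the result does not depend on which triangle of the symmetric matrix is stored.

```r
library(Matrix)

# Made-up toy data (not from the original post)
toy <- data.frame(
  event_id  = c("AB", "AB", "LM", "LM", "LM", "YZ"),
  person_id = c(1, 2, 2, 3, 4, 5)
)
ev <- factor(toy$event_id)
pe <- factor(toy$person_id)

inc <- sparseMatrix(
  i = as.integer(ev),
  j = as.integer(pe),
  x = rep(1, nrow(toy)),
  dims = c(nlevels(ev), nlevels(pe))
)
adj <- crossprod(inc)

# summary() lists the stored non-zero entries as (i, j, x) triplets
s <- summary(adj)

# Drop self-pairs on the diagonal
s <- s[s$i != s$j, ]

# Put each pair in a fixed order and remove any mirrored duplicates
edges <- unique(data.frame(
  from   = levels(pe)[pmin(s$i, s$j)],
  to     = levels(pe)[pmax(s$i, s$j)],
  weight = s$x
))
```

Here `weight` is the number of events the two persons share. A data frame in this shape is the kind of input that edge-list based graph constructors usually accept.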

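The grouped output and crossprod step on the 1000-row example data, with the first 20 rows and columns of the result:

out <- test %>%
  group_by(event_id) %>%
  table()

x <- crossprod(out)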

> x[1:20, 1:20]
         person_id
person_id 0 2 3 4 5 6 9 10 11 12 13 14 15 16 17 18 19 20 21 23
       0  1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
       2  0 5 0 0 0 0 0  0  0  0  0  0  0  1  0  0  0  0  0  0
       3  0 0 4 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
       4  0 0 0 3 0 0 0  0  0  0  1  0  0  0  0  0  0  0  0  0
       5  0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
       6  0 0 0 0 0 1 0  0  0  0  0  0  0  0  0  0  0  0  0  0
       9  0 0 0 0 0 0 3  0  0  0  0  0  0  0  0  0  0  0  0  0
       10 0 0 0 0 0 0 0  4  0  0  0  0  0  0  0  0  0  0  0  0
       11 0 0 0 0 0 0 0  0  1  0  0  0  0  0  0  0  0  0  0  0
       12 0 0 0 0 0 0 0  0  0  2  0  0  0  0  0  0  0  0  0  0
       13 0 0 0 1 0 0 0  0  0  0  2  0  0  0  0  0  0  0  0  0
       14 0 0 0 0 0 0 0  0  0  0  0  3  0  0  0  0  0  0  0  0
       15 0 0 0 0 0 0 0  0  0  0  0  0  1  0  0  0  0  0  0  0
       16 0 1 0 0 0 0 0  0  0  0  0  0  0  3  0  0  0  0  0  0
       17 0 0 0 0 0 0 0  0  0  0  0  0  0  0  1  0  0  0  0  0
       18 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  5  0  0  0  0
       19 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  3  0  0  0
       20 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  3  0  0
       21 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  2  0
       23 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  3
