R: How to calculate co-occurrence in a table?


I have a simple matrix, for example:

test <- matrix(c("u1","p1","u1","p2","u2","p2","u2",
                 "p3","u3","p1","u4","p2","u5","p1",
                 "u5","p3","u6","p3","u7","p4","u7",
                 "p3","u8","p1","u9","p4"),
               ncol=2,byrow=TRUE) 
colnames(test) <- c("user","product")
test1<-as.data.frame(test)

I want to count how many users bought each pair of products together, for example p1 and p2, or p2 and p3.

table(test1$product, test1$product)

gives me this:

     p1   p2  p3  p4
 p1   4   0   0   0
 p2   0   3   0   0
 p3   0   0   4   0
 p4   0   0   0   2

How can I get the correct result, such as:

     p1   p2  p3  p4
 p1   4   1   1   0
 p2   1   3   1   0
 p3   1   1   4   1
 p4   0   0   1   2

Looking at the desired output, you are looking for the crossprod function:

crossprod(table(test1))
#        product
# product p1 p2 p3 p4
#      p1  4  1  1  0
#      p2  1  3  1  0
#      p3  1  1  4  1
#      p4  0  0  1  2

This is the same as crossprod(table(test1$user, test1$product)). (Reflecting Dennis's comment.)
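Why this works: table(test1) is a user-by-product incidence matrix M, with a 1 wherever a user bought a product. Entry (i, j) of t(M) %*% M sums, over all users, the product of the indicators for products i and j, which is the number of users who bought both. The following is a minimal sketch that checks one crossprod entry against a direct count. The helper name count_both is hypothetical and introduced only for illustration.

```r
test <- matrix(c("u1","p1","u1","p2","u2","p2","u2",
                 "p3","u3","p1","u4","p2","u5","p1",
                 "u5","p3","u6","p3","u7","p4","u7",
                 "p3","u8","p1","u9","p4"),
               ncol = 2, byrow = TRUE)
colnames(test) <- c("user", "product")
test1 <- as.data.frame(test)

m  <- table(test1)    # users in rows, products in columns
co <- crossprod(m)    # same result as t(m) %*% m

# Hypothetical helper: number of users who bought both product a and product b
count_both <- function(a, b) {
  users_a <- as.character(test1$user[test1$product == a])
  users_b <- as.character(test1$user[test1$product == b])
  length(intersect(users_a, users_b))
}

count_both("p1", "p2")   # direct count for one pair
co["p1", "p2"]           # the corresponding crossprod entry
```

The diagonal of co holds the number of users who bought each single product, since every buyer of a product co-occurs with itself.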

Note that you currently don't even use the user names as input. If u1 bought p1, p2 and p3, do you want to add 1 to all of (p1, p2), (p2, p3) and (p3, p1) (plus the mirrored elements)? Another option would be a 3-d matrix…

Yes, 1 would be added for all pairs.

Ananda's solution is superior (lighter weight, no external package needed), but I'm throwing down another one anyway. I believe this is what is called an adjacency matrix (if I'm wrong, smarter people should feel free to edit):

library(qdap)
adjmat(table(test1))$adjacency

##        product
## product p1 p2 p3 p4
##      p1  4  1  1  0
##      p2  1  3  1  0
##      p3  1  1  4  1
##      p4  0  0  1  2

One of the tags on this post asked for an efficient solution, although that tag has since been removed. We decided to post the solution here anyway.

Here is a function that uses RcppEigen to compute the cross product:

library(RcppEigen)
library(inline)
prodFun <- '
        typedef Eigen::Map<Eigen::MatrixXi> MapMti;
        const MapMti B(as<MapMti>(BB));
        const MapMti C(as<MapMti>(CC));
        return List::create(B.adjoint() * C);
        '

funCPr <- cxxfunction(signature(BB= "matrix", CC = "matrix"),
                     prodFun, plugin = "RcppEigen") 
tbl <- table(test1)
funCPr(tbl, tbl)[[1]]
#     [,1] [,2] [,3] [,4]
#[1,]    4    1    1    0
#[2,]    1    3    1    0
#[3,]    1    1    4    1
#[4,]    0    0    1    2

set.seed(24)
test2 <- data.frame(user = sample(1:5000, 1e6, replace=TRUE),
    product = sample(paste0("p", 1:50), 1e6, replace = TRUE),
    stringsAsFactors=FALSE)
tbl1 <- table(test2)

library(microbenchmark)
microbenchmark(cPP = funCPr(tbl1, tbl1)[[1]], 
              CrossP = crossprod(tbl1),
              adjMat = adjmat(tbl1)$adjacency,
              unit = "relative", times = 10L)
#Unit: relative
#   expr      min       lq     mean   median       uq       max neval cld
#    cPP 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000    10  a 
# CrossP 2.079867 2.070509 2.234376 2.074388 2.290516  2.676798    10  a 
# adjMat 6.223034 6.500791 9.619088 7.197824 7.771270 31.394812    10   b
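
One caveat worth checking with data like test2: when a user can appear with the same product more than once, the table holds purchase counts rather than 0/1 flags, and crossprod then sums products of counts instead of counting users. A hedged sketch, assuming what is wanted is the number of distinct users per product pair, is to binarize the table first. The small data frame dup below is made up for illustration and is not part of the original post.

```r
# Made-up data: u1 bought p1 twice and p2 once; u2 bought p1 once
dup <- data.frame(user    = c("u1", "u1", "u1", "u2"),
                  product = c("p1", "p1", "p2", "p1"))

tbl_counts <- table(dup)             # the (u1, p1) cell holds 2, not 1
co_counts  <- crossprod(tbl_counts)  # entries are sums of count products

tbl_flags <- (tbl_counts > 0) * 1    # 1 if the user ever bought the product
co_users  <- crossprod(tbl_flags)    # entries now count distinct users

co_counts
co_users
```

Whether the count-weighted or the binarized version is the right one depends on what "co-occurrence" should mean for the data at hand; the original example has no repeated purchases, so both agree there.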