R: How do I calculate co-occurrence in a table?


I have a simple matrix, for example:

test <- matrix(c("u1","p1","u1","p2","u2","p2","u2",
                 "p3","u3","p1","u4","p2","u5","p1",
                 "u5","p3","u6","p3","u7","p4","u7",
                 "p3","u8","p1","u9","p4"),
               ncol=2,byrow=TRUE) 
colnames(test) <- c("user","product")
test1<-as.data.frame(test)

I want to count how many users bought each pair of products together, for example p1 and p2, or p2 and p3.

table(test1$product, test1$product)

gives me this:

     p1   p2  p3  p4
 p1   4   0   0   0
 p2   0   3   0   0
 p3   0   0   4   0
 p4   0   0   0   2

How can I get the correct result, like this:

     p1   p2  p3  p4
 p1   4   1   1   0
 p2   1   3   1   0
 p3   1   1   4   1
 p4   0   0   1   2

Looking at your desired output, you are looking for the crossprod function:

crossprod(table(test1))
#        product
# product p1 p2 p3 p4
#      p1  4  1  1  0
#      p2  1  3  1  0
#      p3  1  1  4  1
#      p4  0  0  1  2

This is the same as crossprod(table(test1$user, test1$product)). (Added to reflect Dennis's comment.)
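Why this works can be checked by hand. The sketch below assumes that table(test1) is read as a user-by-product incidence matrix, so that crossprod(X) equals t(X) %*% X and entry (i, j) sums, over users, the product of their counts for products i and j. The names inc, co, manual and both are illustrative and do not appear in the original posts.

```r
# Rebuild the example data from the question as a data frame.
test1 <- data.frame(
  user    = c("u1","u1","u2","u2","u3","u4","u5","u5","u6","u7","u7","u8","u9"),
  product = c("p1","p2","p2","p3","p1","p2","p1","p3","p3","p4","p3","p1","p4")
)

# Rows are users, columns are products; unclass() leaves a plain integer matrix.
inc <- unclass(table(test1))

# crossprod(inc) is t(inc) %*% inc.
co     <- crossprod(inc)
manual <- t(inc) %*% inc
stopifnot(all(co == manual))

# Cross-check one cell directly: users who bought both p1 and p3.
both <- sum(inc[, "p1"] > 0 & inc[, "p3"] > 0)
print(co)
print(both)
```

The diagonal holds the number of purchases of each product, which is why the question's table(test1$product, test1$product) reproduced only the diagonal: it never looked at the user column.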

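Ananda's solution is superior (lighter weight, no external package needed), but I'm throwing down another one. I believe this is what's called an adjacency matrix (smarter people, feel free to edit if I'm wrong):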

One of the tags on this post asked for an efficient solution, but it has since been removed. We decided to post the solution here anyway.

Below is a function that uses RcppEigen to compute the cross product:

library(RcppEigen)
library(inline)
prodFun <- '
        typedef Eigen::Map<Eigen::MatrixXi> MapMti;
        const MapMti B(as<MapMti>(BB));
        const MapMti C(as<MapMti>(CC));
        return List::create(B.adjoint() * C);
        '

funCPr <- cxxfunction(signature(BB= "matrix", CC = "matrix"),
                     prodFun, plugin = "RcppEigen") 
tbl <- table(test1)
funCPr(tbl, tbl)[[1]]
#     [,1] [,2] [,3] [,4]
#[1,]    4    1    1    0
#[2,]    1    3    1    0
#[3,]    1    1    4    1
#[4,]    0    0    1    2

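Note that you currently don't even use the user names as input. If u1 bought p1, p2 and p3, do you want to add 1 to all of (p1, p2), (p2, p3) and (p3, p1) (plus the mirrored elements)? Another option would be a 3-d matrix…

Yes, 1 would be added to all pairs.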
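The all-pairs behaviour raised in that comment can be seen on a tiny case. This is a minimal sketch with a hypothetical data frame, basket, in which a single user buys three products; it is not part of the original posts.

```r
# Hypothetical data: one user who bought three different products.
basket <- data.frame(
  user    = c("u1", "u1", "u1"),
  product = c("p1", "p2", "p3")
)

# 1 x 3 incidence matrix (one user, three products).
inc3 <- unclass(table(basket))

# Every pair of products, and each mirrored cell, receives a 1.
co3 <- crossprod(inc3)
print(co3)
```

Because crossprod multiplies the incidence matrix by its own transpose, the result is symmetric, so the mirrored elements come for free.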
The qdap alternative, which returns the adjacency matrix:

library(qdap)
adjmat(table(test1))$adjacency

##        product
## product p1 p2 p3 p4
##      p1  4  1  1  0
##      p2  1  3  1  0
##      p3  1  1  4  1
##      p4  0  0  1  2

Benchmarks on a larger data set:

set.seed(24)
test2 <- data.frame(user = sample(1:5000, 1e6, replace=TRUE),
    product = sample(paste0("p", 1:50), 1e6, replace = TRUE),
    stringsAsFactors=FALSE)
tbl1 <- table(test2)

library(microbenchmark)
microbenchmark(cPP = funCPr(tbl1, tbl1)[[1]], 
              CrossP = crossprod(tbl1),
              adjMat = adjmat(tbl1)$adjacency,
              unit = "relative", times = 10L)
#Unit: relative
#   expr      min       lq     mean   median       uq       max neval cld
#    cPP 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000    10  a 
# CrossP 2.079867 2.070509 2.234376 2.074388 2.290516  2.676798    10  a 
# adjMat 6.223034 6.500791 9.619088 7.197824 7.771270 31.394812    10   b
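One caveat when moving to larger data: test2 is sampled with replacement, so the same user/product pair can occur more than once, and the cross product then sums products of purchase counts rather than counting users. If the goal is the number of users per product pair, the table can be reduced to 0/1 first. The sketch below uses a hypothetical data frame, repeat_buys, and illustrative names counts, raw, binary and users.

```r
# Hypothetical data: u1 buys p1 twice and p2 once; u2 buys p1 once.
repeat_buys <- data.frame(
  user    = c("u1", "u1", "u1", "u2"),
  product = c("p1", "p1", "p2", "p1")
)

counts <- unclass(table(repeat_buys))

# Raw cross product weights each user by purchase counts:
# cell (p1, p2) = 2 * 1 + 1 * 0 = 2.
raw <- crossprod(counts)

# Reduce to a 0/1 incidence matrix so that each user counts once per product.
binary <- (counts > 0) * 1L
users  <- crossprod(binary)

print(raw)
print(users)
```

With the binary matrix, the diagonal gives the number of distinct users per product and each off-diagonal cell gives the number of users who bought both products.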