R: How do I calculate co-occurrence counts in a table?
I have a simple matrix, for example:
test <- matrix(c("u1","p1","u1","p2","u2","p2","u2",
"p3","u3","p1","u4","p2","u5","p1",
"u5","p3","u6","p3","u7","p4","u7",
"p3","u8","p1","u9","p4"),
ncol=2,byrow=TRUE)
colnames(test) <- c("user","product")
test1<-as.data.frame(test)
I want to count how many users bought each pair of products together, for example p1 and p2, or p2 and p3.
table(test1$product, test1$product)
gives me this:
   p1 p2 p3 p4
p1  4  0  0  0
p2  0  3  0  0
p3  0  0  4  0
p4  0  0  0  2
How can I get the correct result, like this:
   p1 p2 p3 p4
p1  4  1  1  0
p2  1  3  1  0
p3  1  1  4  1
p4  0  0  1  2
Looking at the desired output, you are looking for the crossprod function:
crossprod(table(test1))
# product
# product p1 p2 p3 p4
# p1 4 1 1 0
# p2 1 3 1 0
# p3 1 1 4 1
# p4 0 0 1 2
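To see why crossprod gives the desired counts, here is a minimal sketch that rebuilds the question's data and checks the result two ways. The names `X`, `co`, `by_hand`, `buyers` and `both_p2_p3` are introduced here for illustration only and are not part of the original answer.

```r
# Rebuild the question's data (same values as `test1` above).
test1 <- data.frame(
  user    = c("u1","u1","u2","u2","u3","u4","u5","u5","u6","u7","u7","u8","u9"),
  product = c("p1","p2","p2","p3","p1","p2","p1","p3","p3","p4","p3","p1","p4")
)

# X is the user-by-product incidence matrix: X[u, p] is 1 if user u bought p.
X <- table(test1$user, test1$product)

# crossprod(X) is t(X) %*% X: entry [i, j] sums X[u, i] * X[u, j] over users,
# which here is the number of users who bought both product i and product j.
co <- crossprod(X)
by_hand <- t(X) %*% X

# Cross-check one pair directly: users who bought both p2 and p3.
buyers <- split(test1$user, test1$product)
both_p2_p3 <- length(intersect(buyers$p2, buyers$p3))
```

The diagonal of `co` holds the number of buyers of each product, and each off-diagonal entry holds the number of users shared by the two products, provided every user-product pair appears at most once in the data.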
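This is the same as crossprod(table(test1$user, test1$product)) (reflecting Dennis's comment).

Ananda's solution is superior (it is lighter weight and needs no external package), but I'm throwing out another one. I believe this is what's called an adjacency matrix (if I'm wrong, smarter people can feel free to edit). The qdap code for it is shown further down.

A post marked as a duplicate of this one asked for an efficient solution, but it has since been deleted, so the solution is posted here. Below is a function that uses RcppEigen for the cross product: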
library(RcppEigen)
library(inline)
prodFun <- '
typedef Eigen::Map<Eigen::MatrixXi> MapMti;
const MapMti B(as<MapMti>(BB));
const MapMti C(as<MapMti>(CC));
return List::create(B.adjoint() * C);
'
funCPr <- cxxfunction(signature(BB= "matrix", CC = "matrix"),
prodFun, plugin = "RcppEigen")
tbl <- table(test1)
funCPr(tbl, tbl)[[1]]
# [,1] [,2] [,3] [,4]
#[1,] 4 1 1 0
#[2,] 1 3 1 0
#[3,] 1 1 4 1
#[4,] 0 0 1 2
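Note that you currently don't even use the user names as input. If u1 bought p1, p2 and p3, do you want to add 1 to all of (p1, p2), (p2, p3) and (p3, p1), plus the mirrored elements? Another option is to have a 3-d matrix…
Yes, 1 will be added to all pairs.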
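The scenario from the comment can be sketched with a small hypothetical data set (`basket`, `co`, `idx` and `pairs` are illustrative names, not from the thread). A user who buys three products contributes 1 to each of the three pairs, and the upper triangle of the co-occurrence matrix lists each unordered pair once:

```r
# Hypothetical data: u1 buys three products, u2 buys two of them.
basket <- data.frame(
  user    = c("u1", "u1", "u1", "u2", "u2"),
  product = c("p1", "p2", "p3", "p2", "p3")
)
co <- crossprod(table(basket$user, basket$product))

# Keep each unordered pair once by taking the upper triangle (no diagonal).
idx <- which(upper.tri(co), arr.ind = TRUE)
pairs <- data.frame(
  a     = rownames(co)[idx[, "row"]],
  b     = colnames(co)[idx[, "col"]],
  users = co[idx]
)
pairs
```

Because the matrix is symmetric, the lower triangle carries the mirrored elements and can be ignored when reporting pairs.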
library(qdap)
adjmat(table(test1))$adjacency
## product
## product p1 p2 p3 p4
## p1 4 1 1 0
## p2 1 3 1 0
## p3 1 1 4 1
## p4 0 0 1 2
set.seed(24)
test2 <- data.frame(user = sample(1:5000, 1e6, replace=TRUE),
product = sample(paste0("p", 1:50), 1e6, replace = TRUE),
stringsAsFactors=FALSE)
tbl1 <- table(test2)
library(microbenchmark)
microbenchmark(cPP = funCPr(tbl1, tbl1)[[1]],
CrossP = crossprod(tbl1),
adjMat = adjmat(tbl1)$adjacency,
unit = "relative", times = 10L)
#Unit: relative
# expr min lq mean median uq max neval cld
# cPP 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10 a
# CrossP 2.079867 2.070509 2.234376 2.074388 2.290516 2.676798 10 a
# adjMat 6.223034 6.500791 9.619088 7.197824 7.771270 31.394812 10 b
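One caveat when moving from the toy data to something like the simulated data above: with 1e6 rows spread over 5000 users and 50 products, the same user-product pair will almost certainly occur many times, so the table holds counts greater than 1 and the cross product sums products of purchase counts rather than counting users. If the goal is the number of distinct users per pair, one option is to reduce the table to 0/1 first. A minimal sketch with hypothetical data (`orders`, `tab`, `raw` and `bin` are illustrative names):

```r
# Hypothetical data: u1 bought p1 twice, so the contingency table holds a 2.
orders <- data.frame(
  user    = c("u1", "u1", "u1", "u2", "u2"),
  product = c("p1", "p1", "p2", "p1", "p2")
)
tab <- table(orders$user, orders$product)

raw <- crossprod(tab)            # sums products of purchase counts
bin <- crossprod(1 * (tab > 0))  # counts distinct users per pair
```

Here `raw["p1", "p2"]` is inflated by u1's repeat purchase, while `bin["p1", "p2"]` counts u1 and u2 once each. Which of the two is wanted depends on whether repeat purchases should carry weight.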