R: Create a data frame of pair counts based on an event column


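I have a data frame in which one column holds an event ID and another column holds a product used in that event. Each product appears at most once per event, and every event contains at least one product. I want to know how many times each product was used together with each other product. Here is some sample data: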

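set.seed(1)
# Note: R 3.6.0 changed the default sample() algorithm, so newer R versions
# may assign different event numbers than the output shown below.
events <- paste('Event ', sample(1:4, size = 15, replace = TRUE), sep = '')
events <- events[order(events)]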

prods <- paste('Product ', c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5))

test_data <- data.frame(events, prods)
test_data
  events      prods
1  Event 1 Product  1
2  Event 1 Product  2
3  Event 1 Product  3
4  Event 1 Product  4
5  Event 2 Product  1
6  Event 2 Product  5
7  Event 2 Product  6
8  Event 3 Product  2
9  Event 3 Product  4
10 Event 3 Product  6
11 Event 3 Product  7
12 Event 4 Product  1
13 Event 4 Product  2
14 Event 4 Product  3
15 Event 4 Product  5
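For comparison, here is a base R sketch (not part of the question or the answers below) that computes the same counts as a co-occurrence matrix. It assumes the data exactly as printed above, hard-coded so the result does not depend on the R version's sample() algorithm. The idea: build an event-by-product incidence table, then crossprod() it, so entry [i, j] is the number of events containing both product i and product j.

```r
# Sample data exactly as printed above (hard-coded rather than generated
# with sample(), so it is the same on every R version).
test_data <- data.frame(
  events = rep(paste("Event", 1:4), times = c(4, 3, 4, 4)),
  prods  = paste("Product ", c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5))
)

# Event-by-product incidence table: 1 if the product was used in the event.
inc <- table(test_data$events, test_data$prods)

# crossprod(inc) is t(inc) %*% inc: entry [i, j] counts the events that
# contain both product i and product j; the diagonal counts events per product.
co <- crossprod(inc)
co
```

Because crossprod() returns the full symmetric matrix, each pair appears twice and pairs that never occur together show up as 0; co[upper.tri(co)] keeps each pair once.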

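Maybe this is cracking a nut with a sledgehammer, but you could mine (frequent) itemsets, which comes with other fancy features as well. It could work like this: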

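library(arules)
library(reshape2)
# One row per event, one logical column per product (TRUE if used in that event)
trans <- as(sapply(dcast(test_data, events~prods, fun.aggregate = length, value.var="prods")[, -1], as.logical), "transactions")
# Mine only itemsets of exactly two products, with no minimum support
sets <- apriori(trans, parameter = list(supp = 0, conf = 0, minlen = 2, maxlen = 2, target = "frequent itemsets"))
df <- as(sets, "data.frame")
# support is a fraction of events; multiply by the number of events to get a count
subset(transform(df, n=support*nrow(trans)), n>0, -support)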
#                      items n
# 2  {Product  6,Product  7} 1
# 4  {Product  4,Product  7} 1
# 6  {Product  2,Product  7} 1
# 7  {Product  5,Product  6} 1
# 8  {Product  3,Product  5} 1
# 10 {Product  1,Product  5} 2
# 11 {Product  2,Product  5} 1
# 13 {Product  4,Product  6} 1
# 14 {Product  1,Product  6} 1
# 15 {Product  2,Product  6} 1
# 16 {Product  3,Product  4} 1
# 17 {Product  1,Product  3} 2
# 18 {Product  2,Product  3} 2
# 19 {Product  1,Product  4} 1
# 20 {Product  2,Product  4} 2
# 21 {Product  1,Product  2} 2
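The conversion n = support*nrow(trans) works because an itemset's support is the fraction of transactions (here, events) that contain it. A small base R sketch of that arithmetic, without arules; pair_support() is a hypothetical helper, and the data are hard-coded to match the printed sample:

```r
# Sample data exactly as printed in the question.
test_data <- data.frame(
  events = rep(paste("Event", 1:4), times = c(4, 3, 4, 4)),
  prods  = paste("Product ", c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5))
)

# Hypothetical helper: support of the itemset {a, b}, i.e. the fraction
# of events that contain both products a and b.
pair_support <- function(data, a, b) {
  # Logical event-by-product matrix, like the one converted to transactions.
  inc <- unclass(table(data$events, data$prods)) > 0
  mean(inc[, a] & inc[, b])
}

s <- pair_support(test_data, "Product  1", "Product  5")
# Support times the number of events gives the pair count (n in the output).
n <- s * length(unique(test_data$events))
n
```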

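Alternatively, split the products by event, compute all two-product combinations (combn) within each event, then aggregate to get a count for each combination: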

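# For each event, build a 2 x k matrix of all product pairs, then stack them
# into one two-column matrix (combn() errors on events with only one product)
out <- t(do.call(cbind,
  lapply(split(as.character(test_data$prods), test_data$events), combn, 2))
)
# Add a count of 1 per pair and sum the counts of identical pairs
aggregate(count ~ . , data=transform(out,count=1), FUN=sum)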

#           X1         X2 count
#1  Product  1 Product  2     2
#2  Product  1 Product  3     2
#3  Product  2 Product  3     2
#4  Product  1 Product  4     1
#5  Product  2 Product  4     2
#6  Product  3 Product  4     1
#7  Product  1 Product  5     2
#8  Product  2 Product  5     1
#9  Product  3 Product  5     1
#10 Product  1 Product  6     1
#11 Product  2 Product  6     1
#12 Product  4 Product  6     1
#13 Product  5 Product  6     1
#14 Product  2 Product  7     1
#15 Product  4 Product  7     1
#16 Product  6 Product  7     1
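One caveat: combn(x, 2) stops with an error when x has fewer than two elements, so an event with a single product (which the question allows) breaks the code above. A sketch of one way around it, assuming the sample data plus a hypothetical single-product "Event 5". It also sorts the products inside each event so a pair is always stored in the same orientation, regardless of row order:

```r
# Sample data plus a hypothetical "Event 5" that used only one product.
test_data <- data.frame(
  events = rep(paste("Event", 1:5), times = c(4, 3, 4, 4, 1)),
  prods  = paste("Product ", c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5, 7))
)

by_event <- split(as.character(test_data$prods), test_data$events)
# combn(x, 2) fails when an event has fewer than two products, so drop those.
by_event <- Filter(function(p) length(p) >= 2, by_event)

# Sort within each event so a pair is always stored as (smaller, larger).
pairs <- do.call(rbind, lapply(by_event, function(p) t(combn(sort(p), 2))))
pairs <- data.frame(X1 = pairs[, 1], X2 = pairs[, 2])

# One row per pair occurrence, summed per distinct pair.
counts <- aggregate(count ~ X1 + X2, data = transform(pairs, count = 1), FUN = sum)
counts
```

The single-product event contributes no pairs, so the result has the same 16 rows as above.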

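Thanks. At first I thought I needed the "zero pairs" (products that were never used together), but I think this will work well without them.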
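If the "zero pairs" are needed after all, one option (a sketch, not taken from the answers) is to enumerate every pair of distinct products with combn() and left-join the observed counts with merge(), filling missing counts with 0. It assumes the sample data as printed and sorts products so observed and enumerated pairs share the same orientation:

```r
# Sample data exactly as printed in the question.
test_data <- data.frame(
  events = rep(paste("Event", 1:4), times = c(4, 3, 4, 4)),
  prods  = paste("Product ", c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5))
)

# Observed pair counts: split/combn/aggregate, with sorted pairs.
pairs <- t(do.call(cbind, lapply(split(as.character(test_data$prods), test_data$events),
                                 function(p) combn(sort(p), 2))))
observed <- aggregate(count ~ X1 + X2, data = transform(pairs, count = 1), FUN = sum)

# Every possible pair of distinct products, in the same sorted orientation.
all_pairs <- as.data.frame(t(combn(sort(unique(as.character(test_data$prods))), 2)),
                           stringsAsFactors = FALSE)
names(all_pairs) <- c("X1", "X2")

# Left join on X1 and X2; pairs never seen together get NA, replaced by 0.
full <- merge(all_pairs, observed, all.x = TRUE)
full$count[is.na(full$count)] <- 0
full
```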