R: Create a data frame of pair counts based on an event column
I have a data frame where one column is an event ID and another column is a product used in that event. Each product is used at most once per event, and every event contains at least one product. I want to know how many times each product was used together with every other product. Here is some sample data:
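set.seed(1)
# 15 random event IDs ("Event 1" ... "Event 4"), sorted so each event's rows are contiguous
events <- paste('Event ', sample(1:4, size = 15, replace = TRUE), sep = '')
events <- events[order(events)]
# One product per row; paste() already separates its arguments with a single space
prods <- paste('Product', c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5))
test_data <- data.frame(events, prods)
test_data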
events prods
1 Event 1 Product 1
2 Event 1 Product 2
3 Event 1 Product 3
4 Event 1 Product 4
5 Event 2 Product 1
6 Event 2 Product 5
7 Event 2 Product 6
8 Event 3 Product 2
9 Event 3 Product 4
10 Event 3 Product 6
11 Event 3 Product 7
12 Event 4 Product 1
13 Event 4 Product 2
14 Event 4 Product 3
15 Event 4 Product 5
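Because each product is used at most once per event, the event-by-product incidence matrix contains only 0s and 1s, and its cross-product gives every pairwise count at once, including pairs that never occur together. A minimal base-R sketch, with the sample data hard-coded from the printed output above so it does not depend on the random number generator (the names `incidence` and `co_occurrence` are illustrative):

```r
# Sample data hard-coded from the printed test_data above
test_data <- data.frame(
  events = paste("Event", rep(1:4, times = c(4, 3, 4, 4))),
  prods  = paste("Product", c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5)),
  stringsAsFactors = FALSE
)

# Event-by-product incidence matrix: 1 if the product was used in the event
# (entries are 0/1 only because each product appears at most once per event)
incidence <- unclass(table(test_data$events, test_data$prods))

# crossprod(X) computes t(X) %*% X: entry [i, j] is the number of events in
# which products i and j were both used; the diagonal counts events per product
co_occurrence <- crossprod(incidence)
co_occurrence
```

Off-diagonal zeros (for example Product 3 with Product 6) are pairs that never occurred together, and `co_occurrence[upper.tri(co_occurrence)]` picks out each unordered pair once.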
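Maybe this is using a sledgehammer to crack a nut, but you could mine (frequent) itemsets, which comes with a lot of other fancy stuff. It could work like this: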
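library(arules)
library(reshape2)
# Wide event x product table (1 = product used in the event), converted to a
# logical matrix and then to an arules "transactions" object
trans <- as(sapply(dcast(test_data, events ~ prods, fun.aggregate = length, value.var = "prods")[, -1], as.logical), "transactions")
# Mine all itemsets of exactly two products, with no minimum support
sets <- apriori(trans, parameter = list(supp = 0, conf = 0, minlen = 2, maxlen = 2, target = "frequent itemsets"))
df <- as(sets, "data.frame")
# Support is a fraction of transactions; multiply by the number of events to get counts
subset(transform(df, n = support * nrow(trans)), n > 0, -support)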
# items n
# 2 {Product 6,Product 7} 1
# 4 {Product 4,Product 7} 1
# 6 {Product 2,Product 7} 1
# 7 {Product 5,Product 6} 1
# 8 {Product 3,Product 5} 1
# 10 {Product 1,Product 5} 2
# 11 {Product 2,Product 5} 1
# 13 {Product 4,Product 6} 1
# 14 {Product 1,Product 6} 1
# 15 {Product 2,Product 6} 1
# 16 {Product 3,Product 4} 1
# 17 {Product 1,Product 3} 2
# 18 {Product 2,Product 3} 2
# 19 {Product 1,Product 4} 1
# 20 {Product 2,Product 4} 2
# 21 {Product 1,Product 2} 2
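The last line of the arules code converts support (the fraction of transactions that contain an itemset) back into a count by multiplying by the number of transactions, which here is the number of events. A base-R sketch of that arithmetic for a single pair, without arules, on hard-coded sample data (the variable names are illustrative):

```r
# Sample data hard-coded from the printed test_data in the question
test_data <- data.frame(
  events = paste("Event", rep(1:4, times = c(4, 3, 4, 4))),
  prods  = paste("Product", c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5)),
  stringsAsFactors = FALSE
)

# Logical event-by-product matrix, like the one coerced to transactions above
incidence <- unclass(table(test_data$events, test_data$prods)) > 0

# Events that contain both Product 1 and Product 2
both <- incidence[, "Product 1"] & incidence[, "Product 2"]

support <- mean(both)                  # fraction of events containing the pair
n_events <- support * nrow(incidence)  # converted back to a count of events
```

Here two of the four events contain both products, so the support is 0.5 and the count is 2, matching the `{Product 1,Product 2}` row above.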
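Split the products by event, compute all pairwise combinations within each event, and then aggregate to get the count for each combination: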
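# For each event, list every unordered pair of its products (sorted, so that
# {A, B} and {B, A} end up as the same pair), one pair per column.
# Note: combn() stops with an error for an event that has only one product.
pairs_by_event <- lapply(split(as.character(test_data$prods), test_data$events),
                         function(p) combn(sort(p), 2))
out <- t(do.call(cbind, pairs_by_event))
# Count how many events each pair appears in
aggregate(count ~ ., data = transform(out, count = 1), FUN = sum)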
# X1 X2 count
#1 Product 1 Product 2 2
#2 Product 1 Product 3 2
#3 Product 2 Product 3 2
#4 Product 1 Product 4 1
#5 Product 2 Product 4 2
#6 Product 3 Product 4 1
#7 Product 1 Product 5 2
#8 Product 2 Product 5 1
#9 Product 3 Product 5 1
#10 Product 1 Product 6 1
#11 Product 2 Product 6 1
#12 Product 4 Product 6 1
#13 Product 5 Product 6 1
#14 Product 2 Product 7 1
#15 Product 4 Product 7 1
#16 Product 6 Product 7 1
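To see what the `split()`/`combn()` step feeds into `aggregate()`, here is a small sketch on the same sample data (hard-coded; the variable names are illustrative). `combn(x, 2)` returns one unordered pair per column, so an event with k products contributes choose(k, 2) rows to `out`:

```r
# Sample data hard-coded from the printed test_data in the question
test_data <- data.frame(
  events = paste("Event", rep(1:4, times = c(4, 3, 4, 4))),
  prods  = paste("Product", c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5)),
  stringsAsFactors = FALSE
)

by_event <- split(test_data$prods, test_data$events)

# Event 2 has three products, so combn() yields a 2 x 3 matrix of pairs:
# (Product 1, Product 5), (Product 1, Product 6), (Product 5, Product 6)
pairs_event_2 <- combn(by_event[["Event 2"]], 2)

# Number of pairs each event contributes, and the total number of rows in `out`
pairs_per_event <- sapply(by_event, function(p) choose(length(p), 2))
total_pairs <- sum(pairs_per_event)
```

With events of 4, 3, 4 and 4 products that is 6 + 3 + 6 + 6 = 21 pairs, which `aggregate()` collapses into the 16 distinct pairs listed above; their counts sum to 21.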
Thanks. At first I thought I would also need the "zero pairs", but I think this will work well without them.
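If the zero pairs turn out to be needed after all, one option is to left-join the counts onto the full list of product pairs and replace the missing counts with 0. A sketch under the same assumptions as the split/combn answer (sample data hard-coded; `all_pairs` and `with_zeros` are illustrative names):

```r
# Sample data hard-coded from the printed test_data in the question
test_data <- data.frame(
  events = paste("Event", rep(1:4, times = c(4, 3, 4, 4))),
  prods  = paste("Product", c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5)),
  stringsAsFactors = FALSE
)

# Pair counts as in the split/combn/aggregate answer (pairs sorted within events)
out <- t(do.call(cbind, lapply(split(test_data$prods, test_data$events),
                               function(p) combn(sort(p), 2))))
counts <- aggregate(count ~ ., data = transform(out, count = 1), FUN = sum)

# Every unordered pair of distinct products, whether or not they co-occurred
products <- sort(unique(test_data$prods))
all_pairs <- as.data.frame(t(combn(products, 2)), stringsAsFactors = FALSE)
names(all_pairs) <- c("X1", "X2")

# Left join on the pair, then turn the missing counts into explicit zeros
with_zeros <- merge(all_pairs, counts, by = c("X1", "X2"), all.x = TRUE)
with_zeros$count[is.na(with_zeros$count)] <- 0
with_zeros
```

For the sample data this gives choose(7, 2) = 21 pairs, five of which (for example Product 3 with Product 6) have a count of 0.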