Group by, then find frequent patterns in R
I have data in the following format:
CIN TRN_TYP
9079954 1
9079954 2
9079954 3
9079954 4
9079954 5
9079954 4
9079954 5
9079954 6
9079954 7
9079954 8
9079954 9
9079954 9
. .
. .
. .
There are 100 distinct CIN values (9079954, 12441087, 15246633, …), each with its corresponding TRN_TYP values.
First, I want to group this data into basket format:
9079954 1, 2, 3, 4, 5, ....
12441087 19, 14, 21, 3, 7, ...
.
.
.
Then I want to apply eclat from the arules package to find frequent patterns.
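Please help.

It is not clear what you want as output. There are many ways to aggregate the result, either with base functions or with external packages such as plyr or data.table.
Here is one option using the by function: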
by(tab,tab$CIN,FUN=function(x) unlist(x$TRN_TYP))
tab$CIN: 9079954
[1] 1 2 3 4 5 4 5 6 7 8 9
-----------------------------------------
tab$CIN: 9079955
[1] 11 12 13 14 15 16 17 18 19
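The same grouping can also be sketched with split(), which returns a plain named list with one basket per CIN. The data frame below is made up for illustration and only mirrors the two CIN values shown in the output above; the name baskets is likewise an assumption, not something from the question.

```r
# Hypothetical sample data mirroring the layout in the question
tab <- data.frame(
  CIN     = c(rep(9079954, 12), rep(9079955, 9)),
  TRN_TYP = c(1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 9, 11:19)
)

# split() groups TRN_TYP by CIN and returns a named list, one basket per CIN
baskets <- split(tab$TRN_TYP, tab$CIN)

# Drop repeated items within each basket
baskets <- lapply(baskets, unique)

print(baskets)
```

A plain list like this is usually easier to hand to other functions than the object returned by by().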
Edit
To apply eclat, you first need to remove duplicates:
tab <- tab[!duplicated(tab),]
eclat(split(tab$TRN_TYP,tab$CIN)) ## using @Arun's split() solution here, because
## eclat does not seem able to coerce the output of by()
parameter specification:
 tidLists support minlen maxlen            target   ext
    FALSE     0.1      1     10 frequent itemsets FALSE

algorithmic control:
 sparse sort verbose
      7   -2    TRUE
Warning in eclat(split(tab$TRN_TYP, tab$CIN)) :
You chose a very low absolute support count of 0. You might run out of memory! Increase minimum support.
eclat - find frequent item sets with the eclat algorithm
version 2.6 (2004.08.16) (c) 2002-2004 Christian Borgelt
create itemset ...
set transactions ...[18 item(s), 2 transaction(s)] done [0.00s].
sorting and recoding items ... [18 item(s)] done [0.00s].
creating bit matrix ... [18 row(s), 2 column(s)] done [0.00s].
writing ... [1022 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].
set of 1022 itemsets
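The warning above appears because the default support of 0.1 rounds down to an absolute count of 0 when there are only two baskets, so every subset of each basket is reported (2 × (2^9 − 1) = 1022 itemsets). A minimal sketch of passing an explicit support threshold follows, assuming arules is installed. The sample data and the 0.6 threshold are made up for illustration and are not from the question.

```r
library(arules)

# Hypothetical sample data: three CINs with overlapping transaction types
tab <- data.frame(
  CIN     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  TRN_TYP = c(1, 2, 3, 3, 1, 2, 4, 1, 2, 5)
)

# Remove duplicate (CIN, TRN_TYP) rows so each basket lists an item once
tab <- tab[!duplicated(tab), ]

# Build explicit transactions from the per-CIN baskets
trans <- as(split(as.character(tab$TRN_TYP), tab$CIN), "transactions")

# Keep only itemsets that occur in at least 60% of the baskets
itemsets <- eclat(trans,
                  parameter = list(support = 0.6, minlen = 1, maxlen = 10),
                  control = list(verbose = FALSE))

inspect(sort(itemsets, by = "support"))
```

On the real data, the support value would need tuning: too low reproduces the memory warning, too high returns nothing.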
Here tab is the data, and eclat comes from the arules package.