R: Association rule mining over multiple splits of the same data source
Goal: for each report, list the top 5 association rules (by confidence) for each department.
My current syntax and test data:
library(arules)

# Create fake data; 1 = used report, 0 = didn't use report
data <- data.frame(Dept=c('A','A','A','B','B','B'),
                   Rep1=c(1,1,1,1,1,1),
                   Rep2=c(0,0,0,1,1,1),
                   Rep3=c(1,1,1,0,0,0),
                   Rep4=c(0,1,0,1,1,0),
                   Rep5=c(0,0,0,0,0,0),
                   Rep6=c(1,1,0,0,1,0),
                   Rep7=c(1,1,1,1,1,0),
                   Rep8=c(0,0,0,1,1,0),
                   Rep9=c(1,0,0,1,1,0),
                   Rep10=c(1,1,0,0,1,1)
)
# Turn all variables to factors
data<-data.frame(lapply(data, factor))
# Changes 0s to NAs, only interested in rules where the report was used
data[data==0]<-NA
# lapply command to run apriori on the data when split by Dept
rules <- lapply(split(data, list(data$Dept)), function(x) {
  # Turn split data into transactions
  temp <- as(x[ , 2:length(x)], "transactions")
  # Create rules; artificially low parameters for testing
  temp <- apriori(temp, parameter = list(support=0.01, confidence=0.1, minlen=2, maxlen=2))
  # Order rules by confidence (eventually I will select the top 5, which I'm able to do)
  # and convert to a data frame for later use
  temp <- as(sort(temp, by = "confidence")[0:length(temp)], "data.frame")
})
# Breaks out the results into separate data.frames
list2env(rules,.GlobalEnv)
Ideally, my data frames would be along the lines of:
Dept A, data frame for Report 9 only
Dept A, data frame for Report 4 only
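You need to look at rule templates, which restrict the left-hand side (LHS) of the rules to a certain department. Have a look at the examples in ?APappearance in arules.
The code to restrict the LHS to a certain department looks like this: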
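rules_Rep9

I was considering using subset, and it does give me the extra breakdown I want. Ideally, though, I'd like syntax that produces all possible breakdowns instead of running one breakdown at a time (IRL, there are many more departments and reports). Any ideas?

You can easily run the code above, or use subset, inside a for loop, e.g. for(dept in c("A", "B", etc)) { select subset and mine rules }.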
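The loop suggested above can be sketched as follows. This is a minimal sketch rather than the answerer's original code: it assumes the same fake data as in the question, and the names `report_cols`, `mine_one`, and `rules_by_dept_report` are hypothetical helpers introduced here for illustration. It uses the `appearance` argument of `apriori()` to pin the LHS to one report item at a time, within each department's split.

```r
library(arules)

# Same fake data as in the question; 1 = used report, 0 = didn't use report
data <- data.frame(Dept=c('A','A','A','B','B','B'),
                   Rep1=c(1,1,1,1,1,1), Rep2=c(0,0,0,1,1,1),
                   Rep3=c(1,1,1,0,0,0), Rep4=c(0,1,0,1,1,0),
                   Rep5=c(0,0,0,0,0,0), Rep6=c(1,1,0,0,1,0),
                   Rep7=c(1,1,1,1,1,0), Rep8=c(0,0,0,1,1,0),
                   Rep9=c(1,0,0,1,1,0), Rep10=c(1,1,0,0,1,1))
data <- data.frame(lapply(data, factor))
data[data == 0] <- NA

# Hypothetical: every column except Dept is treated as a report
report_cols <- setdiff(names(data), "Dept")

# Hypothetical helper: mine rules whose LHS is a single report item
mine_one <- function(trans, lhs_item, top_n = 5) {
  # apriori() rejects item labels it does not know, so skip those
  if (!lhs_item %in% itemLabels(trans)) return(NULL)
  r <- apriori(trans,
               parameter  = list(support = 0.01, confidence = 0.1,
                                 minlen = 2, maxlen = 2),
               appearance = list(lhs = lhs_item, default = "rhs"),
               control    = list(verbose = FALSE))
  if (length(r) == 0) return(NULL)
  # Keep at most top_n rules, ordered by confidence
  r <- sort(r, by = "confidence")
  r <- r[seq_len(min(top_n, length(r)))]
  as(r, "data.frame")
}

# Outer loop over departments, inner loop over reports
rules_by_dept_report <- lapply(split(data, data$Dept), function(x) {
  trans <- as(x[, report_cols], "transactions")
  items <- setNames(paste0(report_cols, "=1"), report_cols)
  out <- lapply(items, function(item) mine_one(trans, item))
  Filter(Negate(is.null), out)  # drop reports that produced no rules
})

# One data frame per department/report combination
rules_by_dept_report[["A"]][["Rep9"]]
```

The result is a nested list rather than separate objects in the global environment, which tends to be easier to handle once there are many departments and reports. An alternative along the lines of the comment above would be to mine once per department and filter afterwards with `subset(rules, lhs %in% "Rep9=1")`.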
Data frame A as currently produced:

rules                   support    confidence  lift
{Rep9=1}=>{Rep6=1}      .3333333   1.00000000  1.5
{Rep4=1}=>{Rep6=1}      .3333333   1.00000000  1.5
...

Dept A, Report 9 only:

rules                   support    confidence  lift
{Rep9=1}=>{Rep6=1}      .3333333   1.00000000  1.5
{Rep9=1}=>{Rep10=1}     .3333333   1.00000000  1.5
...

Dept A, Report 4 only:

rules                   support    confidence  lift
{Rep4=1}=>{Rep6=1}      .3333333   1.00000000  1.5
{Rep4=1}=>{Rep10=1}     .3333333   1.00000000  1.5
...