Spark MLlib FPGrowth runs but shows no frequent itemsets


I am trying to run a basic market basket analysis on transaction data with MLlib's FPGrowth. I have encoded the transactions in a format like this:

    transactions.take(3)
    res632: Array[Array[String]] = Array(Array(7976503128), Array(68113132893, 1800000725, 3120027015, 4850030414, 2100061223, 5150055538, 60538871457), Array(68113174202))
where the individual numbers in each array are my product IDs, treated as strings (e.g. 68113132893, 7976503128).

Now, when I run the FPGrowth model, it completes without any errors:

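    import org.apache.spark.mllib.fpm.FPGrowth

    // minSupport = 0.5: an itemset must occur in at least half of all transactions
    val fpg = new FPGrowth()
      .setMinSupport(0.5)
      .setNumPartitions(10)
    val modelBuild = fpg.run(transactions)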

    fpg: org.apache.spark.mllib.fpm.FPGrowth = org.apache.spark.mllib.fpm.FPGrowth@74a103be
    modelBuild: org.apache.spark.mllib.fpm.FPGrowthModel[String] = org.apache.spark.mllib.fpm.FPGrowthModel@391b111a
But when I try to get the frequent itemsets, I get an empty array:

    modelBuild.freqItemsets.collect().foreach { itemset =>
      println(itemset.freq)
    }

    res660: Array[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[String]] = Array()

I can't figure out what is going wrong. Please help.

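Reducing minSupport to 0.00001 will print all the itemsets. With minSupport at 0.5, an itemset has to occur in at least half of all transactions to be reported, which is unlikely when each transaction holds only a few of many distinct product IDs. From the Spark documentation:

minSupport: the minimum support for an itemset to be identified as frequent. For example, if an item appears in 3 out of 5 transactions, it has a support of 3/5 = 0.6.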
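
To make the support arithmetic concrete, here is a minimal plain-Scala sketch (no Spark required) that computes the support of each single item over the three sample transactions shown in the question. The names `SupportSketch`, `itemSupport` and `frequentItems` are hypothetical, introduced only for illustration. The sketch covers single items only, not the full FP-growth algorithm, and the three transactions are just the `take(3)` sample, so the real dataset may behave differently.

```scala
// Sketch only: single-item support over the sample from transactions.take(3).
// SupportSketch, itemSupport and frequentItems are illustrative names, not Spark APIs.
object SupportSketch {
  // The three transactions printed in the question.
  val sample: Seq[Array[String]] = Seq(
    Array("7976503128"),
    Array("68113132893", "1800000725", "3120027015", "4850030414",
          "2100061223", "5150055538", "60538871457"),
    Array("68113174202")
  )

  // support(item) = (transactions containing the item) / (total transactions)
  def itemSupport(transactions: Seq[Array[String]]): Map[String, Double] = {
    val total = transactions.size.toDouble
    transactions
      .flatMap(_.distinct)            // count an item at most once per transaction
      .groupBy(identity)
      .map { case (item, hits) => item -> hits.size / total }
  }

  // Single items whose support reaches the given threshold.
  def frequentItems(transactions: Seq[Array[String]], minSupport: Double): Set[String] =
    itemSupport(transactions).filter { case (_, s) => s >= minSupport }.keySet
}
```

In this sample every product ID appears in exactly one of the three transactions, so each has a support of 1/3. `SupportSketch.frequentItems(SupportSketch.sample, 0.5)` is therefore empty, while a threshold of 0.3 keeps all nine items. In the Spark job, the corresponding change is the value passed to `setMinSupport`.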