
Scala: How to perform arbitrary intersections and unions using HyperLogLogMonoid from Algebird


I want to aggregate the set of values belonging to each category into an HLL data structure, so that I can later perform intersections and unions and compute the resulting cardinality of such operations.

I am able to estimate the cardinality of each group using com.twitter.algebird.HyperLogLogAggregator.

I need help storing the values as HLL using com.twitter.algebird.HyperLogLogMonoid, and then using that to compute intersections/unions.

```scala
val lines_parsed = lines.map { line => parseBlueKaiLogEntry(line) }
// (uuid, [category id array])

val lines_parsed_flat = lines_parsed.flatMap {
  case (uuid, category_list) => category_list.toList.map {
    category_id => (category_id, uuid)
  }
}
// (category_id, uuid)

// Group by category
val lines_parsed_grped = lines_parsed_flat.groupBy { case (cat_id, uuid) => cat_id }

// Define HLL aggregator
val hll_uniq = HyperLogLogAggregator.sizeAggregator(bits = 12)
  .composePrepare[(String, String)] { case (cat_id, uuid) => uuid.toString.getBytes("UTF-8") }

// Aggregate using the HLL count
lines_parsed_grped.aggregate(hll_uniq).dump
// (category_id, count) - expected output
```

Now I am trying to use the HLL monoid:

```scala
// I now want to store as HLL and this is where I'm not sure what to do
// Create the HLL monoid
val hll = new HyperLogLogMonoid(bits = 12)
val lines_grped_hll = lines_parsed_grped
  .mapValues { case (cat_id: String, uuid: String) => uuid }
  .values
  .map { v: String => hll.create(v.getBytes("UTF-8")) }

// Calling dump results in a lot more lines than I expect to see
lines_grped_hll.dump
```

What am I doing wrong here?

Use:

```scala
val result = hll.sum(lines_grped_hll) // or another suitable method of hll for you
result.dump
```
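The extra lines in the question's `dump` come from mapping each record to its own HLL without ever combining them. To keep one HLL per category (so intersections and unions can be computed later), the per-record HLLs can be monoid-summed per key instead. A sketch, assuming Scalding's typed API and that `lines_parsed_grped` is grouped by category id as in the question:

```scala
import com.twitter.algebird.HyperLogLogMonoid

val hll = new HyperLogLogMonoid(bits = 12)

// One HLL per record, then sum them per category key with the monoid,
// yielding a single (category_id, HLL) pair per category:
val catHlls = lines_parsed_grped
  .mapValues { case (cat_id, uuid) => hll.create(uuid.getBytes("UTF-8")) }
  .sum(hll)
```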
What kind of result are you expecting? The total count per cat ID?
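Once each category is stored as a single `HLL`, unions and intersections follow from the monoid. A minimal sketch, assuming Algebird's `HyperLogLogMonoid` methods `create`, `sum`, `plus`, `sizeOf`, and `intersectionSize`; `hllA` and `hllB` stand in for two hypothetical per-category HLLs:

```scala
import com.twitter.algebird.HyperLogLogMonoid

val hll = new HyperLogLogMonoid(bits = 12)

// Hypothetical per-category HLLs built from raw ids:
val hllA = hll.sum(Seq("a", "b", "c").map(s => hll.create(s.getBytes("UTF-8"))))
val hllB = hll.sum(Seq("b", "c", "d").map(s => hll.create(s.getBytes("UTF-8"))))

// Union: combine the sketches with the monoid, then estimate the size.
val unionSize = hll.sizeOf(hll.plus(hllA, hllB))

// Intersection: estimated via inclusion-exclusion over the sketches.
val intersectSize = hll.intersectionSize(Seq(hllA, hllB))
```

Both estimates come back as `Approximate[Long]`, so they carry error bounds rather than a single exact count.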