Python Apache Beam min, max, and mean


In another answer, Guillem Xercavins wrote a custom class to compute the minimum and maximum:

import sys

import apache_beam as beam

class MinMaxFn(beam.CombineFn):
  # initialize min and max values (I assumed int type)
  def create_accumulator(self):
    return (sys.maxsize, -sys.maxsize - 1)

  # update if current value is a new min or max
  def add_input(self, min_max, input):
    (current_min, current_max) = min_max
    return min(current_min, input), max(current_max, input)

  # reduce the per-bundle (min, max) pairs to a single pair
  def merge_accumulators(self, accumulators):
    mins, maxs = zip(*accumulators)
    return min(mins), max(maxs)

  def extract_output(self, min_max):
    return min_max
I also need to compute the mean, and I found the following sample code:

class MeanCombineFn(beam.CombineFn):
  def create_accumulator(self):
    """Create a "local" accumulator to track sum and count."""
    return (0, 0)

  def add_input(self, sum_count, input):
    """Process the incoming value."""
    # Python 3 no longer allows tuple parameters, so unpack here
    (sum_, count) = sum_count
    return sum_ + input, count + 1

  def merge_accumulators(self, accumulators):
    """Merge several accumulators into a single one."""
    sums, counts = zip(*accumulators)
    return sum(sums), sum(counts)

  def extract_output(self, sum_count):
    """Compute the mean average."""
    (sum_, count) = sum_count
    if count == 0:
      return float('NaN')
    return sum_ / float(count)

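Do you know how to merge the mean computation into the min/max class, so that a single class computes the minimum, maximum, and mean together and produces a set of keys and values, where each value is an array of 3 values?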

Here is a combined class solution, plus the median:

from collections.abc import Iterable

import apache_beam as beam
import numpy as np

class MinMaxMeanFn(beam.CombineFn):

    def create_accumulator(self):
        # sum, min, max, count, all values seen (kept for the median)
        return (0.0, float('inf'), float('-inf'), 0, [])

    def add_input(self, cur_data, input):
        (cur_sum, cur_min, cur_max, count, cur_values) = cur_data
        # accept either a single number or an iterable of numbers
        values = list(input) if isinstance(input, Iterable) else [input]
        if not values:
            return cur_data
        return (cur_sum + sum(values),
                min(cur_min, min(values)),
                max(cur_max, max(values)),
                count + len(values),
                cur_values + values)

    def merge_accumulators(self, accumulators):
        sums, mins, maxs, counts, value_lists = zip(*accumulators)
        # flatten the per-accumulator value lists into one list
        merged = [v for values in value_lists for v in values]
        return sum(sums), min(mins), max(maxs), sum(counts), merged

    def extract_output(self, cur_data):
        # names chosen so the builtins sum, min and max are not shadowed
        (total, minimum, maximum, count, values) = cur_data
        avg = total / count if count else float('NaN')
        med = np.median(values) if values else float('NaN')
        return {
            "max": maximum,
            "min": minimum,
            "avg": avg,
            "count": count,
            "median": med
        }

Usage example:

(input
 | 'Format Price' >> beam.ParDo(FormatPriceDoFn())
 | 'Group Price by ID' >> beam.GroupByKey()
 | 'Compute price statistic for each ID' >> beam.CombinePerKey(MinMaxMeanFn()))

*I have not tested whether CombinePerKey works without GroupByKey; feel free to test it.
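
One way to check a combiner like the one above without starting a pipeline is to call its four methods by hand in the order a runner would: one accumulator per bundle, then a merge, then the final extraction. The sketch below is only an imitation of that call order, not how Beam actually schedules work. The helper `simulate_combine` and the stand-in class `MinMaxSketchFn` are hypothetical names introduced for illustration, and the stand-in is a plain Python class rather than a `beam.CombineFn` subclass so that the snippet runs without Beam installed.

```python
def simulate_combine(fn, bundles):
    """Drive a CombineFn-like object by hand.

    Builds one accumulator per bundle, merges them, and extracts the result.
    This only imitates the call order; a real runner chooses the bundles.
    """
    accumulators = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for element in bundle:
            acc = fn.add_input(acc, element)
        accumulators.append(acc)
    merged = fn.merge_accumulators(accumulators)
    return fn.extract_output(merged)


class MinMaxSketchFn:
    """Hypothetical stand-in exposing the same four methods as a CombineFn."""

    def create_accumulator(self):
        # (min, max) starting values that any real number will replace
        return (float("inf"), float("-inf"))

    def add_input(self, acc, element):
        lo, hi = acc
        return min(lo, element), max(hi, element)

    def merge_accumulators(self, accumulators):
        # must return ONE accumulator, not the list it was given
        mins, maxs = zip(*accumulators)
        return min(mins), max(maxs)

    def extract_output(self, acc):
        return acc


# Three "bundles", as if the input had been split across workers
result = simulate_combine(MinMaxSketchFn(), [[4, 9], [1, 7], [5]])
print(result)  # (1, 9)
```

Splitting the input into several bundles is what exercises `merge_accumulators`; a combiner that returns the list of accumulators unchanged from that method would hand `extract_output` a list instead of a single `(min, max)` pair.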

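Can you share an example usage of MinMaxMeanFn? Suppose I first create a pipeline using beam.Create([1,2,3,4,5]). How do I call MinMaxMeanFn on this PCollection?

Hi @xennygrimmato, I have updated the code since posting the solution. For usage, you basically have a ParDo that formats the PCollection into keys and values. Then you can combine them per key or globally, depending on your use case, and call MinMaxMeanFn in the combine step -> beam.CombinePerKey(MinMaxMeanFn())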
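
For the `beam.Create([1,2,3,4,5])` case asked about in the comments, a minimal sketch might look like the following. It is an illustration under stated assumptions, not the answer author's code: `StatsFn` is a hypothetical, trimmed-down combiner that keeps every value in memory (fine only for small inputs) and uses the standard-library `statistics.median` instead of NumPy. Because the elements have no keys, the sketch applies it with `beam.CombineGlobally`. The `try`/`except` around the import is only there so the combiner can be exercised by hand when Beam is not installed.

```python
import statistics

try:
    import apache_beam as beam
    _Base = beam.CombineFn
except ImportError:  # assumption: lets the class be tested without Beam
    beam = None
    _Base = object


class StatsFn(_Base):
    """Hypothetical combiner returning min, max, avg, count and median."""

    def create_accumulator(self):
        return []  # all values seen so far

    def add_input(self, values, element):
        return values + [element]

    def merge_accumulators(self, accumulators):
        # flatten every partial list into one accumulator
        return [v for acc in accumulators for v in acc]

    def extract_output(self, values):
        if not values:
            return None
        return {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
            "median": statistics.median(values),
        }


def run():
    """Needs apache_beam installed; uses the default local runner."""
    with beam.Pipeline() as pipeline:
        (pipeline
         | "Create numbers" >> beam.Create([1, 2, 3, 4, 5])
         | "Compute stats" >> beam.CombineGlobally(StatsFn())
         | "Print" >> beam.Map(print))
```

Calling `run()` executes the pipeline. For keyed data, the same combiner could instead be applied with `beam.CombinePerKey(StatsFn())` to a PCollection of `(key, value)` tuples.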