Google Cloud Platform: Can't see the output of beam.combiners.Count.PerElement() in Dataflow

Tags: google-cloud-platform, google-cloud-dataflow, apache-beam


I have a Pub/Sub script that publishes male first names, like so:

from google.cloud import pubsub_v1
import names

project_id = "Your-Project-Name"
topic_name = "Your-Topic-Name"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_name)

# Publish a random male first name to the topic in an endless loop
while True:
    data = names.get_first_name(gender='male')
    data = data.encode("utf-8")
    publisher.publish(topic_path, data=data)
Then I have a Dataflow pipeline that reads from a subscription attached to that topic and counts each element, like so:

import logging
import re

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

root = logging.getLogger()
root.setLevel(logging.INFO)

p = beam.Pipeline(options=PipelineOptions())
x = (
 p
 | beam.io.ReadFromPubSub(topic=None, subscription="projects/YOUR-PROJECT-NAME/subscriptions/YOUR-SUBSCRIPTION-NAME").with_output_types(bytes)
 | 'Decode_UTF-8' >> beam.Map(lambda x: x.decode('utf-8'))
 | 'ExtractWords' >> beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))
 | 'CountingElem' >> beam.combiners.Count.PerElement()
 | 'FormatOutput' >> beam.MapTuple(lambda word, count: '%s: %s' % (word, count))
 | 'Printing2Log' >> beam.Map(lambda k: logging.info(k)))

result = p.run()
result.wait_until_finish()
The problem is that I get no output from the last 3 steps of the pipeline, while I can see data flowing through the first 3 steps in the Dataflow console, i.e. nothing is ever logged.

I expect output like the following:

Peter: 2
Glen: 1
Alex: 1
Ryan: 2

Thanks in advance for your help.

Given that this is a streaming pipeline, you need to set up windowing/triggering appropriately for the pipeline to produce output. See below.

More specifically, from the Beam programming guide:

Caution: Beam's default windowing behavior is to assign all elements of a PCollection to a single, global window and discard late data, even for unbounded PCollections. Before you use a grouping transform such as GroupByKey on an unbounded PCollection, you must do at least one of the following: set a non-global windowing function, or set a non-default trigger.


beam.combiners.Count.PerElement() contains a GroupByKey.
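
As an illustration, a minimal way to get output from this pipeline is to window the PCollection before the count. The sketch below assumes fixed 10-second windows and streaming mode; the window size and the subscription path are placeholders, not values from the question:

import logging
import re

import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions

# Minimal sketch: same pipeline as in the question, with a fixed-window
# step added before the count so the GroupByKey inside Count.PerElement()
# can emit per-window results on the unbounded Pub/Sub source.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | beam.io.ReadFromPubSub(
            subscription="projects/YOUR-PROJECT-NAME/subscriptions/YOUR-SUBSCRIPTION-NAME"
          ).with_output_types(bytes)
        | 'Decode_UTF-8' >> beam.Map(lambda x: x.decode('utf-8'))
        | 'ExtractWords' >> beam.FlatMap(lambda x: re.findall(r"[A-Za-z']+", x))
        | 'Window10s' >> beam.WindowInto(window.FixedWindows(10))  # non-global windowing
        | 'CountingElem' >> beam.combiners.Count.PerElement()
        | 'FormatOutput' >> beam.MapTuple(lambda word, count: '%s: %s' % (word, count))
        | 'Printing2Log' >> beam.Map(logging.info)
    )

With fixed windows, each name's count is emitted once per 10-second window rather than being held back indefinitely in the single global window.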

Comment: Which runner are you using to run the Dataflow job? — I'm using the Dataflow runner.
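
For reference, a rough sketch of how the runner and streaming settings are typically passed as pipeline options when launching on the Dataflow runner; the project, region, and bucket values below are placeholders, not taken from the question:

from apache_beam.options.pipeline_options import PipelineOptions

# Illustrative only: project, region, and temp bucket are placeholder values
options = PipelineOptions(
    runner='DataflowRunner',
    project='YOUR-PROJECT-NAME',
    region='us-central1',
    temp_location='gs://YOUR-BUCKET/temp',
    streaming=True,  # enable streaming mode for the unbounded Pub/Sub source
)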