Google BigQuery / Apache Beam Dataflow: 'NoneType' object has no attribute 'parts'


I am trying to write a pipeline that reads a stream from Pub/Sub and writes it to BigQuery using Google Cloud Dataflow with Apache Beam. I have the following code:

import apache_beam as beam
from apache_beam.transforms.window import FixedWindows

topic = 'projects/???/topics/???'
table = '???.???'

gcs_path = "gs://???"

with beam.Pipeline(runner="DataflowRunner", argv=[
        "--project", "???",
        "--staging_location", ("%s/staging_location" % gcs_path),
        "--temp_location", ("%s/temp" % gcs_path),
        "--output", ("%s/output" % gcs_path)
    ]) as p:
    (p 
    | 'winderow' >> beam.WindowInto(FixedWindows(60))
    | 'hello' >> beam.io.gcp.pubsub.ReadStringsFromPubSub(topic) 
    | 'hello2' >> beam.io.Write(beam.io.gcp.bigquery.BigQuerySink(table))
    )
    p.run().wait_until_finish()
But I get the following error when I run it:

No handlers could be found for logger "oauth2client.contrib.multistore_file"
ERROR:root:Error while visiting winderow
Traceback (most recent call last):
  File ".\main.py", line 20, in <module>
    p.run().wait_until_finish()
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\pipeline.py", line 339, in run
    return self.runner.run(self)
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 296, in run
    super(DataflowRunner, self).run(pipeline)
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\runners\runner.py", line 138, in run
    pipeline.visit(RunVisitor(self))
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\pipeline.py", line 367, in visit
    self._root_transform().visit(visitor, self, visited)
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\pipeline.py", line 710, in visit
    part.visit(visitor, pipeline, visited)
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\pipeline.py", line 713, in visit
    visitor.visit_transform(self)
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\runners\runner.py", line 133, in visit_transform
    self.runner.run_transform(transform_node)
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\runners\runner.py", line 176, in run_transform
    return m(transform_node)
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 526, in run_ParDo
    input_step = self._cache.get_pvalue(transform_node.inputs[0])
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\runners\runner.py", line 252, in get_pvalue
    self._ensure_pvalue_has_real_producer(pvalue)
  File "C:\ProgramData\Anaconda2\lib\site-packages\apache_beam\runners\runner.py", line 226, in _ensure_pvalue_has_real_producer
    while real_producer.parts:
AttributeError: 'NoneType' object has no attribute 'parts'
Is this a problem with the code or the configuration?
How can I make it work?

I don't have experience with windowed pipelines yet, but as far as I understand the concept, windowing is meant to be applied to your input data, not set as a pipeline property.

In that case, your code should probably be:

with beam.Pipeline(runner="DataflowRunner", argv=[
        "--project", "???",
        "--staging_location", ("%s/staging_location" % gcs_path),
        "--temp_location", ("%s/temp" % gcs_path),
        "--output", ("%s/output" % gcs_path)
    ]) as p:
    (p 
    | 'hello' >> beam.io.gcp.pubsub.ReadStringsFromPubSub(topic) 
    | 'winderow' >> beam.WindowInto(FixedWindows(60))
    | 'hello2' >> beam.io.Write(beam.io.gcp.bigquery.BigQuerySink(table))
    )
    p.run().wait_until_finish()

The official repository also has some windowing examples.

It says:

ValueError: PubSubPayloadSource is currently available for use only in streaming pipelines.

I wonder what happens if you use the DirectRunner just for testing. Does it work?

That's correct: read from Pub/Sub first. Also add --streaming to make it a streaming pipeline. Pub/Sub is an unbounded source, so you should add '--streaming' to your argv, and you don't need the p.run().wait_until_finish() part, since an unbounded stream never ends.