How do I read data from BigQuery and the file system in the same pipeline with an Apache Beam Python job?
I am trying to read some data from BigQuery and some data from the file system using the code below.

However, when I run this pipeline, I get the following error:

Traceback (most recent call last):
  File "/etl/dataflow/etlTXLPreprocessor.py", line 125, in <module>
    run()
  File "/etl/dataflow/etlTXLPreprocessor.py", line 120, in run
    p.run().wait_until_finish()
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 461, in run
    self._options).run(False)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 474, in run
    return self.runner.run_pipeline(self, self._options)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 182, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 413, in run_pipeline
    pipeline.replace_all(_get_transform_overrides(options))
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 443, in replace_all
    self._replace(override)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 340, in _replace
    self.visit(TransformUpdater(self))
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 503, in visit
    self._root_transform().visit(visitor, self, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 942, in visit
    visitor.visit_transform(self)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 338, in visit_transform
    self._replace_if_needed(transform_node)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 301, in _replace_if_needed
    new_output = replacement_transform.expand(input_node)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/sdf_direct_runner.py", line 87, in expand
    create_invoker(signature, process_invocation=False)
  File "apache_beam/runners/common.py", line 360, in apache_beam.runners.common.DoFnInvoker.create_invoker
TypeError: create_invoker() takes at least 2 positional arguments (1 given)

However, if I run my code like this, or like this, I am unable to figure out the error.

python, python-3.x, apache-beam
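The traceback above ends inside the SDK's own direct runner code (sdf_direct_runner.py calling into runners/common.py), not in the pipeline script, so the interpreter and SDK versions are worth noting alongside the error. The following is a minimal sketch for collecting them. The function name environment_report is hypothetical, and the sketch assumes importlib.metadata is available (Python 3.8 or later); on an older interpreter such as the 3.7 shown in the paths, it simply reports None for the package version.

```python
import platform

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # older interpreters, e.g. Python 3.7
    metadata = None


def environment_report(packages=("apache-beam",)):
    """Return a dict with the Python version and installed package versions.

    A package maps to None when it is not installed or when
    importlib.metadata is unavailable on this interpreter.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        version = None
        if metadata is not None:
            try:
                version = metadata.version(name)
            except metadata.PackageNotFoundError:
                version = None
        report[name] = version
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running the script inside the same virtual environment as the pipeline prints one line per entry, which can be pasted next to the traceback.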
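Is there a limitation in Apache Beam pipelines that restricts reading to the same data source? I get the same error when doing the same kind of thing, pulling data from both BigQuery and the file system: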
lines = p | "Read Input Parameters" >> ReadFromText(options.input)
past_posts = p | "Get Past Posts From BigQuery" >> Read(BigQuerySource(query=f"SELECT url FROM {full_bq_table_id}", use_standard_sql=False))
Error:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/Craigslist_pipeline.py", line 14, in <module>
    full_table_id=f"apartment-data-project:{dataset}.craigslist_posts"
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/pipeline/__init__.py", line 35, in run
    result = p.run()
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 461, in run
    self._options).run(False)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 474, in run
    return self.runner.run_pipeline(self, self._options)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 182, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 413, in run_pipeline
    pipeline.replace_all(_get_transform_overrides(options))
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 443, in replace_all
    self._replace(override)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 340, in _replace
    self.visit(TransformUpdater(self))
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 503, in visit
    self._root_transform().visit(visitor, self, visited)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  [Previous line repeated 1 more time]
  File "/Users/Document
apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True)) | beam.combiners.ToList()
apn1 = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True)) | beam.combiners.ToList()
preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())
preprocess_rows1 = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())