
How can an Apache Beam Python job read from both BigQuery and the file system in the same pipeline?


I am trying to read some data from BigQuery and some data from the file system using the code below.

However, when I run this pipeline I get the following error:

Traceback (most recent call last):
  File "/etl/dataflow/etlTXLPreprocessor.py", line 125, in <module>
    run()
  File "/etl/dataflow/etlTXLPreprocessor.py", line 120, in run
    p.run().wait_until_finish()
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 461, in run
    self._options).run(False)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 474, in run
    return self.runner.run_pipeline(self, self._options)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 182, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 413, in run_pipeline
    pipeline.replace_all(_get_transform_overrides(options))
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 443, in replace_all
    self._replace(override)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 340, in _replace
    self.visit(TransformUpdater(self))
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 503, in visit
    self._root_transform().visit(visitor, self, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 942, in visit
    visitor.visit_transform(self)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 338, in visit_transform
    self._replace_if_needed(transform_node)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 301, in _replace_if_needed
    new_output = replacement_transform.expand(input_node)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/sdf_direct_runner.py", line 87, in expand
    create_invoker(signature, process_invocation=False)
  File "apache_beam/runners/common.py", line 360, in apache_beam.runners.common.DoFnInvoker.create_invoker
TypeError: create_invoker() takes at least 2 positional arguments (1 given)

But if I run my code like this:

apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True)) | beam.combiners.ToList()
apn1 = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True)) | beam.combiners.ToList()

or like this:

preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())
preprocess_rows1 = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())

it runs without the error, and I cannot work out what is wrong.
Is there a restriction on reading the same data sources in an Apache Beam pipeline?

I get the same error when I pull data from BigQuery and the file system while performing the same kinds of operations:

lines = p | "Read Input Parameters" >> ReadFromText(options.input)
past_posts = p | "Get Past Posts From BigQuery" >> Read(BigQuerySource(query=f"SELECT url FROM {full_bq_table_id}", use_standard_sql=False))

Error:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/Craigslist_pipeline.py", line 14, in <module>
    full_bq_table_id = f"apartment-data-project:{dataset}.craigslist_postings"
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/pipeline/__init__.py", line 35, in run
    result = p.run()
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 461, in run
    self._options).run(False)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 474, in run
    return self.runner.run_pipeline(self, self._options)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 182, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 413, in run_pipeline
    pipeline.replace_all(_get_transform_overrides(options))
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 443, in replace_all
    self._replace(override)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 340, in _replace
    self.visit(TransformUpdater(self))
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 503, in visit
    self._root_transform().visit(visitor, self, visited)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/Users/ianmitchell/Documents/Personal Projects/Craigslist/env/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  [Previous line repeated 1 more time]
  File "/Users/...