Apache Beam: ReadFromText versus ReadAllFromText


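I am running an Apache Beam pipeline that reads text files from Google Cloud Storage, parses them, and writes the parsed data to BigQuery.

Leaving out the parsing and google_cloud_options for brevity, my code is listed at the end of this post (apache-beam 2.5.0 with the GCP add-ons, Dataflow as the runner).

This runs fine for a small number of input files and successfully appends the relevant data to my BigQuery table. However, when I increase the number of input files to roughly 800k, I get an error:

"Total size of the BoundedSource objects returned by BoundedSource.split() operation is larger than the allowable limit."

I found a suggestion to use ReadAllFromText instead of ReadFromText.
However, when I swap it in, I get the following error: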

Traceback (most recent call last):
  File "/Users/richardtbenade/Repos/de_020/main_isolated.py", line 240, in <module>
    xmltobigquery.run_dataflow()
  File "/Users/richardtbenade/Repos/de_020/main_isolated.py", line 220, in run_dataflow
    'parse xml to dict' >> beam.ParDo(XmlToDictFn(), job_spec=self.job_spec) | \
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 831, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 488, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 464, in apply
    return self.apply(transform, pvalueish)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 500, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 187, in apply
    return m(transform, input)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 193, in apply_PTransform
    return transform.expand(input)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/io/textio.py", line 470, in expand
    return pvalue | 'ReadAllFiles' >> self._read_all_files
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pvalue.py", line 109, in __or__
    return self.pipeline.apply(ptransform, self)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 454, in apply
    label or transform.label)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 464, in apply
    return self.apply(transform, pvalueish)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 500, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 187, in apply
    return m(transform, input)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 193, in apply_PTransform
    return transform.expand(input)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/io/filebasedsource.py", line 416, in expand
    | 'ReadRange' >> ParDo(_ReadRange(self._source_from_file)))
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pvalue.py", line 109, in __or__
    return self.pipeline.apply(ptransform, self)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 454, in apply
    label or transform.label)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 464, in apply
    return self.apply(transform, pvalueish)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 500, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 187, in apply
    return m(transform, input)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 193, in apply_PTransform
    return transform.expand(input)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/transforms/util.py", line 568, in expand
    | 'RemoveRandomKeys' >> Map(lambda t: t[1]))
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pvalue.py", line 109, in __or__
    return self.pipeline.apply(ptransform, self)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pipeline.py", line 500, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 187, in apply
    return m(transform, input)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 193, in apply_PTransform
    return transform.expand(input)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/transforms/util.py", line 494, in expand
    windowing_saved = pcoll.windowing
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/pvalue.py", line 130, in windowing
    self.producer.inputs)
  File "/Users/richardtbenade/virtualenvs/de_020/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 443, in get_windowing
    return inputs[0].windowing
  File

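# Original pipeline with ReadFromText; parsing and pipeline options are abbreviated.
import apache_beam as beam

p = beam.Pipeline(options=options)  # options: pipeline options, omitted for brevity

lines = (
    p
    | 'read from file' >> beam.io.ReadFromText('some_gcs_bucket_path*')
    # XmlToDictFn is the parsing DoFn named in the traceback; its body is omitted.
    | 'parse xml to dict' >> beam.ParDo(XmlToDictFn())
    | beam.io.WriteToBigQuery(
        'my_table',
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
)
p.run()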