Python: "Unable to parse file" error when running a custom template on Dataflow

python google-cloud-platform google-cloud-dataflow

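I am trying to write a custom template that reads a CSV file and writes it to another CSV file. The goal is to select only the data I need from the CSV. When I run the template from the web interface, I get the error shown further down.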

I have reduced the code as much as possible to isolate the error, but I still cannot see what is wrong. I used the documentation as a guide:

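class UploadOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--input',
            default='gs://[MYBUCKET]/input.csv',
            help='Path of the file to read from')
        parser.add_value_provider_argument(
            '--output',
            required=True,
            help='Output file to write results to')

pipeline_options = PipelineOptions(['--output', 'gs://[MYBUCKET]/output'])
p = beam.Pipeline(options=pipeline_options)
upload_options = pipeline_options.view_as(UploadOptions)

(p
    | 'Read' >> beam.io.Read(upload_options.input)
    | 'Write' >> beam.io.WriteToText(upload_options.output, file_name_suffix='.csv'))

The current error is as follows: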

Unable to parse file 'gs://MYBUCKET/template.py'

In the terminal, I get the following error:

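ERROR: (gcloud.dataflow.jobs.run) FAILED_PRECONDITION: Unable to parse template file 'gs://[MYBUCKET]/template.py'.
- '@type': type.googleapis.com/google.rpc.PreconditionFailure
  violations:
  - description: "Unexpected end of stream : expected '{'"
    subject: 0:0
    type: JSON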

Thanks in advance.

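I managed to solve my problem. It came from the variable I was passing to the read step of the pipeline. The read must use the custom_options variable, not the known_args variable: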

custom_options = pipeline_options.view_as(CustomPipelineOptions)
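
The distinction matters because a template is built once and executed later, so a runtime parameter has no value while the pipeline is being constructed. The following is a minimal standard-library sketch of that idea only. `DeferredValue`, `build_read_step`, and the `gs://example-bucket/input.csv` path are hypothetical names made up for illustration; they are not part of Apache Beam, whose own mechanism is the value provider argument used in the code further down.

```python
class DeferredValue:
    """Hypothetical stand-in for a runtime parameter (not a Beam class)."""

    def __init__(self, name):
        self.name = name
        self._value = None
        self._resolved = False

    def resolve(self, value):
        # Called when the template is executed and the real value is known.
        self._value = value
        self._resolved = True

    def get(self):
        if not self._resolved:
            raise RuntimeError('%s is only available at run time' % self.name)
        return self._value


def build_read_step(path):
    # The step stores the deferred object and calls get() only when it runs.
    return lambda: 'reading ' + path.get()


path = DeferredValue('path')
step = build_read_step(path)                    # template construction: no value yet
path.resolve('gs://example-bucket/input.csv')   # template execution: value supplied
message = step()
```

A value taken from known_args, by contrast, would be a plain string fixed at construction time, so it could not change between executions of the same template.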

I wrote a generic version of the code; here is my solution in case anyone needs it:

from __future__ import absolute_import
import argparse

import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.io import WriteToText
from apache_beam.metrics.metric import MetricsFilter
from apache_beam.options.pipeline_options import PipelineOptions, GoogleCloudOptions, SetupOptions

class CustomPipelineOptions(PipelineOptions):
    """
    Runtime parameters supplied when the template is executed.
    --path is the file to read from.
    --output is an optional output file.
    """
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--path',
            type=str,
            help='Path of the file to read from')
        parser.add_value_provider_argument(
            '--output',
            type=str,
            help='Output file if needed')

def run(argv=None):
    parser = argparse.ArgumentParser()
    known_args, pipeline_args = parser.parse_known_args(argv)

    global cloud_options
    global custom_options

    pipeline_options = PipelineOptions(pipeline_args)
    cloud_options = pipeline_options.view_as(GoogleCloudOptions)
    custom_options = pipeline_options.view_as(CustomPipelineOptions)
    pipeline_options.view_as(SetupOptions).save_main_session = True

    p = beam.Pipeline(options=pipeline_options)

    # Read is a root transform, so it is applied to the pipeline itself.
    # ReadFromText accepts the runtime-provided path directly.
    init_data = (p
                 | 'Read Input path' >> ReadFromText(custom_options.path)
                 )

    result = p.run()
    # result.wait_until_finish

if __name__ == '__main__':
    run()
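
In `run`, the call to `parse_known_args` separates the arguments declared on the local parser from everything else, and the remainder is what gets passed to `PipelineOptions`. Below is a minimal sketch of that split using only the standard library. The `--local_flag` option and the `my-project` value are assumptions added for illustration; the parser in the answer declares no options of its own.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical option, declared only to show which side of the split it lands on.
parser.add_argument('--local_flag', default='off')

argv = ['--local_flag', 'on',
        '--runner', 'DataflowRunner',
        '--project', 'my-project']

# known_args holds what the parser declared; pipeline_args holds the rest, in order.
known_args, pipeline_args = parser.parse_known_args(argv)
```

Since the parser in the answer has no declared options, every command-line argument ends up in `pipeline_args` there.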

Then run the following command to generate the template on GCP:

python template.py --runner DataflowRunner --project $PROJECT --staging_location gs://$BUCKET/staging --temp_location gs://$BUCKET/temp --template_location gs://$BUCKET/templates/$TemplateName
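
The error in the question reports `type: JSON` and an expected `{`, which suggests the service tried to read the given file as JSON. As a rough local sanity check, one could test whether the text of a file parses as a JSON object before submitting it. This is a sketch under that assumption: `looks_like_json_object` is a hypothetical helper, and the `{"steps": []}` string is a placeholder, not the actual layout of a Dataflow template.

```python
import json


def looks_like_json_object(text):
    """Return True if text parses as JSON and the top-level value is an object."""
    try:
        return isinstance(json.loads(text), dict)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False


python_source = 'import apache_beam as beam\n'   # what a .py file contains
template_like = '{"steps": []}'                  # placeholder JSON object

source_ok = looks_like_json_object(python_source)
template_ok = looks_like_json_object(template_like)
```

With these inputs, the Python source fails the check and the placeholder object passes it.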

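Can you share your template file? It looks like the generated file has a syntax error, so we could take a look and see what might have gone wrong when the template was built.

I posted the source code on GitHub, since that is what I use.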