AttributeError when creating a custom template with Python in Google Cloud Dataflow


python templates google-cloud-platform google-cloud-dataflow


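I ran into a problem while creating a custom template for Cloud Dataflow. It is simple code that reads data from an input bucket and loads it into BigQuery. We want to load many tables, which is why we are trying to create a custom template. Once that works, the next step is to pass the dataset as a parameter as well.

Error message:

AttributeError: 'StaticValueProvider' object has no attribute 'datasetId'

Code: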

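import apache_beam as beam
from apache_beam.options.pipeline_options import (
    GoogleCloudOptions, PipelineOptions, SetupOptions)

# DataIngestion (which provides parse_method) is used below but is not shown in the question.


class ContactUploadOptions(PipelineOptions):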
    """
    Runtime Parameters given during template execution
    path and organization parameters are necessary for execution of pipeline
    campaign is optional for committing to bigquery
    """

    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--input',
            type=str,
            help='Path of the file to read from'
            )
        parser.add_value_provider_argument(
            '--output',
            type=str,
            help='Output BQ table for the pipeline')


def run(argv=None):
    """The main function which creates the pipeline and runs it."""

    global PROJECT
    from google.cloud import bigquery


    # Retrieve project Id and append to PROJECT from GoogleCloudOptions

    # Initialize runtime parameters as object
    contact_options = PipelineOptions().view_as(ContactUploadOptions)
    PROJECT = PipelineOptions().view_as(GoogleCloudOptions).project
    client = bigquery.Client(project=PROJECT)
    dataset = client.dataset('pharma')    
    data_ingestion = DataIngestion()
    pipeline_options = PipelineOptions()
    # Save main session state so pickled functions and classes
    # defined in __main__ can be unpickled
    pipeline_options.view_as(SetupOptions).save_main_session = True
    # Parse arguments from command line.
    #data_ingestion = DataIngestion()

    # Instantiate pipeline
    options = PipelineOptions()
    p = beam.Pipeline(options=options)
    (p
     | 'Read from a File' >> beam.io.ReadFromText(contact_options.input, skip_header_lines=0)
     | 'String To BigQuery Row' >> beam.Map(lambda s: data_ingestion.parse_method(s))
     | 'Write to BigQuery' >> beam.io.Write(
                beam.io.BigQuerySink(
                    contact_options.output,
                    schema='assetid:INTEGER,assetname:STRING,prodcd:INTEGER',
                    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                    write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
     )

    # Run the pipeline; with --template_location set, this stages the template.
    p.run()

My command is as follows:

python3 -m pharm_template --runner DataflowRunner  --project jupiter-120  --staging_location gs://input-cdc/temp/staging  --temp_location gs://input-cdc/temp/   --template_location gs://code-cdc/temp/templates/jupiter_pipeline_template

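What I tried:

I tried passing --input and --output. I also tried --experiment=use_beam_bq_sink, but it had no effect. I also tried passing the dataset ID:

datasetId = StaticValueProvider(str, 'pharma')

but no luck.

If anyone has created a template that loads into BQ, I can take a hint from it and fix this issue.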
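One way to picture the error: a value provider is a wrapper whose value is read with `.get()`. It is neither a string nor a table reference, so code that expects an object with a `datasetId` attribute fails when handed the wrapper. The sketch below is a toy, stdlib-only model and not Beam's implementation. The classes `ToyStaticProvider`, `ToyRuntimeProvider` and `ToyWriter` and the table name `assets` are hypothetical, and the sketch only illustrates why resolving the value should be deferred until the job actually runs.

```python
class ToyStaticProvider:
    """Toy stand-in for a provider whose value is fixed at template creation."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class ToyRuntimeProvider:
    """Toy stand-in for a provider whose value only arrives at execution time."""

    def __init__(self, name):
        self.name = name
        self._has_value = False
        self._value = None

    def set_runtime_value(self, value):
        self._value = value
        self._has_value = True

    def get(self):
        if not self._has_value:
            raise RuntimeError("%s is only available at run time" % self.name)
        return self._value


class ToyWriter:
    """Stores the provider and resolves it only when a row is written."""

    def __init__(self, table_provider):
        # Deliberately no .get() here: the value may not exist yet.
        self.table_provider = table_provider

    def write(self, row):
        return (self.table_provider.get(), row)


# The wrapper holds 'pharma' but has no datasetId attribute of its own.
static = ToyStaticProvider("pharma")
has_dataset_id = hasattr(static, "datasetId")

# "Template creation": the writer is built without any value being known.
output = ToyRuntimeProvider("output")
writer = ToyWriter(output)

# "Template execution": the value is supplied, then resolved inside write().
output.set_runtime_value("jupiter-120:pharma.assets")  # 'assets' is a made-up table
result = writer.write({"assetid": 1})
```

In Beam itself, `beam.io.WriteToBigQuery` is documented as accepting a `ValueProvider` for its `table` argument, which may be a more direct route than `BigQuerySink`; treat that as something to verify against the SDK version in use.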

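Hi, I am looking into your issue. Are you using this template [1] to instantiate the pipeline with ValueProvider? Link: [1]

Hi, I just wanted to know whether you have resolved this. Also, could you tell me whether the error still occurs with the DirectRunner?

Yes, I was able to work around this by using beam.io.BigQuerySink(("%s:%s.%s" % (project, dataset, table)), ... I am working on making it generic, because it still requires default values for the runtime value providers.

That's great! You don't have any other issues at the moment, right?

My current issue is this: when creating the template, it still requires the runtime provider values, and those values then become part of the template file. If that is the case, the template does not work as a template in the true sense. How can I get around that?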
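The workaround mentioned in the comments, formatting the table as a single "project:dataset.table" string, can be sketched as a small helper. This is a minimal sketch under stated assumptions: the function name `build_table_spec` and the table name `assets` are hypothetical, while the project `jupiter-120` and dataset `pharma` come from the question. It only works when all three parts are plain strings that are already known when the helper is called, which is exactly the limitation raised in the last comment.

```python
def build_table_spec(project, dataset, table):
    """Return a BigQuery table spec in 'project:dataset.table' form."""
    parts = (("project", project), ("dataset", dataset), ("table", table))
    for name, value in parts:
        # A value provider object would fail this check: it must be
        # resolved to a plain string before it can be formatted.
        if not isinstance(value, str) or not value:
            raise ValueError("%s must be a non-empty string" % name)
    return "%s:%s.%s" % (project, dataset, table)


# 'assets' is a made-up table name used only for illustration.
spec = build_table_spec("jupiter-120", "pharma", "assets")
```

Because the string is built when the pipeline is constructed, whatever values are passed in at that point end up fixed in the staged template.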