Using Spanner in an Apache Beam Dataflow pipeline
Tags: google-cloud-platform, google-cloud-dataflow, apache-beam, google-cloud-spanner

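I am trying to add a Spanner connection inside an Apache Beam ParDo (DoFn). I need to look up some rows as part of the ParDo. Dataflow creates a number of workers (usually up to 4), and I use the startBundle and finishBundle methods to open and close the Spanner connection over the worker's lifetime. Then, in the processElement method, I call singleUseReadOnlyTransaction on the DatabaseClient to perform a lookup for each item passed in.

I should add that this runs as a Dataflow job on GCP.

Some code to illustrate: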

private static CustomDoFn<String, TransactionImport> processRow = new CustomDoFn<String, TransactionImport>() {
    private static final long serialVersionUID = 1L;

    private Spanner spanner = null;
    private DatabaseClient dbClient = null;

    @StartBundle
    public void startBundle(StartBundleContext c) {
        TransactionFileOptions options = c.getPipelineOptions().as(TransactionFileOptions.class);

        com.google.cloud.spanner.SpannerOptions spannerOptions = com.google.cloud.spanner.SpannerOptions.newBuilder().build();
        spanner = spannerOptions.getService();
        String spannerProjectID = options.getSpannerProjectId();
        String spannerInstanceID = options.getSpannerInstanceId();
        String spannerDatabaseID = options.getSpannerDatabaseId();

        DatabaseId db = DatabaseId.of(spannerProjectID, spannerInstanceID, spannerDatabaseID);
        dbClient = spanner.getDatabaseClient(db);
    }

    @FinishBundle
    public void finishBundle(FinishBundleContext c) {
        spanner.close();
    }

    @ProcessElement
    public void processElement(DoFn<String, TransactionImport>.ProcessContext c) throws Exception {
        TransactionImport transactionImport = new TransactionImport();

        // text: the lookup key (its definition is not shown in this snippet)
        Statement statement = Statement.newBuilder("SELECT * FROM Table1 WHERE Name = @Name")
                .bind("Name").to(text)
                .build();

        ResultSet resultSet = dbClient.singleUseReadOnlyTransaction().executeQuery(statement);

        // set some value on transactionImport dependent on the retrieved value

        c.output(transactionImport);
    }
};


Does anyone have experience using Spanner this way inside a ParDo?

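I'm not a Spanner expert, but maybe I can help:

  • You should use @Setup/@Teardown to connect to and disconnect from Spanner. @{Start,Finish}Bundle are called many times over the lifetime of a worker. For more details, see here:

  • Does your processElement method call c.output(…)? Otherwise, Beam will think your pipeline is stuck.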
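To make the lifecycle point concrete, here is a minimal plain-Java simulation; it does not use Beam or the Spanner client. `FakeConnection`, `LookupFn`, and `run` are hypothetical stand-ins invented for illustration, and the plain methods only mirror where `@Setup`, `@StartBundle`, `@ProcessElement`, `@FinishBundle`, and `@Teardown` would sit on a real DoFn. It also shows one possible way to get configuration into the function without a context argument: pass it to the constructor and keep it as a field.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleSketch {

    /** Hypothetical stand-in for a Spanner client; it only counts opens and closes. */
    static class FakeConnection implements AutoCloseable {
        static int opened = 0;
        static int closed = 0;
        final String database;

        FakeConnection(String database) {
            this.database = database;
            opened++;
        }

        String lookup(String key) {
            return key + "@" + database;
        }

        @Override
        public void close() {
            closed++;
        }
    }

    /** Mirrors the shape of a DoFn; plain methods stand in for the Beam annotations. */
    static class LookupFn {
        private final String databaseId;   // configuration travels with the object
        private FakeConnection connection; // created in setup(), not in the constructor

        LookupFn(String databaseId) {
            this.databaseId = databaseId;
        }

        void setup() { connection = new FakeConnection(databaseId); } // would be @Setup
        void startBundle() { /* nothing to open per bundle */ }        // would be @StartBundle
        String processElement(String element) {                        // would be @ProcessElement
            return connection.lookup(element);
        }
        void finishBundle() { /* nothing to close per bundle */ }      // would be @FinishBundle
        void teardown() { connection.close(); }                        // would be @Teardown
    }

    /** Drives one instance through several bundles, roughly as a runner might. */
    static List<String> run(LookupFn fn, List<List<String>> bundles) {
        List<String> out = new ArrayList<>();
        fn.setup();
        for (List<String> bundle : bundles) {
            fn.startBundle();
            for (String element : bundle) {
                out.add(fn.processElement(element));
            }
            fn.finishBundle();
        }
        fn.teardown();
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> bundles = List.of(List.of("a", "b"), List.of("c"), List.of("d", "e"));
        List<String> out = run(new LookupFn("db1"), bundles);
        System.out.println(out);
        System.out.println("connections opened: " + FakeConnection.opened);
    }
}
```

With three bundles, the connection is still opened only once. In a real DoFn, which is serialized and shipped to workers, a non-serializable client field would also need to be marked `transient`.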


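  • Thanks Igor. The trouble with the Setup and Teardown methods is that they don't take a context parameter through which I could pass my Spanner parameters. My processElement does contain a c.output; I just didn't want to put too much code in the snippet. Thanks again.

  • @RichardB Please include c.output in the snippet to avoid confusion.

  • c.output has been added to the original post for clarification.

  • @RichardB That is one of the reasons SpannerIO requires the database/instance id as transform parameters. Alternatively, you could group the elements in the PCollection by instance id and database id and do the binding yourself, then establish a connection and run the transaction within a single @ProcessElement.

  • I ran into the same problem when processing large batches of data and trying to insert them into BigQuery. I'll try streaming inserts instead of batch inserts and let you know whether it works.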
    Processing stuck in step Process Rows for at least 05m00s without outputting or completing in state process
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
    at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
    at com.google.common.util.concurrent.Uninterruptibles.takeUninterruptibly(Uninterruptibles.java:233)
    at com.google.cloud.spanner.SessionPool$Waiter.take(SessionPool.java:411)
    at com.google.cloud.spanner.SessionPool$Waiter.access$3300(SessionPool.java:399)
    at com.google.cloud.spanner.SessionPool.getReadSession(SessionPool.java:754)
    at com.google.cloud.spanner.DatabaseClientImpl.singleUseReadOnlyTransaction(DatabaseClientImpl.java:52)
    at com.mycompany.pt.SpannerDataAccess.getBinDetails(SpannerDataAccess.java:197)
    at com.mycompany.pt.transactionFiles.TransactionFileDataflow$1.processLine(TransactionFileDataflow.java:411)
    at com.mycompany.pt.transactionFiles.TransactionFileDataflow$1.processElement(TransactionFileDataflow.java:336)
    at com.mycompany.pt.transactionFiles.TransactionFileDataflow$1$DoFnInvoker.invokeProcessElement(Unknown Source)
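
The trace above shows the thread parked inside SessionPool.getReadSession, that is, waiting for a session from the client's pool. The trace alone does not say why the pool is empty. One possibility worth ruling out is that sessions are acquired but never handed back, for example when the result of a read is never closed. The sketch below is a plain-Java simulation of that general failure mode, not the Spanner client: `Pool`, `Session`, and `runLookups` are hypothetical names, and a short timeout replaces the indefinite wait seen in the trace.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SessionPoolSketch {

    /** Hypothetical bounded pool; each permit stands in for one session. */
    static class Pool {
        private final Semaphore permits;

        Pool(int size) {
            permits = new Semaphore(size);
        }

        /** Returns a session, or null if none frees up within the timeout. */
        Session acquire(long timeoutMillis) throws InterruptedException {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS) ? new Session(this) : null;
        }

        void release() {
            permits.release();
        }
    }

    /** Closing a session hands its permit back to the pool. */
    static class Session implements AutoCloseable {
        private final Pool pool;

        Session(Pool pool) {
            this.pool = pool;
        }

        @Override
        public void close() {
            pool.release();
        }
    }

    /** Counts how many of the requested lookups managed to obtain a session. */
    static int runLookups(Pool pool, int requests, boolean closeSessions) throws InterruptedException {
        int served = 0;
        for (int i = 0; i < requests; i++) {
            Session session = pool.acquire(50);
            if (session == null) {
                continue; // a blocking pool would wait here indefinitely instead
            }
            served++;
            if (closeSessions) {
                session.close();
            }
        }
        return served;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("leaking: " + runLookups(new Pool(2), 5, false)); // prints "leaking: 2"
        System.out.println("closing: " + runLookups(new Pool(2), 5, true));  // prints "closing: 5"
    }
}
```

With a pool of two sessions, the leaking run serves only the first two lookups, while the run that closes each session serves all five.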