Google Cloud Dataflow: nested joins cause Error 400 Bad Request


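When I perform multiple nested joins, I receive an Error 400 Bad Request from the Dataflow service. The same pipeline works fine with the local pipeline runner. Here is some sample code showing what I am trying to achieve: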

    PipelineOptions pipelineOptions = PipelineOptionsFactory.fromArgs(args).withValidation().as(PipelineOptions.class);
    Pipeline pipeline = Pipeline.create(pipelineOptions);
    Datastore datastore = getDatastore(pipelineOptions, DATASET_ID);

    addData(datastore);

    PCollection<KV<Long, DatastoreV1.Entity>> users = pipeline.apply(DatastoreIO.readFrom(DATASET_ID, makeQueryForKind("Entity1")))
            .apply(ParDo.of(new MakeKVFromParent()));
    PCollection<KV<Long, DatastoreV1.Entity>> locations = pipeline.apply(DatastoreIO.readFrom(DATASET_ID, makeQueryForKind("Entity2")))
            .apply(ParDo.of(new MakeKVFromParent()));
    PCollection<KV<Long, DatastoreV1.Entity>> cars = pipeline.apply(DatastoreIO.readFrom(DATASET_ID, makeQueryForKind("Entity3")))
            .apply(ParDo.of(new MakeKVFromParent()));

    TupleTag<DatastoreV1.Entity> carsTag = new TupleTag<DatastoreV1.Entity>();
    PCollection<KV<Long, CoGbkResult>> groupedCars = KeyedPCollectionTuple.of(carsTag, cars)
            .apply(CoGroupByKey.<Long>create());

    TupleTag<CoGbkResult> groupedCarsTag = new TupleTag<CoGbkResult>();
    TupleTag<DatastoreV1.Entity> locationsTag = new TupleTag<DatastoreV1.Entity>();
    PCollection<KV<Long, CoGbkResult>> locationData = KeyedPCollectionTuple.of(groupedCarsTag, groupedCars)
            .and(locationsTag, locations)
            .apply(CoGroupByKey.<Long>create());

    //Comment this block of code to remove the bug.
    TupleTag<CoGbkResult> locationDataTag = new TupleTag<CoGbkResult>();
    TupleTag<DatastoreV1.Entity> usersTag = new TupleTag<DatastoreV1.Entity>();
    PCollection<KV<Long, CoGbkResult>> userData = KeyedPCollectionTuple.of(locationDataTag, locationData)
            .and(usersTag, users)
            .apply(CoGroupByKey.<Long>create());

    //Do some computation on userData
    pipeline.run();
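
Basically, I have many users. A user can have multiple locations and multiple cars. A car is always attached to a specific location and a specific user. I want to group the cars by user and location, so that I know each user's locations and the cars that user owns at each location. I compute this data for every user.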

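I have put together a working example that demonstrates my problem.

The error occurs when the job is submitted. The submitted job file is also available.

After removing the last join, the job runs fine. Does anyone know what I am doing wrong?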

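Thanks for the report, and thanks for the great sample code! We have tracked this down to an issue in the service and are working on a fix. Until it is resolved, you can avoid the problem by not reusing a CoGbkResult (the output of a CoGroupByKey) as the input to another CoGroupByKey.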

Specifically, in this case, doing the following reduces the number of CoGroupByKey operations, makes the data easier to get out, and avoids using a CoGbkResult as the input to a CoGroupByKey:

    TupleTag<DatastoreV1.Entity> carsTag = new TupleTag<DatastoreV1.Entity>();
    TupleTag<DatastoreV1.Entity> locationsTag = new TupleTag<DatastoreV1.Entity>();
    TupleTag<DatastoreV1.Entity> usersTag = new TupleTag<DatastoreV1.Entity>();

    // One CoGroupByKey joins all three inputs on the shared Long key.
    PCollection<KV<Long, CoGbkResult>> usersCars = KeyedPCollectionTuple
        .of(carsTag, cars)
        .and(locationsTag, locations)
        .and(usersTag, users)
        .apply(CoGroupByKey.<Long>create());

    // originalResult and newResult stand for the CoGbkResult value of one key.

    // Before (with nested CoGroupByKey): the cars sit two levels down.
    originalResult.getOnly(locationDataTag).getOnly(groupedCarsTag).getAll(carsTag);

    // After (with a single CoGroupByKey): every input is at the top level.
    newResult.getAll(carsTag);
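
The single join returns one flat list of cars per key, so grouping the cars by location has to happen downstream, for example inside the ParDo that consumes the result. Here is a minimal plain-Java sketch of that step. It does not use the Dataflow SDK. The Car class and its locationId field are hypothetical stand-ins and assume each car entity carries the id of its location, which the question implies but does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CarsByLocation {
    /** Hypothetical stand-in for a car entity: its id and the id of its location. */
    static final class Car {
        final String id;
        final long locationId;

        Car(String id, long locationId) {
            this.id = id;
            this.locationId = locationId;
        }
    }

    /** Buckets one user's cars by location id, keeping input order within each bucket. */
    static Map<Long, List<String>> groupByLocation(Iterable<Car> carsForOneUser) {
        Map<Long, List<String>> byLocation = new TreeMap<>();
        for (Car car : carsForOneUser) {
            byLocation.computeIfAbsent(car.locationId, k -> new ArrayList<>()).add(car.id);
        }
        return byLocation;
    }

    public static void main(String[] args) {
        // In the pipeline, this list would come from the cars of one key in the join result.
        List<Car> cars = Arrays.asList(
                new Car("car-a", 1L), new Car("car-b", 2L), new Car("car-c", 1L));
        System.out.println(groupByLocation(cars)); // prints {1=[car-a, car-c], 2=[car-b]}
    }
}
```

In the real pipeline the input would be built from the DatastoreV1.Entity values returned for carsTag, with the location id read from whichever property or key path your entities actually use.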