Google BigQuery Dataflow pipeline with "update" flag failing with error "missing steps Reshuffle/GroupByKey"
My current code reads from Pub/Sub, applies a filter, and then writes to a BigQuery table. The code is as follows:
public class BeaconAnomalyDetectionPipeline {

  public static void main(String[] args) {
    BeaconAnomalyDetectionOptions options =
        PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(BeaconAnomalyDetectionOptions.class);
    options.setJobName("test-name");
    run(options);
  }

  public static PipelineResult run(BeaconAnomalyDetectionOptions options) {
    Pipeline p = Pipeline.create(options);
    p.getCoderRegistry().registerCoderForType(
        TypeDescriptor.of(String.class), StringUtf8Coder.of());

    PCollection<IngestionRequest> ingestionRequests =
        p.apply("ReadPubSubSubscription",
                PubsubIO.readMessages()
                    .fromSubscription(options.getSubscriberId()))
         .apply(Window.into(
             FixedWindows.of(Duration.standardMinutes(options.getWindowSize()))))
         .apply("PubSubMessagesToTableRows", new PubsubProtoToIngestionRequest());

    PCollection<IngestionRequest> anomalies =
        ingestionRequests.apply(
            "filter by Signature",
            Filter.by(ingestionRequest -> ingestionRequest.getCompressionTypeValue() % 2 != 0));

    anomalies.apply(
        "WriteAnomalyToBQ",
        BQWriteTransform.newBuilder()
            .setTableSpec(options.getTableSpec())
            .setMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .build());

    return p.run();
  }
}
I have updated my code to specify a coder and to add Reshuffle and GroupByKey steps, but I still see the same problem. The updated code is below:
public class BeaconAnomalyDetectionPipeline {

  public static void main(String[] args) {
    BeaconAnomalyDetectionOptions options =
        PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(BeaconAnomalyDetectionOptions.class);
    options.setJobName("test-name");
    run(options);
  }

  public static PipelineResult run(BeaconAnomalyDetectionOptions options) {
    Pipeline p = Pipeline.create(options);
    p.getCoderRegistry().registerCoderForType(
        TypeDescriptor.of(String.class), StringUtf8Coder.of());

    PCollection<IngestionRequest> ingestionRequests =
        p.apply("ReadPubSubSubscription",
                PubsubIO.readMessages()
                    .fromSubscription(options.getSubscriberId()))
         .apply(Window.into(
             FixedWindows.of(Duration.standardMinutes(options.getWindowSize()))))
         .apply(WithKeys.of(input -> 1))
         .setCoder(KvCoder.of(VarIntCoder.of(), PubsubMessageWithAttributesCoder.of()))
         .apply(Reshuffle.of())
         .apply(GroupByKey.<Integer, PubsubMessage>create())
         .apply(ParDo.of(new Combiner()))
         .apply("filter by compression type new",
             MapElements.via(new SimpleFunction<KV<Integer, PubsubMessage>, PubsubMessage>() {
               @Override
               public PubsubMessage apply(KV<Integer, PubsubMessage> input) {
                 if (input.getKey() % 2 != 0) {
                   return input.getValue();
                 } else {
                   return null;
                 }
               }
             }))
         .apply("PubSubMessagesToTableRows", new PubsubProtoToIngestionRequest());

    ingestionRequests.apply(
        "WriteAnomalyToBQ",
        BQWriteTransform.newBuilder()
            .setTableSpec(options.getTableSpec())
            .setMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .build());

    return p.run();
  }
}
I used transformNameMapping in my update script:
--update \
--transformNameMapping='{\"Reshuffle/GroupBykey\":\"\",\"filter by compression type/MapElements\":\"\",\"\":\"filter by Signature\"}' \
--jobName=test-name "
The error I get when updating is:

The new job is missing steps GroupByKey, Reshuffle/GroupByKey. If these steps have been renamed or deleted, please specify them with the update command.

Can anyone help me find a working solution? Thanks a lot.
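For reference, the transformNameMapping flag takes a JSON object whose keys are step names in the currently running job and whose values are the corresponding names in the replacement job; a step that has been deleted maps to the empty string, and names must match what the service reports exactly, including case. A hypothetical sketch of the flags, if the intent is to drop the two GroupByKey steps (note the capital "K" in "Reshuffle/GroupByKey", which differs from the "GroupBykey" spelling in the command above):

--update \
--transformNameMapping='{"GroupByKey":"","Reshuffle/GroupByKey":""}' \
--jobName=test-name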