Google BigQuery: how to update a Google Cloud Dataflow job running on App Engine without clearing the BigQuery table

Tags: google-bigquery, google-cloud-dataflow

I have a Google Cloud Dataflow process running on App Engine. It listens for messages sent via Pub/Sub and streams them into BigQuery.

I updated my code and am trying to rerun the application, but I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: BigQuery table is not empty

Is there a way to update the Dataflow job without deleting the table? Since my code may change often, I don't want to delete the data in the table.

Here is my code:
public class MyPipline {
    // Note: the original code logged against BotPipline.class, which does not match this class name.
    private static final Logger LOG = LoggerFactory.getLogger(MyPipline.class);
    private static String name;

    public static void main(String[] args) {
        List<TableFieldSchema> fields = new ArrayList<>();
        fields.add(new TableFieldSchema().setName("a").setType("string"));
        fields.add(new TableFieldSchema().setName("b").setType("string"));
        fields.add(new TableFieldSchema().setName("c").setType("string"));
        TableSchema tableSchema = new TableSchema().setFields(fields);

        DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
        options.setRunner(BlockingDataflowPipelineRunner.class);
        options.setProject("my-data-analysis");
        options.setStagingLocation("gs://my-bucket/dataflow-jars");
        options.setStreaming(true);
        Pipeline pipeline = Pipeline.create(options);

        PCollection<String> input = pipeline
                .apply(PubsubIO.Read.subscription(
                        "projects/my-data-analysis/subscriptions/myDataflowSub"));

        input.apply(ParDo.of(new DoFn<String, Void>() {
            @Override
            public void processElement(DoFn<String, Void>.ProcessContext c) throws Exception {
                LOG.info("json" + c.element());
            }
        }));

        String fileName = UUID.randomUUID().toString().replaceAll("-", "");

        input.apply(ParDo.of(new DoFn<String, String>() {
            @Override
            public void processElement(DoFn<String, String>.ProcessContext c) throws Exception {
                JSONObject firstJSONObject = new JSONObject(c.element());
                firstJSONObject.put("a", firstJSONObject.get("a").toString() + "1000");
                c.output(firstJSONObject.toString());
            }
        }).named("update json")).apply(ParDo.of(new DoFn<String, TableRow>() {
            @Override
            public void processElement(DoFn<String, TableRow>.ProcessContext c) throws Exception {
                JSONObject json = new JSONObject(c.element());
                TableRow row = new TableRow().set("a", json.get("a")).set("b", json.get("b")).set("c", json.get("c"));
                c.output(row);
            }
        }).named("convert json to table row"))
                .apply(BigQueryIO.Write.to("my-data-analysis:mydataset.mytable").withSchema(tableSchema));

        pipeline.run();
    }
}
You need to specify a write disposition on BigQueryIO.Write via withWriteDisposition - see the documentation for BigQueryIO.Write.WriteDisposition. Depending on your requirements, you want WRITE_TRUNCATE or WRITE_APPEND.
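As a sketch, the final write step in the pipeline above could be adjusted like this (a streaming pipeline appends rows, so WRITE_APPEND is the usual choice here; the table name and schema are the ones from the question):

```java
// Sketch of the adjusted write step, assuming the Dataflow 1.x SDK used in the question.
// WRITE_APPEND streams new rows into the existing table instead of requiring it to be empty;
// CREATE_IF_NEEDED creates the table on first run if it does not exist yet.
.apply(BigQueryIO.Write
        .to("my-data-analysis:mydataset.mytable")
        .withSchema(tableSchema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
```

With WRITE_APPEND the "BigQuery table is not empty" check no longer applies, so redeploying the updated job keeps the data already in the table.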