BigQuery performance and running jobs in a batch request

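We are developing a Java web application that uses Google BigQuery, and we are seeing some strange behavior.

We use query jobs to retrieve data, which we then visualize in charts.

We create several jobs and add them to a list of running jobs: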

// query2 ... query5 are assumed to be built in the elided code, like query1.
Job query1Job = startAsyncQuery(query1, "q1"+uuid);
jobMapping.put(query1Job.getJobReference().getJobId(), "q1");
runningJobs.add(query1Job);
...
Job query2Job = startAsyncQuery(query2, "q2"+uuid);
jobMapping.put(query2Job.getJobReference().getJobId(), "q2");
runningJobs.add(query2Job);
...
Job query3Job = startAsyncQuery(query3, "q3"+uuid);
jobMapping.put(query3Job.getJobReference().getJobId(), "q3");
runningJobs.add(query3Job);
...
Job query4Job = startAsyncQuery(query4, "q4"+uuid);
jobMapping.put(query4Job.getJobReference().getJobId(), "q4");
runningJobs.add(query4Job);
...
Job query5Job = startAsyncQuery(query5, "q5"+uuid);
jobMapping.put(query5Job.getJobReference().getJobId(), "q5");
runningJobs.add(query5Job);


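public Job startAsyncQuery(String query, String jobId) throws IOException {
    JobConfigurationQuery queryConfig = new JobConfigurationQuery().setQuery(query).setUseQueryCache(true);
    JobConfiguration config = new JobConfiguration().setQuery(queryConfig);
    // A caller-chosen job ID goes in the JobReference; Job.id is an output-only field.
    JobReference jobRef = new JobReference().setProjectId(this.projectId).setJobId(jobId);
    Job job = new Job().setJobReference(jobRef).setConfiguration(config);
    // jobs().insert() only queues the job; it returns before the query has finished.
    return this.bigquery.jobs().insert(this.projectId, job).execute();
}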

We poll the list of running jobs to retrieve the data:

        boolean isError = false;
        while (!runningJobs.isEmpty() && !isError) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
            }
            List<Job> tempJobs = new ArrayList<Job>();
            for (Job job : runningJobs) {
                JobReference jref = job.getJobReference();
                String jid = jobMapping.get(jref.getJobId());
                int jobState = pollJob(bigQueryManager, jref, jid);
                if (jobState == -1) {
                    System.out.println("Aborting because of error for job " + jid);
                    isError = true;
                } else if (jobState == 1) {
                    List<TableRow> rows = bigQueryManager.getQueryResults(jref);
                    if (jid.startsWith("q1")) {
                        parseQ1QueryResult(filter, metrics, metricsRTLDiv, clientList, objectList, rows);
                    } else if (jid.startsWith("q2")) {
                        parseQ2QueryResult(filter, metrics, rows);
                    } else if (jid.startsWith("q3")) {
                        parseQ3QueryResult(filter, metrics, metricsRTLDiv, rows);
                    } else if (jid.startsWith("q4")) {
                        parseQ4QueryResult(metricsRTLDiv, rows);
                    } else if (jid.startsWith("q5")) {
                        parseQ5QueryResult(metrics, rows);
                    } else {
                        System.out.println("Job finished for unknown id: " + jid);
                    }
                } else {
                    tempJobs.add(job);
                }               
            }
            runningJobs = tempJobs;
        }
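
The polling loop above can also be written with a handler table instead of the if/else chain. What follows is only a minimal sketch under stated assumptions: it uses no BigQuery classes, `JobPoller`, `pollAll` and `doneAfter` are hypothetical names, and each `IntSupplier` stands in for a call to `pollJob` with the same state codes (-1 error, 0 running, 1 done). Unlike the loop above, it polls first and sleeps only while jobs are still running.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntSupplier;

/** Sketch: poll several jobs and dispatch each finished one through a handler table. */
class JobPoller {
    // State codes mirror pollJob in the question: -1 = error, 0 = running, 1 = done.
    static final int ERROR = -1;
    static final int DONE = 1;

    /**
     * Polls all jobs until every one is done, one reports an error, or the thread is interrupted.
     *
     * @param jobs     logical name ("q1", "q2", ...) mapped to a state probe (stand-in for pollJob)
     * @param handlers logical name mapped to the action to run once that job is done
     * @return true if every job finished, false on error or interruption
     */
    static boolean pollAll(Map<String, IntSupplier> jobs,
                           Map<String, Runnable> handlers,
                           long sleepMillis) {
        Map<String, IntSupplier> running = new LinkedHashMap<>(jobs);
        while (!running.isEmpty()) {
            Iterator<Map.Entry<String, IntSupplier>> it = running.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, IntSupplier> entry = it.next();
                int state = entry.getValue().getAsInt();
                if (state == ERROR) {
                    return false; // abort on the first failed job, as the original loop does
                }
                if (state == DONE) {
                    Runnable handler = handlers.get(entry.getKey());
                    if (handler != null) {
                        handler.run(); // e.g. fetch the rows and call parseQ1QueryResult(...)
                    }
                    it.remove(); // finished jobs are not polled again
                }
            }
            if (!running.isEmpty()) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                    return false;
                }
            }
        }
        return true;
    }

    /** Fake job for the demo: reports "running" until it has been polled `polls` times. */
    static IntSupplier doneAfter(int polls) {
        int[] remaining = {polls};
        return () -> --remaining[0] <= 0 ? DONE : 0;
    }

    public static void main(String[] args) {
        List<String> finished = new ArrayList<>();
        Map<String, IntSupplier> jobs = new LinkedHashMap<>();
        jobs.put("q1", doneAfter(3));
        jobs.put("q2", doneAfter(1));

        Map<String, Runnable> handlers = new LinkedHashMap<>();
        handlers.put("q1", () -> finished.add("q1"));
        handlers.put("q2", () -> finished.add("q2"));

        boolean ok = pollAll(jobs, handlers, 10);
        System.out.println(ok + " " + finished); // prints "true [q2, q1]"
    }
}
```

In the real application each handler would fetch the rows for its job and call the matching `parseQ…QueryResult` method. Note that this only tidies up the client-side loop; it does not change how long BigQuery takes to run each query.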
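
Is it possible to put all of these queries into a single batch request, so that only one HTTP request is sent instead of one per query?

When several queries have to be executed, does anyone know a fast way to retrieve the data from BigQuery tables? Is there any way to speed up job execution?

Thanks.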

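The Google API Client Library for Java supports batch requests.

Although this example is written for the Calendar service, it can be adapted to BigQuery: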

JsonBatchCallback<Calendar> callback = new JsonBatchCallback<Calendar>() {

  public void onSuccess(Calendar calendar, HttpHeaders responseHeaders) {
    printCalendar(calendar);
    addedCalendarsUsingBatch.add(calendar);
  }

  public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) {
    System.out.println("Error Message: " + e.getMessage());
  }
};

...

Calendar client = Calendar.builder(transport, jsonFactory, credential)
  .setApplicationName("BatchExample/1.0").build();
BatchRequest batch = client.batch();

Calendar entry1 = new Calendar().setSummary("Calendar for Testing 1");
client.calendars().insert(entry1).queue(batch, callback);

Calendar entry2 = new Calendar().setSummary("Calendar for Testing 2");
client.calendars().insert(entry2).queue(batch, callback);

batch.execute();

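I have already seen this Calendar example, but if possible I would like a working example for BigQuery. Thx.

This should work for BigQuery as well. It is a feature of the shared Google API infrastructure that both Calendar and BigQuery use.

Hi @JordanTigani, could you give me a working example for BigQuery? Thanks.

Take a look at the example here:

FYI, I implemented a solution using request batching, and its performance is the same as the current solution ;-).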