Google App Engine: loading Google Datastore backups into Google BigQuery

Our requirement is to programmatically back up the Google Datastore and load those backups into Google BigQuery for further analysis. We have successfully automated the backups using the following approach:

        Queue queue = QueueFactory.getQueue("datastoreBackupQueue");

        /*
         * Create a task which is equivalent to the backup URL mentioned in
         * above cron.xml, using new queue which has Datastore admin enabled
         */
        TaskOptions taskOptions = TaskOptions.Builder.withUrl("/_ah/datastore_admin/backup.create")
                .method(TaskOptions.Method.GET).param("name", "").param("filesystem", "gs")
                .param("gs_bucket_name",
                        "db-backup" + "/" + TimeUtils.parseDateToString(new Date(), "yyyy/MMM/dd"))
                .param("queue", queue.getQueueName());

        /*
         * Get list of dynamic entity kind names from the datastore based on
         * the kinds present in the datastore at the start of backup
         */
        List<String> entityNames = getEntityNamesForBackup();
        for (String entityName : entityNames) {
            taskOptions.param("kind", entityName);
        }

        /* Add this task to above queue */
        queue.add(taskOptions);
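
The cron.xml referenced in the comment above is what triggers this backup handler on a schedule. A minimal sketch of such an entry, assuming a hypothetical handler path and schedule:

    <?xml version="1.0" encoding="UTF-8"?>
    <cronentries>
        <cron>
            <!-- Hypothetical path of the servlet that runs the code above -->
            <url>/cron/startBackup</url>
            <description>Daily Datastore backup to Cloud Storage</description>
            <schedule>every day 02:00</schedule>
        </cron>
    </cronentries>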
I can then import these backups into Google BigQuery manually, but how can we automate this process as well?

I have also looked through most of the documentation, and nothing there helped.

Building on the documentation you mentioned in your question, there are programmatic examples of importing from GCS using the command line, Node.js, or Python.

You can also automate importing data located in Cloud Storage into BigQuery by running the following command in a script:

$ gcloud alpha bigquery import SOURCE DESTINATION_TABLE

More information on this command is available in its reference documentation.
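
Alternatively, the stable bq CLI can load a Datastore Admin backup file from Cloud Storage directly. A sketch with hypothetical dataset and object names, following the bucket layout used in the backup code above (actual backup_info file names carry a generated prefix, which is why the Java code below matches them with contains()):

    $ bq load --source_format=DATASTORE_BACKUP \
        my_dataset.MyKind \
        gs://db-backup/2016/Jan/01/MyKind.backup_info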

I have solved this myself; below is the solution using Java. The following code picks the backup files up from Google Cloud Storage and loads them into Google BigQuery.

        AppIdentityCredential bqCredential = new AppIdentityCredential(
                Collections.singleton(BigqueryScopes.BIGQUERY));

        AppIdentityCredential dsCredential = new AppIdentityCredential(
                Collections.singleton(StorageScopes.CLOUD_PLATFORM));

        Storage storage = new Storage(HTTP_TRANSPORT, JSON_FACTORY, dsCredential);
        Objects list = storage.objects().list(bucket).setPrefix(prefix).setFields("items/name").execute();

        if (list == null) {
            Log.severe(BackupDBController.class, "BackupToBigQueryController",
                    "List from Google Cloud Storage was null", null);
        } else if (list.isEmpty()) {
            Log.severe(BackupDBController.class, "BackupToBigQueryController",
                    "List from Google Cloud Storage was empty", null);
        } else {

            for (String kind : getEntityNamesForBackup()) {
                Job job = new Job();
                JobConfiguration config = new JobConfiguration();
                JobConfigurationLoad loadConfig = new JobConfigurationLoad();

                String url = "";
                for (StorageObject obj : list.getItems()) {
                    String currentUrl = obj.getName();
                    if (currentUrl.contains(kind + ".backup_info")) {
                        url = currentUrl;
                        break;
                    }
                }

                if (StringUtils.isStringEmpty(url)) {
                    continue;
                } else {
                    url = "gs://"+bucket+"/" + url;
                }

                List<String> gsUrls = new ArrayList<>();
                gsUrls.add(url);

                loadConfig.setSourceUris(gsUrls);
                loadConfig.setSourceFormat("DATASTORE_BACKUP");
                loadConfig.setAllowQuotedNewlines(true);

                TableReference table = new TableReference();
                table.setProjectId(projectId);
                table.setDatasetId(datasetId);
                table.setTableId(kind);
                loadConfig.setDestinationTable(table);

                config.setLoad(loadConfig);
                job.setConfiguration(config);

                Bigquery bigquery = new Bigquery.Builder(HTTP_TRANSPORT, JSON_FACTORY, bqCredential)
                        .setApplicationName("BigQuery-Service-Accounts/0.1").setHttpRequestInitializer(bqCredential)
                        .build();
                Insert insert = bigquery.jobs().insert(projectId, job);
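                /*
                 * Note: insert() only submits the load job; BigQuery runs it
                 * asynchronously, so execute() returning does not mean the
                 * load has finished (see the polling sketch after this block).
                 */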

                JobReference jr = insert.execute().getJobReference();
                Log.info(BackupDBController.class, "BackupToBigQueryController",
                        "Load job " + jr.getJobId() + " submitted to BigQuery for kind " + kind, null);
            }
        }
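
Since jobs().insert() only submits the load job, confirming that a load actually finished requires polling the submitted job until it reaches the DONE state. A minimal sketch of such a helper (the 5-second polling interval is an arbitrary choice):

    /*
     * Poll a submitted BigQuery job until it reaches the DONE state, then
     * check errorResult to distinguish success from failure.
     */
    private static boolean waitForJob(Bigquery bigquery, String projectId, JobReference jr)
            throws IOException, InterruptedException {
        while (true) {
            Job polled = bigquery.jobs().get(projectId, jr.getJobId()).execute();
            if ("DONE".equals(polled.getStatus().getState())) {
                // A DONE job may still have failed; errorResult is set on failure
                return polled.getStatus().getErrorResult() == null;
            }
            Thread.sleep(5000); // arbitrary polling interval
        }
    }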

If anyone has a better approach, please let me know.

Since last week there is a proper way to automate this. The most important part is:

    gcloud beta datastore export

I wrote a short script around this command.
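
A minimal sketch of such a wrapper, with hypothetical bucket, dataset, and kind names (verify the exact object paths the export writes in your bucket):

    #!/bin/bash
    # Hypothetical names; adjust to your project
    KIND="MyKind"
    DATASET="my_dataset"
    BUCKET="gs://my-backup-bucket/$(date +%Y-%m-%d)"

    # Export one kind from Cloud Datastore to Cloud Storage
    gcloud beta datastore export --kinds="$KIND" "$BUCKET"

    # Load the export metadata file written by the export into BigQuery
    bq load --source_format=DATASTORE_BACKUP \
        "$DATASET.$KIND" \
        "$BUCKET/all_namespaces/kind_$KIND/all_namespaces_kind_$KIND.export_metadata"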

You can adjust it to fit your situation.