Export Datastore to a GCS bucket in DATASTORE_BACKUP format using Apache Spark

Tags: apache-spark, export, google-cloud-datastore

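I want to export my Datastore to a GCS bucket every day in DATASTORE_BACKUP format. Currently I run the export through the GCP Datastore export service with a curl command, as follows: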

```
curl -X POST \
  -H "Authorization: Bearer $access_token" \
  -H "Content-Type: application/json" \
  https://datastore.googleapis.com/v1/projects/viu-data-warehouse-prod:export \
  -d '{
  "labels": {
    "exportVersion": "'"$BUILD_ID"'"
  },
  "outputUrlPrefix": "'"$output_url"'",
  "entityFilter": {
    "namespaceIds": ["customer_one_view"],
    "kinds": ["user_view"]
  }
}'
```
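For anyone who wants to schedule this from a script instead of curl, here is a minimal Python sketch of the same export request, using only the standard library. It is not a speed-up: it sends the same call to the managed export service. The function names `build_export_request` and `start_export` are hypothetical, and the token, bucket and build ID in the usage section are placeholders, not real values.

```python
import json
import urllib.request

# Same endpoint as the curl command above.
EXPORT_URL = "https://datastore.googleapis.com/v1/projects/{project_id}:export"


def build_export_request(project_id, access_token, output_url, build_id,
                         namespace_ids, kinds):
    """Build (but do not send) the HTTP request for a managed Datastore export."""
    body = {
        "labels": {"exportVersion": build_id},
        "outputUrlPrefix": output_url,
        "entityFilter": {
            "namespaceIds": list(namespace_ids),
            "kinds": list(kinds),
        },
    }
    return urllib.request.Request(
        EXPORT_URL.format(project_id=project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_export(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values: the token, bucket and build ID are not real.
    req = build_export_request(
        project_id="viu-data-warehouse-prod",
        access_token="ACCESS_TOKEN",
        output_url="gs://example-bucket/exports",
        build_id="build-1",
        namespace_ids=["customer_one_view"],
        kinds=["user_view"],
    )
    # Inspect the request without sending it.
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Building the body with `json.dumps` avoids the shell quoting and the trailing-comma problem that are easy to hit when the JSON is written inline. Call `start_export(req)` only with a valid access token.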


I want to do this with Apache Spark to make it faster. My problem is that the export takes 5 to 6 hours to finish, and the time keeps increasing as the data grows.

I need suggestions for optimizing this process through parallel processing. I would like to do it with Apache Spark because it is very fast. Please suggest how I can do this.

If you are not tied to Spark or to a specific export format, you can start from the Datastore to GCS Apache Beam (Dataflow) template and then adapt it to your needs.

Thanks for the suggestion, @Jim Morrison.