Apache Spark: how to access BigQuery from Spark running outside GCP
I'm trying to connect a Spark job running in a private data center to BigQuery. I have created a service account, obtained its private JSON key, and been granted read access to the dataset I want to query. However, when I try to integrate with Spark, I get the error:
User does not have bigquery.tables.create permission for dataset xxx:yyy.
Do we really need table-creation permission just to read data from a BigQuery table?
Below is the response printed to the console:
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "Access Denied: Dataset xxx:yyy: User does not have bigquery.tables.create permission for dataset xxx:yyy.",
"reason" : "accessDenied"
} ],
"message" : "Access Denied: Dataset xxx:yyy: User does not have bigquery.tables.create permission for dataset xxx:yyy.",
"status" : "PERMISSION_DENIED"
}
Below is the Spark code I'm using to access BigQuery:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ConnectionTester extends App {
  val session = SparkSession.builder()
    .appName("big-query-connector")
    .config(getConf)
    .getOrCreate()

  session.read
    .format("bigquery")
    .option("viewsEnabled", true)
    .load("xxx.yyy.table1")
    .select("col1")
    .show(2)

  private def getConf: SparkConf = {
    val sparkConf = new SparkConf
    sparkConf.setAppName("big-query-connector")
    sparkConf.setMaster("local[*]")
    sparkConf.set("parentProject", "my-gcp-project")
    sparkConf.set("credentialsFile", "<path to my credentialsFile>")
    sparkConf
  }
}
Check the code below.
Credentials:
val credentials = """
  |{
  |  "type": "service_account",
  |  "project_id": "your project id",
  |  "private_key_id": "your private_key_id",
  |  "private_key": "-----BEGIN PRIVATE KEY-----\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n-----END PRIVATE KEY-----\n",
  |  "client_email": "xxxxx@company.com",
  |  "client_id": "111111111111111111111111111",
  |  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  |  "token_uri": "https://oauth2.googleapis.com/token",
  |  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  |  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/xxxxx40vvvvvv.iam.gserviceaccount.com"
  |}
  |""".stripMargin
Base64-encode it and pass it to the Spark conf:
def base64(data: String) = {
  import java.nio.charset.StandardCharsets
  import java.util.Base64
  Base64.getEncoder.encodeToString(data.getBytes(StandardCharsets.UTF_8))
}
For reading a regular table, the
bigquery.tables.create
permission is not needed. However, your code sample suggests the object you are reading is actually a BigQuery view. BigQuery views are logical references: they are not materialized server-side, and for Spark to read one, the connector first has to materialize it into a temporary table. Creating that temporary table is what requires the bigquery.tables.create
permission.
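If you cannot get create rights on the source dataset, the spark-bigquery-connector lets you point the materialization at a different dataset where the service account does have them, via the materializationProject / materializationDataset options. A sketch, where "scratch_dataset" is a placeholder dataset you create yourself and grant the service account write access on:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: read a BigQuery view by materializing it into a scratch dataset
// where the service account holds bigquery.tables.create.
val spark = SparkSession.builder()
  .appName("big-query-view-reader")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .format("bigquery")
  .option("viewsEnabled", "true")                       // allow reading views
  .option("materializationProject", "my-gcp-project")   // where temp tables go
  .option("materializationDataset", "scratch_dataset")  // SA can create tables here
  .load("xxx.yyy.table1")

df.select("col1").show(2)
```

The temporary tables the connector creates there expire on their own after a configurable TTL, so the scratch dataset stays small.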
spark.conf.set("credentials", base64(credentials))

spark.read
  .format("bigquery")
  .option("parentProject", "parentProject") // your parent GCP project id
  .option("table", "dataset.table")
  .load()
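Alternatively, if granting the missing permission is an option, a project-level grant can be done from the CLI. This is a coarse-grained sketch with placeholder project and service-account names; a dataset-level grant (e.g. via the BigQuery console's dataset sharing settings) is the more fine-grained choice:

```shell
# roles/bigquery.dataEditor includes bigquery.tables.create.
# Replace the project id and service-account email with your own.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:xxxxx@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```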