Apache Spark: how to access BigQuery from Spark running outside GCP
I'm trying to connect a Spark job running in a private data center to BigQuery. I have created a service account, obtained its private JSON key, and been granted read access to the dataset I want to query. However, when I try to integrate with Spark, I get the error:
User does not have bigquery.tables.create permission for dataset xxx:yyy.
Do we really need table-creation permission just to read data from a BigQuery table?
Below is the response printed to the console:
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "Access Denied: Dataset xxx:yyy: User does not have bigquery.tables.create permission for dataset xxx:yyy.",
"reason" : "accessDenied"
} ],
"message" : "Access Denied: Dataset xxx:yyy: User does not have bigquery.tables.create permission for dataset xxx:yyy.",
"status" : "PERMISSION_DENIED"
}
Below is the Spark code I'm using to access BigQuery:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ConnectionTester extends App {
  val session = SparkSession.builder()
    .appName("big-query-connector")
    .config(getConf)
    .getOrCreate()

  session.read
    .format("bigquery")
    .option("viewsEnabled", true)
    .load("xxx.yyy.table1")
    .select("col1")
    .show(2)

  private def getConf: SparkConf = {
    val sparkConf = new SparkConf
    sparkConf.setAppName("big-query-connector")
    sparkConf.setMaster("local[*]")
    sparkConf.set("parentProject", "my-gcp-project")
    sparkConf.set("credentialsFile", "<path to my credentialsFile>")
    sparkConf
  }
}
Check the code below.
Credentials:
val credentials = """
  |{
  |  "type": "service_account",
  |  "project_id": "your project id",
  |  "private_key_id": "your private_key_id",
  |  "private_key": "-----BEGIN PRIVATE KEY-----\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n-----END PRIVATE KEY-----\n",
  |  "client_email": "xxxxx@company.com",
  |  "client_id": "111111111111111111111111111",
  |  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  |  "token_uri": "https://oauth2.googleapis.com/token",
  |  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  |  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/xxxxx40vvvvvv.iam.gserviceaccount.com"
  |}
  |""".stripMargin
Base64-encode it and pass it to the Spark conf:
def base64(data: String) = {
  import java.nio.charset.StandardCharsets
  import java.util.Base64
  Base64.getEncoder.encodeToString(data.getBytes(StandardCharsets.UTF_8))
}
For reading a regular table, the
bigquery.tables.create
permission is not needed. However, your code sample suggests the object you are reading is actually a BigQuery view. BigQuery views are logical references: they are not materialized server-side, and for Spark to read one, the connector first has to materialize it into a temporary table. Creating that temporary table is what requires the bigquery.tables.create
permission.
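If you cannot get create rights on the source dataset, the spark-bigquery-connector lets you point the materialization at a different dataset where the service account does have them, via the materializationProject / materializationDataset options. A sketch, where "scratch_dataset" is a placeholder dataset you create yourself and grant the service account write access on:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: read a BigQuery view by materializing it into a scratch dataset
// where the service account holds bigquery.tables.create.
val spark = SparkSession.builder()
  .appName("big-query-view-reader")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .format("bigquery")
  .option("viewsEnabled", "true")                       // allow reading views
  .option("materializationProject", "my-gcp-project")   // where temp tables go
  .option("materializationDataset", "scratch_dataset")  // SA can create tables here
  .load("xxx.yyy.table1")

df.select("col1").show(2)
```

The temporary tables the connector creates there expire on their own after a configurable TTL, so the scratch dataset stays small.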
spark.conf.set("credentials", base64(credentials))

spark.read
  .format("bigquery")
  .option("parentProject", "parentProject") // your parent GCP project id
  .option("table", "dataset.table")
  .load()
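Alternatively, if granting the missing permission is an option, a project-level grant can be done from the CLI. This is a coarse-grained sketch with placeholder project and service-account names; a dataset-level grant (e.g. via the BigQuery console's dataset sharing settings) is the more fine-grained choice:

```shell
# roles/bigquery.dataEditor includes bigquery.tables.create.
# Replace the project id and service-account email with your own.
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:xxxxx@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```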