Apache Spark: How to fix "NullPointerException: projectId must not be null" in a Spark application on GKE?

Tags: apache-spark, kubernetes, google-cloud-platform, google-cloud-storage, google-kubernetes-engine

I'm deploying a Spark Structured Streaming application to Google Kubernetes Engine, and I'm hitting the following exception when accessing a bucket through the gs:// URI scheme:

Exception in thread "main" java.lang.NullPointerException: projectId must not be null
    at com.google.cloud.hadoop.repackaged.gcs.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:897)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.createBucket(GoogleCloudStorageImpl.java:437)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorage.createBucket(GoogleCloudStorage.java:88)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.mkdirsInternal(GoogleCloudStorageFileSystem.java:456)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.mkdirs(GoogleCloudStorageFileSystem.java:444)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.mkdirs(GoogleHadoopFileSystemBase.java:911)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2275)
    at org.apache.spark.sql.execution.streaming.StreamExecution.<init>(StreamExecution.scala:137)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.<init>(MicroBatchExecution.scala:50)
    at org.apache.spark.sql.streaming.StreamingQueryManager.createQuery(StreamingQueryManager.scala:317)
    at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:359)
    at org.apache.spark.sql.streaming.DataStreamWriter.startQuery(DataStreamWriter.scala:466)
    at org.apache.spark.sql.streaming.DataStreamWriter.startInternal(DataStreamWriter.scala:456)
    at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:301)
    at meetup.SparkStreamsApp$.delayedEndpoint$meetup$SparkStreamsApp$1(SparkStreamsApp.scala:25)
    at meetup.SparkStreamsApp$delayedInit$body.apply(SparkStreamsApp.scala:7)

How can I fix this in a proper Kubernetes/GKE way?

The approach recommended in the GKE documentation is:

kubectl create secret generic spark-streaming-sa --from-file=/path/spark-streaming-serviceaccount-key.json
When submitting the job, add the following configuration:

--conf spark.kubernetes.driver.secrets.spark-streaming-sa=<mount path>
--conf spark.kubernetes.executor.secrets.spark-streaming-sa=<mount path>
--conf spark.kubernetes.driverEnv.GOOGLE_APPLICATION_CREDENTIALS=/spark-streaming-sa.json
--conf spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS=/spark-streaming-sa.json
--conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/spark-streaming-sa.json
You can refer to the example available on GitHub.
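Put together, the GKE-recommended settings look like the sketch below. The /etc/secrets mount path and the key-file name are assumptions for illustration, not values from the original answer; the secret created above is mounted into both the driver and executor pods, and the credential properties all point at the key file inside that mount:

```shell
# Sketch only: the mount path (/etc/secrets) and key-file name are assumed.
spark-submit \
  --conf spark.kubernetes.driver.secrets.spark-streaming-sa=/etc/secrets \
  --conf spark.kubernetes.executor.secrets.spark-streaming-sa=/etc/secrets \
  --conf spark.kubernetes.driverEnv.GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/spark-streaming-serviceaccount-key.json \
  --conf spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/spark-streaming-serviceaccount-key.json \
  --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/etc/secrets/spark-streaming-serviceaccount-key.json \
  <other options>
```

The same mount path must be used in all three credential properties, since the env vars and the keyfile property are resolved inside the pods where the secret is mounted.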

The Secret Management section of the Spark docs also covers this:

Kubernetes secrets can be used to provide credentials for a Spark application to access secured services. To mount a user-specified secret into the driver container, users can use the configuration property of the form spark.kubernetes.driver.secrets.[SecretName]=<mount path>. Similarly, the configuration property of the form spark.kubernetes.executor.secrets.[SecretName]=<mount path> can be used to mount a user-specified secret into the executor containers.


根据您的配置,我建议您添加以下属性
fs.gs.project.id
,如图所示。因为它显示为
所需。谷歌云项目ID,可访问配置的GCS存储桶
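The value for fs.gs.project.id is the project_id field of the service account key file itself. A quick way to check it (the file below is a fake key containing only the fields relevant here, purely for illustration):

```shell
# Fake service-account key with only the relevant fields (illustration only).
cat > /tmp/spark-streaming-sa.json <<'EOF'
{
  "type": "service_account",
  "project_id": "my-gcp-project",
  "client_email": "spark-streaming-sa@my-gcp-project.iam.gserviceaccount.com"
}
EOF

# Extract project_id; this is the value to pass as spark.hadoop.fs.gs.project.id.
grep -o '"project_id": *"[^"]*"' /tmp/spark-streaming-sa.json | sed 's/.*": *"//; s/"$//'
# prints: my-gcp-project
```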

I also agree with @Blackishop's point about secret management.

./bin/spark-submit \
  --master k8s://$K8S_SERVER \
  --deploy-mode cluster \
  --name $POD_NAME \
  --class meetup.SparkStreamsApp \
  --conf spark.kubernetes.driver.request.cores=400m \
  --conf spark.kubernetes.executor.request.cores=100m \
  --conf spark.kubernetes.container.image=$SPARK_IMAGE \
  --conf spark.kubernetes.driver.pod.name=$POD_NAME \
  --conf spark.kubernetes.namespace=$K8S_NAMESPACE \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.submission.waitAppCompletion=false \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.hadoop.google.cloud.auth.service.account.enable=true \
  --verbose \
  local:///opt/spark/jars/meetup.spark-streams-demo-0.1.0.jar $BUCKET_NAME
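Applied to the submit command above, the fix from both answers amounts to adding --conf lines like the following (the mount path, key-file name, and project ID are placeholders, not values from the post):

```shell
  --conf spark.kubernetes.driver.secrets.spark-streaming-sa=/etc/secrets \
  --conf spark.kubernetes.executor.secrets.spark-streaming-sa=/etc/secrets \
  --conf spark.kubernetes.driverEnv.GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/spark-streaming-sa.json \
  --conf spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/spark-streaming-sa.json \
  --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/etc/secrets/spark-streaming-sa.json \
  --conf spark.hadoop.fs.gs.project.id=my-gcp-project \
```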