Apache Spark: NullPointerException when running Apache Spark

Tags: apache-spark, nullpointerexception

I am trying to run a query against Redshift and extract the result into a DataFrame. The same query works on Spark 2.0.2, but since Databricks no longer supports that old version I moved to Spark 2.2.1, and in the new environment I hit the exception below.

Any help is appreciated. In short, the NullPointerException comes from:

def setNullableStateForAllColumns(df: DataFrame, nullable: Boolean) = {
  // Rebuild the schema, overriding every field's nullability with `nullable`
  val schema = df.schema
  StructType(schema.map {
    case StructField(c, t, _, m) => StructField(c, t, nullable = nullable, m)
  })
}
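For illustration (not part of the original post), the helper only rewrites schema metadata; applied to some existing DataFrame df it returns a copy of the schema with every field's nullability overridden:

// Hypothetical example: relax every column of an existing DataFrame `df`.
val relaxed = setNullableStateForAllColumns(df, nullable = true)
relaxed.printTreeString()  // every StructField now reports nullable = true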
The top of the stack trace:

java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
    at ...

I tried disabling whole-stage code generation with sparkConf.set("spark.sql.codegen.wholeStage", "false"), but it still does not work. Does anyone know how to solve this?
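For reference, a minimal sketch of setting that flag (assuming a SparkSession named spark; spark.sql.codegen.wholeStage is an internal Spark SQL setting):

import org.apache.spark.sql.SparkSession

// Build a session with whole-stage code generation disabled. This only
// changes the execution strategy; in my case it did not fix the problem.
val spark = SparkSession.builder()
  .appName("redshift-extract")  // hypothetical app name
  .config("spark.sql.codegen.wholeStage", "false")
  .getOrCreate()

// Or toggle it on an existing session:
spark.conf.set("spark.sql.codegen.wholeStage", "false")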

Driver stacktrace:

Caused by: java.lang.NullPointerException

When I set spark.sql.codegen.wholeStage to false, I get another NullPointerException:



Here is the solution:

def setNullableStateForAllColumns(df: DataFrame, nullable: Boolean) = {
  // Rebuild the schema, overriding every field's nullability with `nullable`
  val schema = df.schema
  StructType(schema.map {
    case StructField(c, t, _, m) => StructField(c, t, nullable = nullable, m)
  })
}

def extractNullableData(sql: String): DataFrame = {

logger.info(s"Extracting data from ${source.conf} with sql:\n$sql")

val tempS3Dir = "s3n://data-platform-temp/tmp/redshift_extract"
val origDf = 

context
  .read
  .format("com.databricks.spark.redshift")
  .option("forward_spark_s3_credentials", true)
  .option("url", source.jdbcUrlWPass)
  .option("jdbcdriver", source.driver)
  .option("autoenablessl", "false")
  .option("tempdir", tempS3Dir)
  .option("query", sql)
  .load()

context.read
  .format("com.databricks.spark.redshift")
  .option("forward_spark_s3_credentials", true)
  .option("url", source.jdbcUrlWPass)
  .option("jdbcdriver", source.driver)
  .option("autoenablessl", "false")
  .schema(setNullableStateForAllColumns(origDf, true))
  .option("tempdir", tempS3Dir)
  .option("query", sql)
  .load()

}
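A hypothetical call site, just to illustrate usage (table and columns made up):

// Every column of the extracted result is marked nullable, so rows
// containing NULLs no longer trip UnsafeRowWriter.
val users = extractNullableData("SELECT id, email FROM users")
users.printSchema()  // all fields report nullable = true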

Comment: Did you ever find a solution to this problem?

Reply: Yes, I did; did you run into the same problem?

Comment: Is there a better way to get origDf's schema? Writing the same code twice just to obtain the DataFrame's schema for the next load seems redundant. I am referring to the context.read.format(...).load() above.
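One way to remove the duplication (a sketch of my own, not from the original thread, reusing the context, source, and setNullableStateForAllColumns names above) is to factor the shared reader setup into a helper. As far as I know, spark-redshift resolves the schema over JDBC at load() time without unloading any data, so the first load() stays cheap:

val tempS3Dir = "s3n://data-platform-temp/tmp/redshift_extract"

// Build the shared reader once so the option list is written only once.
def redshiftReader(sql: String) =
  context.read
    .format("com.databricks.spark.redshift")
    .option("forward_spark_s3_credentials", true)
    .option("url", source.jdbcUrlWPass)
    .option("jdbcdriver", source.driver)
    .option("autoenablessl", "false")
    .option("tempdir", tempS3Dir)
    .option("query", sql)

def extractNullableData(sql: String): DataFrame = {
  // This load() only resolves the schema; no rows are pulled yet.
  val origDf = redshiftReader(sql).load()
  // Re-run the same reader with the relaxed, all-nullable schema.
  redshiftReader(sql).schema(setNullableStateForAllColumns(origDf, true)).load()
}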