Apache Spark: NullPointerException when running a Spark query
I am trying to run a query against Redshift and extract the result into a DataFrame. The same query works on Spark 2.0.2, but since Databricks no longer supports that old version, I moved to Spark 2.2.1, and in the new environment I hit the following exception.
Thanks for your help.
In short, the NullPointerException comes from:
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
    at ...
I tried disabling whole-stage code generation with sparkConf.set("spark.sql.codegen.wholeStage", "false"), but it still does not work.
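For reference, the flag is spelled spark.sql.codegen.wholeStage, and it can be set either on the SparkConf before the session is created or at runtime on an existing session (a sketch; sparkConf and spark are assumed to exist in scope):

```scala
// Disable whole-stage code generation, e.g. to isolate codegen-related bugs.
// `sparkConf` is a SparkConf built before session creation:
sparkConf.set("spark.sql.codegen.wholeStage", "false")

// or, at runtime on an already-running SparkSession `spark`:
spark.conf.set("spark.sql.codegen.wholeStage", "false")
```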
Does anyone know how to solve this?
Driver stacktrace:
Caused by: java.lang.NullPointerException
When I set spark.sql.codegen.wholeStage to false, I get another NullPointerException:
Yes, I have. Did you run into the same problem?
Here is the solution:
def setNullableStateForAllColumns(df: DataFrame, nullable: Boolean): StructType = {
  // Copy the schema, overriding every column's nullability with the given value
  val schema = df.schema
  StructType(schema.map {
    case StructField(c, t, _, m) => StructField(c, t, nullable = nullable, m)
  })
}
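For example, applied to a DataFrame whose source marked some columns as non-nullable, every field in the returned schema reports nullable = true (a minimal sketch; df is assumed to be an existing DataFrame in scope):

```scala
// Hypothetical usage: relax every column to nullable before re-reading.
// Assumes an existing DataFrame `df` and the function defined above.
import org.apache.spark.sql.types.StructType

val relaxed: StructType = setNullableStateForAllColumns(df, nullable = true)
// Column names and types are unchanged; only nullability differs.
relaxed.fields.foreach(f => assert(f.nullable))
```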
def extractNullableData(sql: String): DataFrame = {
  logger.info(s"Extracting data from ${source.conf} with sql:\n$sql")
  val tempS3Dir = "s3n://data-platform-temp/tmp/redshift_extract"
  // First load: only used to resolve the schema of the result set
  val origDf =
    context
      .read
      .format("com.databricks.spark.redshift")
      .option("forward_spark_s3_credentials", true)
      .option("url", source.jdbcUrlWPass)
      .option("jdbcdriver", source.driver)
      .option("autoenablessl", "false")
      .option("tempdir", tempS3Dir)
      .option("query", sql)
      .load()
  // Second load: same query, but with every column forced to nullable
  context.read
    .format("com.databricks.spark.redshift")
    .option("forward_spark_s3_credentials", true)
    .option("url", source.jdbcUrlWPass)
    .option("jdbcdriver", source.driver)
    .option("autoenablessl", "false")
    .schema(setNullableStateForAllColumns(origDf, true))
    .option("tempdir", tempS3Dir)
    .option("query", sql)
    .load()
}

Did you ever find a solution to this problem? Yes, I did. Did you run into the same issue? Is there a better way to get origDf's schema? Writing the same code twice just to obtain the DataFrame's schema for the next load seems redundant. I am referring to the context.read.format...load above.
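On the question of reading twice: the duplication can at least be contained by factoring the shared reader options into a small helper, so that only the schema call differs between the two loads (a sketch under the same assumptions as the answer above: context, source, tempS3Dir, and logger are assumed to exist in scope):

```scala
// Sketch: build the common Redshift reader once and reuse it for both loads.
// Assumes `context` (SQLContext/SparkSession), `source`, and `tempS3Dir`
// as defined in the surrounding answer.
def redshiftReader(sql: String) =
  context.read
    .format("com.databricks.spark.redshift")
    .option("forward_spark_s3_credentials", true)
    .option("url", source.jdbcUrlWPass)
    .option("jdbcdriver", source.driver)
    .option("autoenablessl", "false")
    .option("tempdir", tempS3Dir)
    .option("query", sql)

// First load resolves the schema; second load applies the relaxed schema.
val origDf = redshiftReader(sql).load()
val df = redshiftReader(sql)
  .schema(setNullableStateForAllColumns(origDf, nullable = true))
  .load()
```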