
Spark Excel data type issue


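I'm using the com.crealytics.spark.excel package to process MS Excel files with Spark 2.2. Some files fail to load as a Spark DataFrame and throw the exception below. If anyone has run into this, could you help me resolve this data type issue?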

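After some analysis, I found that the exception below occurs when a column name (header cell) is not a string. If I manually change the column names from integers to strings, the file loads fine.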

Code:

  val excelDF = spark.read.
    format("com.crealytics.spark.excel").
    option("useHeader", "true").
    option("treatEmptyValuesAsNulls", "true").
    option("inferSchema", "true").
    option("addColorColumns", "False").
    option("sheetName", sheetName).
    load(filePath)
Exception:

java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC cell
    at org.apache.poi.xssf.usermodel.XSSFCell.typeMismatch(XSSFCell.java:1077)
    at org.apache.poi.xssf.usermodel.XSSFCell.getRichStringCellValue(XSSFCell.java:395)
    at org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue(XSSFCell.java:347)
    at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1$$anonfun$10.apply(ExcelRelation.scala:206)
    at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1$$anonfun$10.apply(ExcelRelation.scala:205)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:205)
    at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:204)
    at scala.Option.getOrElse(Option.scala:121)
    at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:204)
    at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:91)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:37)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:14)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
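Judging by the trace, spark-excel's ExcelRelation.inferSchema appears to read each header cell with Apache POI's XSSFCell.getStringCellValue, and POI throws IllegalStateException whenever that method is called on a NUMERIC cell. The sketch below reproduces the failure with POI alone, assuming poi-ooxml is on the classpath; NumericHeaderDemo and its methods are hypothetical names. It also shows DataFormatter.formatCellValue, a real POI method that returns a cell's displayed text regardless of the cell's type.

```scala
import org.apache.poi.ss.usermodel.{Cell, DataFormatter}
import org.apache.poi.xssf.usermodel.XSSFWorkbook

object NumericHeaderDemo {
  /** Reads a header cell the way the failing code path does: strict, STRING cells only. */
  def strictHeaderName(cell: Cell): String = cell.getStringCellValue

  /** Reads a header cell as its displayed text, whatever its type. */
  def lenientHeaderName(cell: Cell): String = new DataFormatter().formatCellValue(cell)

  def main(args: Array[String]): Unit = {
    val wb = new XSSFWorkbook()
    try {
      val header = wb.createSheet("Sheet1").createRow(0)
      val numeric = header.createCell(0)
      numeric.setCellValue(2019.0) // an integer-looking column name stored as a NUMERIC cell

      println(s"lenient read: ${lenientHeaderName(numeric)}")
      try println(s"strict read: ${strictHeaderName(numeric)}")
      catch { case e: IllegalStateException => println(s"strict read failed: ${e.getMessage}") }
    } finally wb.close()
  }
}
```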

There may be a more elegant answer to this; I would have posted it as a comment, but I don't have enough reputation.

I always try to make sure my column headers are strings.
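One way to make sure the headers are strings without editing each file by hand is to rewrite the header row with Apache POI before handing the file to Spark. This is a minimal sketch, assuming POI is on the classpath, that the header is the first row of the sheet, and that header cells hold plain values rather than formulas; HeaderFix, stringifyHeaderRow and fixWorkbook are hypothetical names. It writes the result to a new file instead of overwriting the input.

```scala
import java.io.{FileInputStream, FileOutputStream}
import org.apache.poi.ss.usermodel.{DataFormatter, Sheet, WorkbookFactory}
import scala.collection.JavaConverters._

object HeaderFix {
  /** Rewrites every cell in the sheet's first row as a STRING cell holding its displayed text. */
  def stringifyHeaderRow(sheet: Sheet): Unit = {
    val formatter = new DataFormatter()
    // Assumption: the header is the first physical row and contains no formula cells.
    val header = sheet.getRow(sheet.getFirstRowNum)
    if (header != null) {
      header.cellIterator().asScala.foreach { cell =>
        // A NUMERIC 2019 becomes the STRING "2019"; string cells are rewritten unchanged.
        cell.setCellValue(formatter.formatCellValue(cell))
      }
    }
  }

  /** Copies the workbook at inPath to outPath with the named sheet's header row stringified. */
  def fixWorkbook(inPath: String, outPath: String, sheetName: String): Unit = {
    val in = new FileInputStream(inPath)
    val wb = try WorkbookFactory.create(in) finally in.close()
    try {
      val sheet = wb.getSheet(sheetName)
      require(sheet != null, s"no sheet named $sheetName")
      stringifyHeaderRow(sheet)
      val out = new FileOutputStream(outPath)
      try wb.write(out) finally out.close()
    } finally wb.close()
  }
}
```

The fixed copy at outPath can then be loaded with the same spark.read options as before.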


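Also, as a rule, I don't use numeric characters in column headers; we have a simple script that replaces digits with alphabetic characters (e.g. 1 becomes one).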
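The script itself isn't shown. A hypothetical Scala version of that rule, spelling out each ASCII digit in a header name as an English word, could look like the following; the word mapping and the names HeaderNames and spellDigits are assumptions, not the answerer's code.

```scala
object HeaderNames {
  // Hypothetical mapping: each ASCII digit becomes its English word.
  private val digitWords =
    Vector("zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine")

  /** Replaces every ASCII digit in a header name with its word, e.g. "Q1" becomes "Qone". */
  def spellDigits(header: String): String =
    header.map { c =>
      if (c >= '0' && c <= '9') digitWords(c - '0') else c.toString
    }.mkString

  def main(args: Array[String]): Unit =
    Seq("2019", "Q1", "region").foreach(h => println(s"$h -> ${spellDigits(h)}"))
}
```

Spelling out every digit keeps the names unambiguous, at the cost of long names for numeric-looking headers such as years.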

The newer version of the library, com.crealytics:spark-excel_2.11:0.12.5, also works with non-string column/header names.
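To pick up that release in an sbt build, the dependency can be declared roughly as follows. This assumes a Scala 2.11 project, matching the _2.11 suffix of the artifact; the scalaVersion shown is an example, and it is worth checking that the release supports your Spark version before upgrading.

```scala
// build.sbt (assumption: Scala 2.11 build; %% appends the _2.11 suffix to the artifact name)
scalaVersion := "2.11.12"

libraryDependencies += "com.crealytics" %% "spark-excel" % "0.12.5"
```

With spark-shell or spark-submit, the equivalent is --packages com.crealytics:spark-excel_2.11:0.12.5.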