Warnings when trying to read Spark 1.6.X Parquet files into Spark 2.X


When trying to load Spark 1.6.X Parquet files into Spark 2.X, I see many WARN-level statements:

  16/08/11 12:18:51 WARN CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr version 1.6.0
  org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr version 1.6.0 using format: (.+) version ((.*) )?\(build ?(.*)\)
    at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
    at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:567)
    at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:544)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:431)
    at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
    at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:107)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:109)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:369)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:343)
    at [rest of stacktrace omitted]

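I am running release 2.1.0 and there are a large number of these warnings. Is there any way to suppress them other than changing the logging level to ERROR?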

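This appears to be the result of a fix, but the warnings may not have been removed yet. Here are some details from the JIRA:

"I have built the code from the PR and it does succeed in reading the data. I tried running df.count() and now I am flooded with warnings like this (they just keep being printed in the terminal):"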

Setting the logging level to ERROR is a last resort: it swallows the messages we rely on for standard monitoring. Has anyone found a workaround?

For now, i.e. until/unless this Spark/Parquet bug is fixed, I am adding the following to log4j.properties:

  log4j.logger.org.apache.parquet=ERROR

The location of that file is:

When running against an external Spark server: $SPARK_HOME/conf/log4j.properties

When running locally inside IntelliJ (or another IDE): src/main/resources/log4j.properties
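
To show the logger override in context, here is a minimal sketch of a complete log4j.properties. It assumes the log4j 1.x properties format and a console appender modeled on Spark's conf/log4j.properties.template. The root level, appender name, and pattern are illustrative assumptions, not part of the original answer. Only the last line comes from the answer itself.

```properties
# Assumed baseline: root logger stays at INFO so normal monitoring messages are kept.
log4j.rootCategory=INFO, console

# Assumed console appender, modeled on Spark's log4j.properties.template.
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# From the answer: raise only the Parquet loggers to ERROR,
# which hides the WARN-level CorruptStatistics messages.
log4j.logger.org.apache.parquet=ERROR
```

Because the override is scoped to the org.apache.parquet logger hierarchy, WARN and INFO messages from other packages should still be logged at the root level.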

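Could you provide more details about where to update the line log4j.logger.org.apache.parquet=ERROR, i.e. is it in the Hive log4j.properties?

@Jay Sorry, I did not notice your comment until now. I have updated my answer.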