Scala assertion failed: No predefined schema found, and no Parquet data files

Tags: scala, apache-spark-sql

Unfortunately, I have yet another problem with Scala and Spark SQL. The problem is:

Exception in thread "main" java.lang.AssertionError: assertion failed: No predefined schema found, and no Parquet data files or summary files found under file:/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet.
I'm using the Cloudera QuickStart VM (in a VirtualBox environment): it provides a single-node cluster with a cluster manager and the Cloudera environment installed, offering services such as Spark, Hive, and Impala.

Now I'm trying to test Scala with Spark SQL, and I get an error I can't solve. Here is my code:

package org.test.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object TestSelectAlgorithm {

  def main(args: Array[String]) = {
    val conf = new SparkConf()
      .setAppName("TestSelectAlgorithm")
      .setMaster("local")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // This is the line that throws: with no scheme, the path is resolved
    // against the local filesystem (file:/), not HDFS.
    val parquetFile = sqlContext.read.parquet("/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet")
    parquetFile.registerTempTable("products")

    // Temp tables are registered without a database qualifier, so query
    // "products", not "default.products".
    val result = sqlContext.sql("select * from products")
    result.show()
  }
}
The error:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/01 01:31:34 INFO SparkContext: Running Spark version 1.6.0
16/07/01 01:31:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/01 01:31:35 INFO SecurityManager: Changing view acls to: cloudera
16/07/01 01:31:35 INFO SecurityManager: Changing modify acls to: cloudera
16/07/01 01:31:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
16/07/01 01:31:36 INFO Utils: Successfully started service 'sparkDriver' on port 57073.
16/07/01 01:31:37 INFO Slf4jLogger: Slf4jLogger started
16/07/01 01:31:37 INFO Remoting: Starting remoting
16/07/01 01:31:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:36679]
16/07/01 01:31:38 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 36679.
16/07/01 01:31:38 INFO SparkEnv: Registering MapOutputTracker
16/07/01 01:31:38 INFO SparkEnv: Registering BlockManagerMaster
16/07/01 01:31:38 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1ad66510-ad8f-4239-b4bf-1410135c84f5
16/07/01 01:31:38 INFO MemoryStore: MemoryStore started with capacity 1619.3 MB
16/07/01 01:31:38 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/01 01:31:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/01 01:31:38 INFO SparkUI: Started SparkUI at http://10.0.2.15:4040
16/07/01 01:31:39 INFO Executor: Starting executor ID driver on host localhost
16/07/01 01:31:39 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45098.
16/07/01 01:31:39 INFO NettyBlockTransferService: Server created on 45098
16/07/01 01:31:39 INFO BlockManagerMaster: Trying to register BlockManager
16/07/01 01:31:39 INFO BlockManagerMasterEndpoint: Registering block manager localhost:45098 with 1619.3 MB RAM, BlockManagerId(driver, localhost, 45098)
16/07/01 01:31:39 INFO BlockManagerMaster: Registered BlockManager
16/07/01 01:31:40 INFO ParquetRelation: Listing file:/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet on driver
Exception in thread "main" java.lang.AssertionError: assertion failed: No predefined schema found, and no Parquet data files or summary files found under file:/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet.
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$MetadataCache$$readSchema(ParquetRelation.scala:512)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache$$anonfun$12.apply(ParquetRelation.scala:421)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache$$anonfun$12.apply(ParquetRelation.scala:421)
    at scala.Option.orElse(Option.scala:257)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$MetadataCache.refresh(ParquetRelation.scala:421)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$$metadataCache$lzycompute(ParquetRelation.scala:145)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.org$apache$spark$sql$execution$datasources$parquet$ParquetRelation$$metadataCache(ParquetRelation.scala:143)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$6.apply(ParquetRelation.scala:202)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$6.apply(ParquetRelation.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation.dataSchema(ParquetRelation.scala:202)
    at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:636)
    at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:635)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
    at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:442)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:316)
    at org.test.spark.TestSelectAlgorithm$.main(TestSelectAlgorithm.scala:20)
    at org.test.spark.TestSelectAlgorithm.main(TestSelectAlgorithm.scala)
16/07/01 01:31:40 INFO SparkContext: Invoking stop() from shutdown hook
16/07/01 01:31:40 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
16/07/01 01:31:40 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/01 01:31:40 INFO MemoryStore: MemoryStore cleared
16/07/01 01:31:40 INFO BlockManager: BlockManager stopped
16/07/01 01:31:40 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/01 01:31:40 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/01 01:31:40 INFO SparkContext: Successfully stopped SparkContext
16/07/01 01:31:40 INFO ShutdownHookManager: Shutdown hook called
16/07/01 01:31:40 INFO ShutdownHookManager: Deleting directory /tmp/spark-2e652280-6b19-4bc5-b686-49e1fba5f7e8
But the error tells me:

No predefined schema found

Can someone help me? On the web, more precisely on stackoverflow.com, I found a few posts... but they didn't help me.

Try the following path:

"hdfs:////user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet"

Or try to make Spark aware of your Hadoop environment (I don't know how to write it in Scala, but try converting the Java code shown at the end of this thread).
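For example, a minimal sketch of that first suggestion (assuming the file really does exist at that HDFS location):

val parquetFile = sqlContext.read.parquet(
  "hdfs:////user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet")
parquetFile.show()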



This error occurs when the specified path contains no Parquet data or is empty. Verify your file before creating the DataFrame.
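As a sketch of that pre-flight check, using the Hadoop FileSystem API (an illustration; it assumes the path lives on the filesystem configured for the SparkContext):

import org.apache.hadoop.fs.{FileSystem, Path}

// Confirm the path exists and contains non-empty files before asking
// Spark to infer a Parquet schema from it.
val fs = FileSystem.get(sc.hadoopConfiguration)
val target = new Path("/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet")

if (fs.exists(target) && fs.listStatus(target).exists(_.getLen > 0)) {
  val df = sqlContext.read.parquet(target.toString)
  df.show()
} else {
  println(s"$target is missing or empty on ${fs.getUri}")
}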

Comments:

To verify whether the Parquet file is corrupt, validate it with a Parquet tool other than Hive or Spark.

Sorry, but I don't understand: the file is in HDFS, in Parquet format... Now I have tried:

val parquetFile = sqlContext.read.parquet("hdfs://quickstart.cloudera:8888/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet")
but it doesn't work! Now I'm looking for how to set the Hadoop configuration from Scala.

Where are you running the Scala script? Directly on the QuickStart VM, or remotely? Can you tell us the spark-submit command you use?

Directly on the VM, in an Eclipse environment, as a Maven project configured via POM.xml.

OK, that's strange. The file below may be corrupted; try checking it:

/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet

Have you solved this problem? I'm facing the same issue; although I followed the steps above, it still doesn't work. If you solved it, please explain what you changed.

If it fails with an assertion error on an empty file, that seems like pretty terrible exception handling.

Yup, agreed... could be considered a bug.
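Note that on the Cloudera QuickStart VM, port 8888 is normally Hue's web UI, while the HDFS namenode listens on 8020 by default; a fully qualified URI would therefore look more like this (check fs.defaultFS in core-site.xml if unsure):

// Assumes the QuickStart VM's default namenode address.
val parquetFile = sqlContext.read.parquet(
  "hdfs://quickstart.cloudera:8020/user/hive/warehouse/products/bc223562-ee45-42a6-b9a0-05635efb3e59.parquet")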
The Java code referenced in the first answer:

// Register the cluster's client configuration files with Spark's Hadoop
// configuration so that paths resolve against HDFS rather than file:/.
File coreSite = new File("/etc/hadoop/conf/core-site.xml");
File hdfsSite = new File("/etc/hadoop/conf/hdfs-site.xml");
Configuration hConf = sc.hadoopConfiguration();
hConf.addResource(new Path(coreSite.getAbsolutePath()));
hConf.addResource(new Path(hdfsSite.getAbsolutePath()));

SQLContext sqlContext = new SQLContext(sc);
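A minimal Scala translation of that snippet might look like this (a sketch; it assumes the client configs live at the usual /etc/hadoop/conf paths on the QuickStart VM):

import java.io.File
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SQLContext

// Point the SparkContext's Hadoop configuration at the cluster's client
// configs, so that unqualified paths resolve against HDFS instead of file:/.
val coreSite = new File("/etc/hadoop/conf/core-site.xml")
val hdfsSite = new File("/etc/hadoop/conf/hdfs-site.xml")

val hConf = sc.hadoopConfiguration
hConf.addResource(new Path(coreSite.getAbsolutePath))
hConf.addResource(new Path(hdfsSite.getAbsolutePath))

val sqlContext = new SQLContext(sc)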