Apache Spark: Loading XML Files


How do I load an XML file in Spark 2.0?

val rd = spark.read.format("com.databricks.spark.xml").load("C:/Users/kumar/Desktop/d.xml")
I get an error that com.databricks.spark.xml is not available:

java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
  at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
  ... 48 elided

A ClassNotFoundException means the data source is not on your classpath. You need a fat jar: include the package in build.sbt and build the jar with sbt-assembly, then give it a try.
If that does not work, add the jar to $SPARK_HOME/jars and try again.
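
A minimal sketch of such a build.sbt, assuming Spark 2.0.2 on Scala 2.11 and spark-xml 0.4.1 (the versions are assumptions; match them to whatever your cluster runs):

// build.sbt -- versions here are assumptions, adjust to your cluster
name := "spark-xml-demo"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.0.2" % "provided", // supplied by the cluster, kept out of the fat jar
  "com.databricks"   %% "spark-xml" % "0.4.1"               // bundled into the fat jar
)

With addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3") in project/plugins.sbt, running sbt assembly produces the fat jar to submit.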

Alternatively, you can add the jar file to the Spark shell itself. Download the jar, copy it into Spark's classpath, and load it in the running shell with the :cp REPL command:

:cp spark-xml_2.10-0.2.0.jar  
/*
  The jar is now on the shell's classpath, so you can use
  classes from it anywhere in your code in this spark-shell session.
*/
val rd = spark.read.format("com.databricks.spark.xml").load("C:/Users/kumar/Desktop/d.xml")
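
Once the data source resolves, note that spark-xml maps one XML element to one DataFrame row, selected by its rowTag option (default: ROW). A small sketch; the "book" tag is an assumption about d.xml's structure:

// Assumes d.xml contains repeated <book> elements; rowTag names the
// element that spark-xml turns into one DataFrame row.
val rd = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("C:/Users/kumar/Desktop/d.xml")

rd.printSchema() // schema is inferred from the elements' attributes and children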

Apparently you have not included the package on your classpath, yet other packages such as com.databricks.spark.csv are working.
I am trying to load it in the spark-shell; how can I add the jar from within the shell? Please refer to this.
The spark-shell route also works: you can either put your jar into $SPARK_HOME/jars, or add the XML package with the --packages option: $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.4.1. Hope this helps.
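
For reference, a sketch of both shell launch options (the local jar path is hypothetical, and the _2.10 suffix must match your shell's Scala version):

# resolve the package from Maven Central at launch (needs network access)
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.4.1

# or point --jars at a jar you have already downloaded (path is hypothetical)
$SPARK_HOME/bin/spark-shell --jars /path/to/spark-xml_2.10-0.4.1.jar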