Scala XGBoost4J-Spark error - object dmlc is not a member of package org.apache.spark.ml

Tags: scala, maven, apache-spark, apache-spark-mllib, xgboost

I created a Spark Scala project to test XGBoost4J-Spark. The project builds successfully, but when I run the script I get the following error:

Message: <console>:65: error: object dmlc is not a member of package org.apache.spark.ml
       import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
I have seen similar questions where adding the mllib dependency to the pom.xml seemed to fix this. However, I already have that dependency and it still throws the error. Please advise.
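For reference, a quick way to check from the running spark-shell session whether the XGBoost4J-Spark classes are actually on the driver classpath (a rough sketch, not part of the original script):

// Class.forName throws ClassNotFoundException when the xgboost4j-spark jar
// is missing from the driver classpath, which would explain the failed import.
try {
  Class.forName("ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier")
  println("xgboost4j-spark is on the driver classpath")
} catch {
  case _: ClassNotFoundException =>
    println("xgboost4j-spark is NOT on the driver classpath")
}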

Scala code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

val spark = SparkSession.builder().config("spark.jars","target/accounts-by-state-1.0.jar").getOrCreate()
val schema = new StructType(Array(
  StructField("sepal length", DoubleType, true),
  StructField("sepal width", DoubleType, true),
  StructField("petal length", DoubleType, true),
  StructField("petal width", DoubleType, true),
  StructField("class", StringType, true)))
val rawInput = spark.read.schema(schema).csv("iris.data")


import org.apache.spark.ml.feature.StringIndexer
val stringIndexer = new StringIndexer().
  setInputCol("class").
  setOutputCol("classIndex").
  fit(rawInput)
val labelTransformed = stringIndexer.transform(rawInput).drop("class")

import org.apache.spark.ml.feature.VectorAssembler
val vectorAssembler = new VectorAssembler().
  setInputCols(Array("sepal length", "sepal width", "petal length", "petal width")).
  setOutputCol("features")
val xgbInput = vectorAssembler.transform(labelTransformed).select("features", "classIndex")

import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
val xgbParam = Map("eta" -> 0.1f,
      "missing" -> -999,
      "objective" -> "multi:softprob",
      "num_class" -> 3,
      "num_round" -> 100,
      "num_workers" -> 2)
val xgbClassifier = new XGBoostClassifier(xgbParam).
      setFeaturesCol("features").
      setLabelCol("classIndex")

   Message: <console>:65: error: object dmlc is not a member of package org.apache.spark.ml
           import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
pom.xml file contents:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.cloudera.training.devsh</groupId>
  <artifactId>accounts-by-state</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <name>"Accounts by State"</name>

  <properties>
    <hadoop.version>3.0.0</hadoop.version>
    <spark.version>2.4.0</spark.version>
    <scala.version>2.11.12</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
    <java.version>1.8</java.version>
  </properties>
  
  <repositories>
    <repository>
  <id>XGBoost4J Snapshot Repo</id>
  <name>XGBoost4J Snapshot Repo</name>
  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
</repository>
    <repository>
      <id>apache-repo</id>
      <name>Apache Repository</name>
      <url>https://repository.apache.org/content/repositories/releases</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
   <repository>
     <id>cloudera-repo-releases</id>
     <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
   </repository> 
  </repositories>

  <dependencies>
      <dependency> <!-- Scala -->
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
      </dependency>

      <dependency> <!-- Core Spark -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
      </dependency>

      <dependency> <!-- Spark SQL -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
      </dependency>
    
      <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>runtime</scope>
      </dependency>

      <dependency> <!-- Hadoop -->
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-client</artifactId>
         <version>${hadoop.version}</version>
       </dependency>
    
    <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j_${scala.binary.version}</artifactId>
      <version>1.1.0-SNAPSHOT</version>
  </dependency>
    
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
      <version>1.1.0-SNAPSHOT</version>
  </dependency>

  </dependencies>

  <build>
      <sourceDirectory>src/main/scala</sourceDirectory>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.3</version>
        </plugin>
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.2.2</version>
          <executions>
            <execution>
              <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <args>
              <!-- work-around for https://issues.scala-lang.org/browse/SI-8358 -->
              <arg>-nobootcp</arg>
            </args>
          </configuration>
        </plugin>
      </plugins>
    </build>

</project>

Edit:

To explain in more detail: I created a file named XgBoost.scala at spark-application/xgboost-project/src/main/scala/xgboost/XgBoost.scala with the following contents:

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types._
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier._

//{StringType, LongType, StructField, StructType, DoubleType}

object XgBoost {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: XGBOOST TEST")
      System.exit(1)
    }
 
    val stateCode = args(0)
    
    val spark = SparkSession.builder.getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    val schema = new StructType(Array(
      StructField("sepal length", DoubleType, true),
      StructField("sepal width", DoubleType, true),
      StructField("petal length", DoubleType, true),
      StructField("petal width", DoubleType, true),
      StructField("class", StringType, true)))
    
    val rawInput = spark.read.schema(schema).csv("~/iris.data")

    val stringIndexer = new StringIndexer().
      setInputCol("class").
      setOutputCol("classIndex").
      fit(rawInput)
    val labelTransformed = stringIndexer.transform(rawInput).drop("class")

    val vectorAssembler = new VectorAssembler().
      setInputCols(Array("sepal length", "sepal width", "petal length", "petal width")).
      setOutputCol("features")
    val xgbInput = vectorAssembler.transform(labelTransformed).select("features", "classIndex")


    val xgbParam = Map("eta" -> 0.1f,
      "missing" -> -999,
      "objective" -> "multi:softprob",
      "num_class" -> 3,
      "num_round" -> 100,
      "num_workers" -> 2)
    val xgbClassifier = new XGBoostClassifier(xgbParam).
      setFeaturesCol("features").
      setLabelCol("classIndex")

    spark.stop
  }
}
Then I modified the pom.xml file as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.cloudera.xgboost</groupId>
  <artifactId>xgboost-project</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <name>"XGBoost Project"</name>

  <properties>
    <hadoop.version>3.0.0</hadoop.version>
    <spark.version>2.4.0</spark.version>
    <scala.version>2.11.12</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
    <java.version>1.8</java.version>
  </properties>
  
  <repositories>
    <repository>
  <id>XGBoost4J Snapshot Repo</id>
  <name>XGBoost4J Snapshot Repo</name>
  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
</repository>
    <repository>
      <id>apache-repo</id>
      <name>Apache Repository</name>
      <url>https://repository.apache.org/content/repositories/releases</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
   <repository>
     <id>cloudera-repo-releases</id>
     <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
   </repository> 
  </repositories>

  <dependencies>
      <dependency> <!-- Scala -->
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
      </dependency>

      <dependency> <!-- Core Spark -->
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
      </dependency>

      <dependency> <!-- Spark SQL -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
      </dependency>
    
      <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>${spark.version}</version>
      </dependency>

      <dependency> <!-- Hadoop -->
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-client</artifactId>
         <version>${hadoop.version}</version>
       </dependency>
    
    <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j_${scala.binary.version}</artifactId>
      <version>1.1.0-SNAPSHOT</version>
  </dependency>
    
  <dependency>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j-spark_${scala.binary.version}</artifactId>
      <version>1.1.0-SNAPSHOT</version>
  </dependency>

  </dependencies>

  <build>
      <sourceDirectory>src/main/scala</sourceDirectory>
      <plugins>
      
        <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>${app.main.class}</mainClass>
                        </manifest>
                    </archive>
                </configuration>
        </plugin>
        
        <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.3.0</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
              <manifest>
                  <mainClass>${app.main.class}</mainClass>
              </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
        </plugin>
        
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.2.2</version>
          <executions>
            <execution>
              <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <args>
              <!-- work-around for https://issues.scala-lang.org/browse/SI-8358 -->
              <arg>-nobootcp</arg>
            </args>
          </configuration>
        </plugin>
        

      </plugins>
    </build>

</project> 

Running the Maven build then fails with the following output:
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for com.cloudera.xgboost:xgboost-project:jar:1.0
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-jar-plugin is missing. @ line 89, column 17
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ----------------< com.cloudera.xgboost:xgboost-project >----------------
[INFO] Building "XGBoost Project" 1.0
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ xgboost-project ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/exercises/spark-application/xgboost-project/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ xgboost-project ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- scala-maven-plugin:3.2.2:compile (default) @ xgboost-project ---
[WARNING]  Expected all dependencies to require Scala version: 2.11.12
[WARNING]  com.cloudera.xgboost:xgboost-project:1.0 requires scala version: 2.11.12
[WARNING]  com.twitter:chill_2.11:0.9.3 requires scala version: 2.11.12
[WARNING]  org.apache.spark:spark-core_2.11:2.4.0 requires scala version: 2.11.12
[WARNING]  org.json4s:json4s-jackson_2.11:3.5.3 requires scala version: 2.11.11
[WARNING] Multiple versions of scala libraries detected!
[INFO] /home/exercises/spark-application/xgboost-project/src/main/scala:-1: info: compiling
[INFO] Compiling 1 source files to /home/cdsw/exercises/spark-application/xgboost-project/target/classes at 1599664489439
[ERROR] /home/exercises/spark-application/xgboost-project/src/main/scala/xgboost/XgBoost.scala:48: error: not found: type XGBoostClassifier
[ERROR]     val xgbClassifier = new XGBoostClassifier(xgbParam).
[ERROR]                             ^
[ERROR] one error found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:08 min
[INFO] Finished at: 2020-09-09T15:14:53Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (default) on project xgboost-project:wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
One way to make the xgboost4j-spark artifact available at runtime is to pass it to spark-submit via --packages:

spark-submit --packages ml.dmlc:xgboost4j-spark_2.11:1.1.0-SNAPSHOT other options
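Separately, the Maven compile error above ("not found: type XGBoostClassifier") is consistent with the wildcard import in XgBoost.scala: import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier._ brings the companion object's members into scope but not the XGBoostClassifier type itself. A minimal sketch of the non-wildcard import and the remaining estimator steps, assuming the xgboost4j-spark dependency resolves on the compile classpath:

import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier  // import the class itself, not its companion members

// xgbParam and xgbInput are the values defined earlier in XgBoost.scala
val xgbClassifier = new XGBoostClassifier(xgbParam).
  setFeaturesCol("features").
  setLabelCol("classIndex")

// Standard Spark ML estimator flow: fit on the assembled input, then score it
val xgbModel = xgbClassifier.fit(xgbInput)
val predictions = xgbModel.transform(xgbInput)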