
Scala Spark, sbt package -- No TypeTag available


I'm trying to build a simple TF-IDF vectorizer in Spark and compile it into a jar for local testing. However, I keep getting "No TypeTag available for (Int, String)". Here is my code:

package com.valiant.ml

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer, StopWordsRemover}

object TextClassification
{
  def main(args: Array[String]): Unit =
  {
    val spark = SparkSession
      .builder
      .appName("TextClassification")
      .getOrCreate()

    val sentenceData = spark.createDataFrame(Seq(
      (0, "Hi I heard about Spark"),
      (0, "I wish Java could use case classes"),
      (1, "Logistic regression models are neat")
    )).toDF("label", "text")

    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")

    val wordsData = tokenizer.transform(sentenceData)

    val remover = new StopWordsRemover()
      .setInputCol("words")
      .setOutputCol("filtered_words")

    val filteredWordsData = remover.transform(wordsData)

    val hashingTF = new HashingTF()
      .setInputCol("filtered_words")
      .setOutputCol("rawFeatures")
      .setNumFeatures(20)

    val featurizedData = hashingTF.transform(filteredWordsData)
    // alternatively, CountVectorizer can also be used to get term frequency vectors

    val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
    val idfModel = idf.fit(featurizedData)

    val rescaledData = idfModel.transform(featurizedData)
    rescaledData.select("features", "label").take(3).foreach(println)

    spark.stop()
  }
}
Here is the build file:

name := "text-classification"

version := "0.0.1"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.0",
  "org.apache.spark" %% "spark-mllib" % "2.0.0",
  "org.apache.spark" %% "spark-catalyst" % "2.0.0"
)
Here is the top of the error output:

[info] Set current project to text-classification (in build file:/arete/repos/ml/classification/text/)
[info] Compiling 1 Scala source to /arete/repos/ml/classification/text/target/scala-2.11/classes...
[error] /arete/repos/ml/classification/text/text.scala:15: No TypeTag available for (Int, String)
[error]     val sentenceData = spark.createDataFrame(Seq(
[error]                                             ^
[error]
[error]      while compiling: /arete/repos/ml/classification/text/text.scala
[error]         during phase: typer
[error]      library version: version 2.10.4
[error]     compiler version: version 2.10.4
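One detail worth noting in the log above: the last two lines report library/compiler version 2.10.4, even though the build file declares scalaVersion := "2.11.8". That mismatch suggests sbt is compiling with a different build definition than the one shown (for example, a different working directory). A minimal, self-contained diagnostic sketch (VersionCheck is a hypothetical object name) that, compiled and run by the same sbt invocation, prints which Scala library is actually on the classpath:

```scala
// Prints the Scala library version the running code was linked against,
// e.g. "version 2.11.8". Compare this with what build.sbt declares.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionString)
  }
}
```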

Can you show the build definition?