
Using a CSV file in Scala


I'm trying to run K-means on Apache Spark using Scala. Everything went well until I tried to use a CSV file, when I ran into this problem:

scala> val censocsv = spark.read.format("csv").option("sep",",").option("inferSchema","true").option("header", "true").load("censodiscapacidad.csv")
2018-10-01 21:58:31 WARN  SizeEstimator:66 - Failed to check whether UseCompressedOops is set; assuming yes
2018-10-01 21:58:49 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
censocsv: org.apache.spark.sql.DataFrame = [ANIO: int, DELEGACION: double ... 123 more fields]

scala> val kmeans = new KMeans().setK(2).setSeed(1L)
kmeans: org.apache.spark.ml.clustering.KMeans = kmeans_860c02e56190

scala> val model = kmeans.fit(censocsv)
java.lang.IllegalArgumentException: Field "features" does not exist.
  at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:267)
  at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:267)
  at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
  at scala.collection.AbstractMap.getOrElse(Map.scala:59)
  at org.apache.spark.sql.types.StructType.apply(StructType.scala:266)
  at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
  at org.apache.spark.ml.clustering.KMeansParams$class.validateAndTransformSchema(KMeans.scala:93)
  at org.apache.spark.ml.clustering.KMeans.validateAndTransformSchema(KMeans.scala:254)
  at org.apache.spark.ml.clustering.KMeans.transformSchema(KMeans.scala:340)
  at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
  at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:305)
  ... 51 elided

scala> val predictions = model.transform(censocsv)
<console>:31: error: not found: value model
       val predictions = model.transform(censocsv)
                         ^

scala> 

This looks like

You need to add a vector column containing your feature columns to the DataFrame before fitting KMeans.
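A minimal sketch of what that looks like with `VectorAssembler`. Spark ML's `KMeans` expects a single vector column (named `"features"` by default), which is exactly the column the `IllegalArgumentException` above says is missing. The input column names here (`ANIO`, `DELEGACION`) are taken from the schema printed in the question; substitute the numeric columns you actually want to cluster on:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.clustering.KMeans

// Assemble the chosen numeric columns into a single vector column
// named "features", which KMeans.fit expects to find in the schema.
val assembler = new VectorAssembler()
  .setInputCols(Array("ANIO", "DELEGACION")) // replace with your numeric columns
  .setOutputCol("features")

val assembled = assembler.transform(censocsv)

val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(assembled)

// Now model exists, so transform succeeds where it failed before.
val predictions = model.transform(assembled)
```

Note that the later `not found: value model` error in the transcript is just a consequence of the failed `fit`: once `fit` succeeds, `model` is defined and `transform` works.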

OK, thanks @BrianMcCutchon