Using contains in Scala: exception


scala apache-spark


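I get this error:

java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to [Ljava.lang.Object;

whenever I try to use "contains" to find out whether a string is in an array. Is there a more appropriate way of doing this? Or am I doing something wrong? (I'm fairly new to Scala.)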
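For readers who have not seen this notation before: `$colon$colon` is the JVM name of Scala's `::` class (a non-empty `List`), and `[Ljava.lang.Object;` is the JVM name of `Object[]`. The message therefore says that a `List` was treated as an array. The small sketch below reproduces the same kind of failure in plain Scala, without Spark. The `CastDemo` object is purely illustrative.

```scala
object CastDemo {
  // The runtime class of a non-empty List is `::`, which the JVM names
  // "scala.collection.immutable.$colon$colon".
  def listClassName: String = List("a", "b").getClass.getName

  // Pretend a List arrives typed as Any (as a Row value does) and force it into
  // an Array[AnyRef], i.e. Object[] on the JVM ("[Ljava.lang.Object;").
  def castListToArray(): Either[ClassCastException, Int] = {
    val value: Any = List("a", "b")
    try {
      val arr: Array[AnyRef] = value.asInstanceOf[Array[AnyRef]] // the cast fails here
      Right(arr.length)
    } catch {
      case e: ClassCastException => Left(e)
    }
  }
}
```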

Here is the code:

val matches = Set[JSONObject]()
val config = new SparkConf()
val sc = new SparkContext("local", "SparkExample", config)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val ebay = sqlContext.read.json("/Users/thomassquires/Downloads/products.json")
val catalogue = sqlContext.read.json("/Users/thomassquires/Documents/catalogue2.json")

val eins = ebay.map(item => (item.getAs[String]("ID"), Option(item.getAs[Set[Row]]("itemSpecifics"))))
  .filter(item => item._2.isDefined)
  .map(item => (item._1 , item._2.get.find(x => x.getAs[String]("k") == "EAN")))
  .filter(x => x._2.isDefined)
  .map(x => (x._1, x._2.get.getAs[String]("v")))
  .collect()

def catEins = catalogue.map(r => (r.getAs[String]("_id"), Option(r.getAs[Array[String]]("item_model_number"))))
  .filter(r => r._2.isDefined)
  .map(r => (r._1, r._2.get))
  .collect()

def matched = for (ein <- eins) yield (ein._1, catEins.filter(z => z._2.contains(ein._2)))


The exception occurs on the last line. I have tried several different variations.

My data structures are a `List[Tuple2[String, String]]` and a `List[Tuple2[String, Array[String]]]`. I need to find zero or more matches in the second list whose array contains the string.
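Here is a plain-Scala sketch of the matching described above, with no Spark and hypothetical sample data. When the second element really is an `Array[String]`, `contains` works as expected, and an empty result means zero matches for that id. The `Matching` object and its signature are illustrative assumptions, not the asker's code.

```scala
object Matching {
  // For each (id, ean) pair, keep the catalogue entries whose model numbers
  // contain that ean. An empty inner Seq means zero matches for that id.
  def matched(
      eins: Seq[(String, String)],
      catEins: Seq[(String, Array[String])]
  ): Seq[(String, Seq[(String, Array[String])])] =
    eins.map { case (id, ean) =>
      (id, catEins.filter { case (_, models) => models.contains(ean) })
    }
}
```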

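Thanks

Long story short (there are still parts here I don't understand*), you are using the wrong types. `getAs` is implemented as `fieldIndex` (`String => Int`), followed by `get` (`Int => Any`), followed by `asInstanceOf`.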
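To illustrate that chain, here is a minimal sketch with a hypothetical `MiniRow` class. It is not Spark's `Row`, just the same `fieldIndex` / `get` / `asInstanceOf` shape. Because the type parameter `T` is erased, the cast inside `getAs` checks nothing; the `ClassCastException` only appears where the result is first used as an array.

```scala
import scala.util.Try

// Hypothetical stand-in for org.apache.spark.sql.Row, only to show the
// name -> index -> Any -> cast chain. This is not Spark's implementation.
final class MiniRow(fieldNames: Seq[String], values: Seq[Any]) {
  def fieldIndex(name: String): Int = {
    val i = fieldNames.indexOf(name)
    require(i >= 0, s"$name does not exist")
    i
  }
  def get(i: Int): Any = values(i)
  // T is erased, so this cast checks nothing; the real check happens at the call site.
  def getAs[T](fieldName: String): T = get(fieldIndex(fieldName)).asInstanceOf[T]
}

object GetAsDemo {
  // The array column holds a Seq (a List here; Spark hands back its own Seq wrapper).
  val row = new MiniRow(Seq("_id", "item_model_number"), Seq("c1", List("111", "222")))

  // Asking for Array[String] compiles, but using the result as an array throws.
  def asArray: Try[Int] = Try(row.getAs[Array[String]]("item_model_number").length)

  // Asking for Seq[String] matches what is actually stored.
  def asSeq: Seq[String] = row.getAs[Seq[String]]("item_model_number")
}
```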

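Since Spark doesn't use `Array` or `Set` but `WrappedArray` to store array column data, calls like `getAs[Array[String]]` or `getAs[Set[Row]]` are not valid. If you want specific types, you should use `getAs[Seq[T]]` or `getSeq[T]` (which takes a field index rather than a name) and convert the data to the desired type with `toSet` / `toArray`.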
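Here is a sketch of that advice on plain collections. It assumes the raw column values arrive as `Any` the way `Row.get` returns them: a `Seq[String]`, or `null` when the field is absent. In the question's Spark code the corresponding change would be reading the column with `r.getAs[Seq[String]]("item_model_number")` instead of `getAs[Array[String]]`. The names `SeqConversion`, `catEins` and `matchesFor` are hypothetical. Converting to a `Set` is one choice among `toSet` / `toArray`, and it makes each `contains` check cheap.

```scala
object SeqConversion {
  // raw: hypothetical (_id, value) pairs, where value is what Row.get returns for the
  // item_model_number column: some Seq[String], or null when the field is missing.
  def catEins(raw: Seq[(String, Any)]): Seq[(String, Set[String])] =
    raw.flatMap { case (id, value) =>
      // Cast to Seq (which Spark's array wrapper implements), never to Array,
      // then convert; Option(null) drops rows without model numbers.
      Option(value.asInstanceOf[Seq[String]]).map(models => (id, models.toSet))
    }

  // Ids of catalogue entries whose model numbers contain the given ean (zero or more).
  def matchesFor(ean: String, cat: Seq[(String, Set[String])]): Seq[String] =
    cat.collect { case (id, models) if models.contains(ean) => id }
}
```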

*See


Is there any particular reason you collect and then filter? Ideally you should only ever collect the final result.

Mainly to pin down the error. Because it's lazy, I only get the error when collecting, and I wanted to rule out errors in the first two datasets.

Try annotating the types of all your vals; that will also help others reason about your code. By the way, why are `matched` and `catEins` `def`s rather than `val`s?
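On the `def` versus `val` point, one likely consequence follows (an editorial observation, not stated in the thread). Since `catEins` is a `def`, its body, including the Spark `collect()`, runs again every time `matched` references it, i.e. once per element of `eins`. The hypothetical counter below makes the difference visible without Spark.

```scala
object DefVsVal {
  private var evaluations = 0

  // Stand-in for an expensive step such as a Spark collect(); counts its runs.
  private def expensive(): Seq[String] = { evaluations += 1; Seq("111", "222") }

  // Looks something up n times through a `def`: the body runs on every reference.
  def runsWithDef(n: Int): Int = {
    evaluations = 0
    def catEins: Seq[String] = expensive()
    (1 to n).foreach(_ => catEins.contains("111"))
    evaluations
  }

  // Same loop through a `val`: the body runs once, up front.
  def runsWithVal(n: Int): Int = {
    evaluations = 0
    val catEins: Seq[String] = expensive()
    (1 to n).foreach(_ => catEins.contains("111"))
    evaluations
  }
}
```

The explicit type annotations on both `catEins` definitions also follow the commenter's suggestion to annotate vals.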