Scala Spark join function error

scala apache-spark dataframe

I have a large DataFrame that I want to join with a small CSV file, so I broadcast the small file:

val rdd = sc.textFile("hdfs:///user/zed/file/app_desc")
val id_dec = sc.broadcast(rdd.map(line=>(line.split(";")(0),line.split(";")(1))).collectAsMap)

I wrote a function that takes an id (input) and returns the description:

def extract_connection_type(input:Integer): String = {
  if (input == null || input.length() == 0)
    input;
  else try {
    id_dec.value.get(input)
  } catch {
     case e: Exception => throw new IOException("UDF:Caught exception processing input row :" + input + e.toString);
  }
}

Then, when I build my schema, I use this function to do the join:

def structure(line: String): structure_Ot = {
  val fields = line.split("\\\t",-1);
  val Name1 = fields(0);
  val Name2 = fields(1);
  val Appd = fields(2).toInt;
  val App = extract_connection_type(Appd);
  val ot_str = new structure_Ot(Name1, Name2, App)
  ot_str
}

But I get this error:

<console>:93: error: type mismatch;
 found   : String
 required: Int

What is the cause of this error?

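Your code contains several type mismatches that you need to fix.

id_dec has type Broadcast[Map[String, String]], because you create an RDD[(String, String)] and then call collectAsMap and broadcast on the result. In extract_connection_type you call id_dec.value.get(input), where input has type Int but the map keys are Strings. You can fix this either by changing the type of input to String, or by changing id_dec to a Broadcast[Map[Int, String]] by first collecting and broadcasting an RDD[(Int, String)]. If you choose the former, you must also adjust the structure function so that it passes a String rather than an Int.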
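
The second option, keying the lookup table by Int, can be sketched without Spark by letting a plain collection stand in for the RDD. The sample lines and the name idToDesc are illustrative assumptions, not taken from the question.

```scala
// Hypothetical sample of the ";"-separated lookup file (id;description).
val lines: Seq[String] = Seq("1;wifi", "2;ethernet")

// Split each line once and convert the id to Int so the keys match an Int input.
val idToDesc: Map[Int, String] = lines.map { line =>
  val parts = line.split(";")
  parts(0).toInt -> parts(1)
}.toMap

// An Int key now type-checks against the map.
val desc: String = idToDesc(2)
```

With Spark, the same conversion would be applied inside rdd.map before calling collectAsMap and sc.broadcast.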


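Another problem with id_dec.value.get(input) is that get returns an Option[V] (where V is the type of the map's values), not a V. You can use the apply method instead (implicitly): id_dec.value(input) returns a String, or throws an exception if no matching key is found.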
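
The difference between get and apply can be seen on a plain immutable Map, which behaves the same way as the map held in the broadcast value. The map contents here are made up for illustration, and getOrElse is shown only as a possible alternative that the answer does not mention.

```scala
// Hypothetical lookup table standing in for id_dec.value.
val table: Map[String, String] = Map("1" -> "wifi")

// get wraps the result in an Option, so the static type is Option[String].
val viaGet: Option[String] = table.get("1")
val missing: Option[String] = table.get("9")

// apply returns the value itself, so the static type is String.
val viaApply: String = table("1")

// getOrElse returns a String as well, with a fallback instead of an exception.
val withDefault: String = table.getOrElse("9", "unknown")

// apply on a missing key throws NoSuchElementException.
val threw: Boolean =
  try { table("9"); false }
  catch { case _: NoSuchElementException => true }
```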

All in all, here is a version of your code that compiles:

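import java.io.IOException
import org.apache.spark.broadcast.Broadcast
import scala.collection.Map // collectAsMap returns a scala.collection.Map

val rdd = sc.textFile("hdfs:///user/zed/file/app_desc")
// Each line is "id;description"; collect the pairs to the driver and broadcast them as a lookup map.
val id_dec: Broadcast[Map[String, String]] = sc.broadcast(rdd.map(line=>(line.split(";")(0),line.split(";")(1))).collectAsMap)

// Look up the description for an id; null or empty input is returned unchanged.
def extract_connection_type(input: String): String = {
  if (input == null || input.length() == 0)
    input
  else try {
    // apply returns the String value, or throws if the key is missing
    id_dec.value(input)
  } catch {
    case e: Exception => throw new IOException("UDF:Caught exception processing input row :" + input + e.toString);
  }
}

def structure(line: String): structure_Ot = {
  val fields = line.split("\\\t",-1)
  val Name1 = fields(0)
  val Name2 = fields(1)
  val Appd = fields(2) // keep the id as a String so it matches the map keys
  val App = extract_connection_type(Appd)
  // Assumes the third field of structure_Ot is an Int; pass App directly if it is declared as String.
  val ot_str = structure_Ot(Name1, Name2, App.toInt)
  ot_str
}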

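A good way to track down these issues yourself is to explicitly type every value you define. That way you see more clearly where the error is: if you expect id_dec to have a specific type and something of the wrong type is assigned to it, the error will point at that assignment.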
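
As a small illustration of that advice, the sketch below annotates every value so that a key of the wrong type is rejected right where it is used. All names and values are hypothetical, and the failing line is left commented out so the snippet still compiles.

```scala
// Every value carries an explicit type annotation.
val idDesc: Map[String, String] = Map("3" -> "vpn")
val rawField: String = "3"
val appId: Int = rawField.toInt

// Would not compile: the map is keyed by String, but appId is an Int.
// val wrong: String = idDesc(appId)

// Converting the key to the declared key type fixes the mismatch.
val right: String = idDesc(appId.toString)
```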
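
After applying this solution, I get an empty App. I want to get App_description, which is a string value. I wrote this function for the join. I hope that makes sense.

That solved my problem; I had been declaring App as Int in structure_Ot.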
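
The resolution in the last comment, declaring App as a String, might look like the sketch below. The definition of structure_Ot is not shown in the thread, so the case class here is an assumption, as are the sample map and input line. A plain Map stands in for the broadcast value.

```scala
// Assumed shape of structure_Ot, with App holding the description as a String.
case class structure_Ot(Name1: String, Name2: String, App: String)

// Hypothetical lookup table standing in for id_dec.value.
val idDec: Map[String, String] = Map("7" -> "mobile")

def structure(line: String): structure_Ot = {
  val fields = line.split("\t", -1)
  // Fall back to an empty description when the id is unknown (a choice made for this sketch).
  val app = idDec.getOrElse(fields(2), "")
  structure_Ot(fields(0), fields(1), app)
}

val row: structure_Ot = structure("a\tb\t7")
```

Using getOrElse here trades the exception for an empty description; whether that is preferable depends on how missing ids should be handled downstream.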