Spark fails when calling a Scala class method to split a comma-separated string
I have the following class in the Scala shell in Spark:
class StringSplit(val query: String) {
  def getStrSplit(rdd: RDD[String]): RDD[String] = {
    rdd.map(x => x.split(query))
  }
}
I am trying to call the method in this class like this:
val inputRDD=sc.parallelize(List("one","two","three"))
val strSplit=new StringSplit(",")
strSplit.getStrSplit(inputRDD)
-> this step fails with the error: getStrSplit is not a member of StringSplit
Can you tell me what is wrong here? It seems like a reasonable way to do this.

A couple of things are wrong. First, the declared result type of getStrSplit is incorrect: .split returns Array[String], not String, so the method actually produces an RDD[Array[String]] rather than an RDD[String].
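A corrected signature might look like the sketch below (my own illustration, keeping the class from the question; note this fixes only the type error — the serialization issue discussed further down still applies on a cluster):

```scala
import org.apache.spark.rdd.RDD

class StringSplit(val query: String) {
  // split produces an Array[String] per element,
  // so the mapped RDD is RDD[Array[String]]
  def getStrSplit(rdd: RDD[String]): RDD[Array[String]] =
    rdd.map(x => x.split(query))
}
```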
Second, parallelize(List("one","two","three")) stores the three strings "one", "two" and "three" as separate elements, so none of them contains a comma to split on. Input that actually needs comma-splitting looks like:
val input = sc.parallelize(List("1,2,3,4","5,6,7,8"))
input: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[16] at parallelize at <console>
As for the original question of doing the same operation inside a class, I tried:
class StringSplit(query: String) {
  def get(rdd: RDD[String]) = rdd.map(_.split(query))
}
val ss = new StringSplit(",")
ss.get(input)
---> org.apache.spark.SparkException: Task not serializable
I guess this happens because the class is not serialized out to each worker: Spark tries to ship the split function, but that function refers to a parameter (query) belonging to the class, and the class does not get sent along with it.
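A common workaround for this pattern (my own sketch, not part of the answer above) is to copy the field into a local val inside the method, so the closure shipped to the workers captures only the String and not `this`:

```scala
import org.apache.spark.rdd.RDD

class StringSplit(query: String) {
  def get(rdd: RDD[String]): RDD[Array[String]] = {
    val q = query         // local copy: the closure below captures q,
    rdd.map(_.split(q))   // not `this`, so the class itself need not
  }                       // be serializable
}
```

This avoids having to mark the class Serializable at all, and is often preferable when the class holds other non-serializable state.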
scala> class commaSplitter {
def get(rdd:RDD[String])=rdd.map(_.split(","));
}
defined class commaSplitter
scala> val cs = new commaSplitter;
cs: commaSplitter = $iwC$$iwC$commaSplitter@262f1580
scala> cs.get(input);
res29: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[23] at map at <console>:10
scala> cs.get(input).collect()
res30: Array[Array[String]] = Array(Array(1, 2, 3, 4), Array(5, 6, 7, 8))
This works: with no constructor parameter, there is nothing from the class that the closure needs to capture.

Comment: In your class definition, do you have a def or val keyword in front of getStrSplit?

Comment: Awesome, Paul... thanks for taking the time to explain the details... you made my day. Good luck.
To pass the separator in as a parameter, the class needs to extend Serializable:
scala> class stringSplitter(s:String) extends Serializable {
def get(rdd:RDD[String]) = rdd.map(_.split(s));
}
defined class stringSplitter
scala> val ss = new stringSplitter(",");
ss: stringSplitter = $iwC$$iwC$stringSplitter@2a33abcd
scala> ss.get(input)
res33: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[25] at map at <console>:10
scala> ss.get(input).collect()
res34: Array[Array[String]] = Array(Array(1, 2, 3, 4), Array(5, 6, 7, 8))