Java Spark JdbcRDD: strange Task not serializable exception
Tags: java, jdbc, apache-spark

Using the latest Spark 1.3.1, I am getting a Task not serializable exception from this simple JdbcRDD snippet:
public class SparkDriverApp implements Serializable {
    public static void main(String[] args) throws SQLException, ClassNotFoundException {
        SparkConf conf = new SparkConf();
        conf.setAppName("com.example.testapp");
        JavaSparkContext sc = new JavaSparkContext(conf);
        new JdbcRDD<>(sc.sc(), new AbstractFunction0<Connection>() {
            @Override
            public Connection apply() {
                try {
                    Class.forName("com.mysql.jdbc.Driver");
                    return DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "root", "yetAnotherMyPassword");
                } catch (Exception e) {
                    throw new RuntimeException();
                }
            }
        }, "SELECT document_id, name, content FROM document WHERE document_id >= ? and document_id <= ?",
        10001, 499999, 10, new AbstractFunction1<ResultSet, Object[]>() {
            @Override
            public Object[] apply(ResultSet resultSet) {
                return JdbcRDD.resultSetToObjectArray(resultSet);
            }
        }, ClassManifestFactory$.MODULE$.fromClass(Object[].class)).collect();
    }
}
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1622)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1460)
...
Caused by: java.io.NotSerializableException: com.example.testapp.SparkDriverApp$1
Serialization stack:
- object not serializable (class: com.example.testapp.SparkDriverApp$1, value: <function0>)
- field (class: org.apache.spark.rdd.RDD, name: checkpointData, type: class scala.Option)
- object (class org.apache.spark.rdd.JdbcRDD, JdbcRDD[0] at JdbcRDD at SparkDriverApp.java:44)
- field (class: org.apache.spark.rdd.RDD$$anonfun$17, name: $outer, type: class org.apache.spark.rdd.RDD)
- object (class org.apache.spark.rdd.RDD$$anonfun$17, <function1>)
- field (class: org.apache.spark.SparkContext$$anonfun$runJob$5, name: func$1, type: interface scala.Function1)
- object (class org.apache.spark.SparkContext$$anonfun$runJob$5, <function2>)
...
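The serialization stack points at com.example.testapp.SparkDriverApp$1, i.e. the anonymous AbstractFunction0 subclass used as the connection factory. scala.runtime.AbstractFunction0 does not itself implement java.io.Serializable, so the anonymous class is not serializable and cannot be shipped to executors; a common workaround is to declare the two functions as named static classes that explicitly implement Serializable. The underlying Java rule can be shown without Spark at all. Below is a minimal, stdlib-only sketch (the class names SerializableFunctionDemo and ConnectionFactory are illustrative, not from the question): a static nested class marked Serializable round-trips through Java serialization, while an anonymous/lambda instance of a non-Serializable type fails with the same NotSerializableException seen in the trace.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableFunctionDemo {

    // Analogue of the anonymous AbstractFunction0 in the question, but as a
    // named static class that explicitly implements Serializable.
    public static class ConnectionFactory implements Serializable {
        private final String url;

        public ConnectionFactory(String url) {
            this.url = url;
        }

        public String describe() {
            return "factory for " + url;
        }
    }

    // Round-trip an object through Java serialization, which is effectively
    // what Spark's ClosureCleaner requires of every task closure.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // The static nested Serializable class serializes fine.
        byte[] bytes = serialize(new ConnectionFactory("jdbc:mysql://localhost:3306/mydb"));
        System.out.println("static nested class serialized to " + bytes.length + " bytes");

        // An instance of a non-Serializable type fails, just like
        // SparkDriverApp$1 does inside Spark's ClosureCleaner.
        Runnable anonymous = () -> { };
        try {
            serialize(anonymous);
            System.out.println("unexpected: anonymous instance serialized");
        } catch (NotSerializableException e) {
            System.out.println("not serializable: " + e.getMessage());
        }
    }
}
```

Applying the same idea to the snippet above would mean replacing the anonymous classes with something like static classes that extend AbstractFunction0<Connection> / AbstractFunction1<ResultSet, Object[]> and also implement Serializable.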