Clojure: Scala/Java interop issues with Spark GraphX


I am trying to use Spark/GraphX from Clojure with Flambo.

Here is the code I ended up with:

In the project.clj file:

(defproject spark-tests "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [yieldbot/flambo "0.5.0"]]
  :main ^:skip-aot spark-tests.core
  :target-path "target/%s"
  :checksum :warn
  :profiles {:dev {:aot [flambo.function]}
             :uberjar {:aot :all}
             :provided {:dependencies
                         [[org.apache.spark/spark-core_2.10 "1.2.0"]
                         [org.apache.spark/spark-graphx_2.10 "1.2.0"]]}})
Then my Clojure core.clj file:

(ns spark-tests.core  
  (:require [flambo.conf :as conf]
            [flambo.api :as f]
            [flambo.tuple :as ft])
  (:import (org.apache.spark.graphx Edge)
           (org.apache.spark.graphx.impl GraphImpl)))

(defonce c (-> (conf/spark-conf)
               (conf/master "local")
               (conf/app-name "flame_princess")))

(defonce sc (f/spark-context c))

(def users (f/parallelize sc [(ft/tuple 3 ["rxin" "student"])
                              (ft/tuple 7 ["jgonzal" "postdoc"])
                              (ft/tuple 5 ["franklin" "prof"])]))

(defn edge
  [source dest attr]
  (new Edge (long source) (long dest) attr))

(def relationships (f/parallelize sc [(edge 3 7 "collab")
                                      (edge 5 3 "advisor")]))

(def g (new GraphImpl users relationships))
When I run that code, I get the following error:

1. Caused by java.lang.ClassCastException
   Cannot cast org.apache.spark.api.java.JavaRDD to
   scala.reflect.ClassTag

  Class.java: 3258  java.lang.Class/cast
  Reflector.java:  427  clojure.lang.Reflector/boxArg
  Reflector.java:  460  clojure.lang.Reflector/boxArgs
Disclaimer: I don't know Scala.

Then I thought that it might be because Flambo returns a JavaRDD when we use f/parallelize, so I tried to convert the JavaRDD into the plain RDD used in the GraphX examples:

(def g (new GraphImpl (.rdd users) (.rdd relationships)))
But then I got the same error, this time for the ParallelCollectionRDD class.
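
To see what is actually being passed around, you can inspect the classes at the REPL (a quick sanity check, assuming the same sc and users as above; the concrete RDD class may vary with the Spark version):

;; f/parallelize hands back Spark's Java-friendly wrapper,
;; and .rdd unwraps it to the underlying Scala RDD.
(class users)        ;=> org.apache.spark.api.java.JavaRDD
(class (.rdd users)) ;=> org.apache.spark.rdd.ParallelCollectionRDD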

From that point on, I was stuck. What I am not clear about is how to effectively use this class signature from Clojure:

org.apache.spark.graphx.Graph<VD,ED>

(Graph is an abstract class, but I am trying to use GraphImpl in this example.)

What I want to do is to be able to use that class from Clojure.

Any hints would be greatly appreciated.

I think I finally got it right. Here is the code that seems to be working:

(ns spark-tests.core
  (:require [flambo.conf :as conf]
            [flambo.api :as f]
            [flambo.tuple :as ft])
  (:import (org.apache.spark.graphx Edge
                                    Graph)
           (org.apache.spark.api.java JavaRDD
                                      StorageLevels)
           (scala.reflect ClassTag$)))

(defonce c (-> (conf/spark-conf)
               (conf/master "local")
               (conf/app-name "flame_princess")))

(defonce sc (f/spark-context c))

(def users (f/parallelize sc [(ft/tuple 3 ["rxin" "student"])
                              (ft/tuple 7 ["jgonzal" "postdoc"])
                              (ft/tuple 5 ["franklin" "prof"])]))

(defn edge
  [source dest attr]
  (new Edge (long source) (long dest) attr))

(def relationships (f/parallelize sc [(edge 3 7 "collab")
                                      (edge 5 3 "advisor")
                                      (edge 7 3 "advisor")]))


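;; Graph/apply is the factory exposed by Graph's companion object.
;; Arguments: vertex RDD, edge RDD, default vertex attribute,
;; edge and vertex storage levels, and two ClassTags telling Scala
;; the vertex (VD) and edge (ED) attribute types.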
(def g (Graph/apply (.rdd users)
                    (.rdd relationships)
                    "collab"
                    (StorageLevels/MEMORY_ONLY)
                    (StorageLevels/MEMORY_ONLY)
                    (.apply ClassTag$/MODULE$ clojure.lang.PersistentVector)
                    (.apply ClassTag$/MODULE$ java.lang.String)))

(println (.count (.edges g)))
This code returns 3, which seems accurate. The main problem was that I was not using Graph/apply to create the instance. In fact, that appears to be the way to create all of these objects (it looks like a constructor...). I have no idea why it works that way, but it is probably because of my lack of Scala knowledge. If anybody knows why, please let me know :)
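
(For what it's worth: in Scala, a class commonly has a companion object whose apply method acts as a factory, so Scala code writes Graph(...) rather than new Graph(...). The companion object compiles to a Graph$ class with a static MODULE$ field, and Scala also emits static forwarders on Graph itself, which is why Clojure can call it as Graph/apply. A minimal sketch of the equivalent direct call, assuming the imports above; exact arities may differ across Spark versions:)

;; Calling apply on the companion-object singleton directly is
;; equivalent to the Graph/apply static-forwarder call above.
(import '(org.apache.spark.graphx Graph$))

(.apply Graph$/MODULE$
        (.rdd users) (.rdd relationships) "collab"
        (StorageLevels/MEMORY_ONLY) (StorageLevels/MEMORY_ONLY)
        (.apply ClassTag$/MODULE$ clojure.lang.PersistentVector)
        (.apply ClassTag$/MODULE$ java.lang.String))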

After that, I simply had to fill in the blanks of the apply function's signature.

The things to watch out for are the last two parameters:

  • scala.reflect.ClassTag evidence$17
  • scala.reflect.ClassTag evidence$18

These are used to tell Scala the vertex attribute type (VD) and the edge attribute type (ED). The type of ED is the type of the object used as the third parameter of the Edge class, and the type of VD is the type of the second parameter of the tuple function.
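
Since the ClassTag incantation is verbose, a small helper can tidy it up (a hypothetical convenience function, not part of Flambo or Spark):

;; Hypothetical helper: build a scala.reflect.ClassTag from a Class.
(defn class-tag [^Class c]
  (.apply scala.reflect.ClassTag$/MODULE$ c))

;; VD: the type of each vertex tuple's second element (a Clojure vector)
;; ED: the type of the Edge attribute (a String)
(class-tag clojure.lang.PersistentVector)
(class-tag java.lang.String)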

I don't know how relevant this is, but from some Clojure/Java interop experimentation I have done, this sounds like a small syntax error, or a misunderstanding about which data types should be provided. Does the stack dump include line numbers pointing into your code?

@Mars Thanks for your comment. I updated the question with the full stacktrace, but there really isn't much more to it. What I am not sure about is what the role of ClassTag is (it seems quite generic on the Scala side), and why Graph would accept something other than a JavaRDD (and if so, what other kinds of RDD are there?)