Clojure: Scala/Java interop issues with Spark GraphX
I'm trying to use Spark/GraphX from Clojure, via Flambo. Here is the code I ended up with.
In the project.clj file:
(defproject spark-tests "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [yieldbot/flambo "0.5.0"]]
  :main ^:skip-aot spark-tests.core
  :target-path "target/%s"
  :checksum :warn
  :profiles {:dev {:aot [flambo.function]}
             :uberjar {:aot :all}
             :provided {:dependencies
                        [[org.apache.spark/spark-core_2.10 "1.3.0"]
                         [org.apache.spark/spark-core_2.10 "1.2.0"]
                         [org.apache.spark/spark-graphx_2.10 "1.2.0"]]}})
Then my Clojure core.clj file:
(ns spark-tests.core
  (:require [flambo.conf :as conf]
            [flambo.api :as f]
            [flambo.tuple :as ft])
  (:import (org.apache.spark.graphx Edge)
           (org.apache.spark.graphx.impl GraphImpl)))

(defonce c (-> (conf/spark-conf)
               (conf/master "local")
               (conf/app-name "flame_princess")))

(defonce sc (f/spark-context c))

(def users (f/parallelize sc [(ft/tuple 3 ["rxin" "student"])
                              (ft/tuple 7 ["jgonzal" "postdoc"])
                              (ft/tuple 5 ["franklin" "prof"])]))

(defn edge
  [source dest attr]
  (new Edge (long source) (long dest) attr))

(def relationships (f/parallelize sc [(edge 3 7 "collab")
                                      (edge 5 3 "advisor")]))

(def g (new GraphImpl users relationships))
When I run this code, I get the following error:
1. Caused by java.lang.ClassCastException
Cannot cast org.apache.spark.api.java.JavaRDD to
scala.reflect.ClassTag
Class.java: 3258 java.lang.Class/cast
Reflector.java: 427 clojure.lang.Reflector/boxArg
Reflector.java: 460 clojure.lang.Reflector/boxArgs
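Disclaimer: I don't know Scala.
I then thought the cause might be that Flambo returns a JavaRDD when we use f/parallelize. So I tried converting the JavaRDD into a plain RDD, as used in the GraphX examples: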
(def g (new GraphImpl (.rdd users) (.rdd relationships)))
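But I get the same error, this time for the ParallelCollectionRDD class.
From there, I have no idea what could be causing this. What I'm not clear about is how to use this class signature effectively from Clojure:
org.apache.spark.graphx.Graph<VD,ED>
(Graph is an abstract class, but I tried to use GraphImpl in this example.)
What I want to do is use GraphX from Clojure.
Any hints would be much appreciated.

I think I finally got it right. Here is the code that seems to be working: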
(ns spark-tests.core
  (:require [flambo.conf :as conf]
            [flambo.api :as f]
            [flambo.tuple :as ft])
  (:import (org.apache.spark.graphx Edge
                                    Graph)
           (org.apache.spark.api.java JavaRDD
                                      StorageLevels)
           (scala.reflect ClassTag$)))

(defonce c (-> (conf/spark-conf)
               (conf/master "local")
               (conf/app-name "flame_princess")))

(defonce sc (f/spark-context c))

(def users (f/parallelize sc [(ft/tuple 3 ["rxin" "student"])
                              (ft/tuple 7 ["jgonzal" "postdoc"])
                              (ft/tuple 5 ["franklin" "prof"])]))

(defn edge
  [source dest attr]
  (new Edge (long source) (long dest) attr))

(def relationships (f/parallelize sc [(edge 3 7 "collab")
                                      (edge 5 3 "advisor")
                                      (edge 7 3 "advisor")]))

(def g (Graph/apply (.rdd users)
                    (.rdd relationships)
                    "collab"
                    (StorageLevels/MEMORY_ONLY)
                    (StorageLevels/MEMORY_ONLY)
                    (.apply ClassTag$/MODULE$ clojure.lang.PersistentVector)
                    (.apply ClassTag$/MODULE$ java.lang.String)))

(println (.count (.edges g)))
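This code returns 3, which seems accurate. The main problem was that I was not using Graph/apply to create the graph. In fact, this seems to be the way all of these objects are created (it looks like it plays the role of a constructor…). I don't know why that is, but that's probably down to my lack of Scala knowledge. If anyone knows, please tell me why :)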
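On the "why": in Scala, writing Graph(vertices, edges, ...) is shorthand for calling the apply method of Graph's companion object, so the factory method, not a constructor, is the intended entry point. As for the original failure, Clojure's new picks a constructor reflectively and then tries to cast each argument (the Reflector/boxArg frame in the trace); the "Cannot cast JavaRDD to ClassTag" message suggests it matched a constructor expecting ClassTag arguments rather than RDDs. A minimal sketch for checking this kind of thing yourself is to list the constructor signatures a class exposes. The helper name constructor-signatures is made up for this illustration:

```clojure
(defn constructor-signatures
  "Returns one vector of parameter-type names per public constructor of cls.
  Hypothetical helper, not part of Flambo or Spark."
  [^Class cls]
  (mapv (fn [^java.lang.reflect.Constructor ctor]
          (mapv #(.getName ^Class %) (.getParameterTypes ctor)))
        (.getConstructors cls)))

;; Runs against a JDK class, so no Spark is needed to try it:
(constructor-signatures java.util.ArrayList)

;; With Spark on the classpath, the same call on GraphImpl shows what
;; (new GraphImpl users relationships) was being matched against:
;; (constructor-signatures org.apache.spark.graphx.impl.GraphImpl)
```

If none of the listed signatures takes two RDDs, that would explain why the direct constructor call cannot work, whichever RDD type is passed.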
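After that, I just had to fill in the blanks of the apply function signature.
What needs to be noted are the last two parameters:
scala.reflect.ClassTag evidence$17
scala.reflect.ClassTag evidence$18
These tell Scala the vertex attribute type (VD) and the edge attribute type (ED). The type of ED is the type of the object passed as the third argument to the Edge class. The type of VD is the type of the second argument to the tuple function.

I don't know much that's relevant here, but based on some Clojure/Java interop experiments I've done, this sounds like a small syntax error or a misunderstanding about which data types should be supplied. Does the stack dump include line numbers from your code?

@Mars Thanks for your comment. I updated the question with the full stacktrace, but there really isn't much more to it. I'm not sure what ClassTag is for (it seems very generic to Scala), or why Graph accepts something other than a JavaRDD (and if so, which other kinds of RDD?).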
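On the ClassTag question: a ClassTag carries the runtime class of a type parameter that the JVM would otherwise erase. Scala code normally receives it implicitly, which is why it only shows up as explicit trailing evidence$N arguments when the method is called from Java or Clojure. A small sketch of a helper that hides the ClassTag$/MODULE$ noise, assuming scala-library is on the classpath (it comes with Spark); the name class-tag is made up for this illustration:

```clojure
(import '(scala.reflect ClassTag ClassTag$))

(defn class-tag
  "Builds a scala.reflect.ClassTag for the given Java class.
  Hypothetical helper, not part of Flambo or Spark."
  ^ClassTag [^Class cls]
  (.apply ClassTag$/MODULE$ cls))

;; The tag remembers the class it was built from:
(.runtimeClass (class-tag String))
```

With this, the last two arguments of the Graph/apply call in the answer could be written as (class-tag clojure.lang.PersistentVector) and (class-tag String).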