Scala: How to stream data to Neo4j using Spark
I'm trying to write streaming data to Neo4j using Spark and am running into some problems (I'm new to Spark).

I have set up a stream of word counts and can write it to Postgres using a custom ForeachWriter, as shown in the example, so I think I understand the basic flow.

I then tried to replicate this and send the data to Neo4j using the Neo4j Spark connector. I am able to send data to Neo4j using the example in the Zeppelin notebook, so I tried to move that code into a ForeachWriter. The problem is that the sparkContext is not available inside the ForeachWriter, and from what I have read it should not be passed in, because it lives on the driver while the foreach code runs on the executors. Can anyone help with what I should do in this situation?

Sink.scala:
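import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Connection settings read by the Neo4j Spark connector
val spark = SparkSession
  .builder()
  .appName("Neo4jSparkConnector")
  .config("spark.neo4j.bolt.url", "bolt://hdp1:7687")
  .config("spark.neo4j.bolt.password", "pw")
  .getOrCreate()

import spark.implicits._

// Read lines of text from a local socket
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Split each line into words and keep a running count per word
val words = lines.as[String].flatMap(_.split(" "))
val wordCounts = words.groupBy("value").count()
wordCounts.printSchema()

val writer = new Neo4jSink()

// A streaming aggregation without a watermark cannot use "append" mode,
// so "update" is used here; Trigger.ProcessingTime requires Spark 2.2+
val query = wordCounts
  .writeStream
  .foreach(writer)
  .outputMode("update")
  .trigger(Trigger.ProcessingTime("25 seconds"))
  .start()

query.awaitTermination()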
Neo4jSink.scala:
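import org.apache.spark.sql.{ForeachWriter, Row}
import org.neo4j.spark.Neo4jDataFrame

class Neo4jSink() extends ForeachWriter[Row] {
  // Called on the executor for each partition; true means "process this partition"
  def open(partitionId: Long, version: Long): Boolean = true

  def process(value: Row): Unit = {
    val word = ("Word", Seq("value"))
    val word_count = ("WORD_COUNT", Seq.empty)
    val count = ("Count", Seq("count"))
    // This is the problem: sparkContext is not in scope inside the ForeachWriter
    Neo4jDataFrame.mergeEdgeList(sparkContext, value, word, word_count, count)
  }

  def close(errorOrNull: Throwable): Unit = {}
}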
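
One possible way around the missing sparkContext, offered as a sketch rather than a confirmed solution, is to skip Neo4jDataFrame inside the writer and talk to Neo4j directly from the executor with the Neo4j Java driver. The connection is opened in open(), one Cypher MERGE is run per row in process(), and everything is closed in close(). The names Neo4jDriverSink and WordCountCypher are hypothetical, the org.neo4j.driver package assumes a 4.x driver on the classpath (1.x uses org.neo4j.driver.v1), and the Cypher statement is only my approximation of what mergeEdgeList would produce for these labels.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

import org.apache.spark.sql.{ForeachWriter, Row}
import org.neo4j.driver.{AuthTokens, Driver, GraphDatabase, Session}

// Hypothetical helper: the Cypher statement and its parameters for one row.
object WordCountCypher {
  // Assumed graph model: (:Word {value})-[:WORD_COUNT]->(:Count {count})
  val statement: String =
    "MERGE (w:Word {value: $word}) " +
    "MERGE (c:Count {count: $count}) " +
    "MERGE (w)-[:WORD_COUNT]->(c)"

  def params(word: String, count: Long): JMap[String, Object] = {
    val m = new JHashMap[String, Object]()
    m.put("word", word)
    m.put("count", java.lang.Long.valueOf(count))
    m
  }
}

// Hypothetical sink: only plain strings are serialized and shipped to the
// executors; the driver and session are created there, inside open().
class Neo4jDriverSink(url: String, user: String, password: String)
    extends ForeachWriter[Row] {

  @transient private var driver: Driver = _
  @transient private var session: Session = _

  override def open(partitionId: Long, version: Long): Boolean = {
    driver = GraphDatabase.driver(url, AuthTokens.basic(user, password))
    session = driver.session()
    true // process every partition
  }

  override def process(value: Row): Unit = {
    // Columns by position: 0 = "value" (string), 1 = "count" (long)
    session
      .run(WordCountCypher.statement,
           WordCountCypher.params(value.getString(0), value.getLong(1)))
      .consume() // wait for the write to finish
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (session != null) session.close()
    if (driver != null) driver.close()
  }
}
```

In Sink.scala the writer would then be created as, for example, new Neo4jDriverSink("bolt://hdp1:7687", "neo4j", "pw"), where the user name "neo4j" is an assumption. Opening a driver for every partition and trigger is simple but not cheap; if that becomes a bottleneck, a lazily initialised driver shared per executor JVM may be worth trying.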