Scala Docker Compose Spark Elasticsearch NetworkClient connection error
I can't find a solution for the error message I receive when running spark-submit in a docker container. The general idea is to produce data via Kafka, structured as follows:
{'source': 'JFdyGil9YYHU', 'target': 'M4iCWTNB7P9E', 'amount': 5425.76, 'currency': 'EUR'}
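For context, a producer emitting records of this shape could look like the following minimal sketch. This is illustrative, not the original script: it assumes the kafka-clients library is on the classpath, and the broker address `kafka:9092` and topic name `test` are taken from the spark-submit call further down; the `Sender` object name and the record literal are made up.

```scala
package com.example.spark

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Illustrative sketch of a producer for the record shape above.
object Sender {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Compose service name, not localhost (see docker-compose.yml below).
    props.put("bootstrap.servers", "kafka:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val record = """{"source": "JFdyGil9YYHU", "target": "M4iCWTNB7P9E", "amount": 5425.76, "currency": "EUR"}"""
    producer.send(new ProducerRecord[String, String]("test", record))
    producer.close()
  }
}
```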
That data is then received in Spark via a Scala script, namely:
package com.example.spark

import kafka.serializer.StringDecoder
import org.apache.spark.{TaskContext, SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.kafka.{OffsetRange, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.parsing.json.JSON
import org.elasticsearch.spark._

object Receiver {
  def main(args: Array[String]): Unit = {
    /** When starting the receiver, brokers and topics must be passed. */
    if (args.length < 2) {
      System.err.println(s"""
        |Usage: DirectReceiver <brokers> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <topics> is a list of one or more Kafka topics to consume from
        |
        """.stripMargin)
      System.exit(1)
    }
    val Array(brokers, topics) = args

    /** Create the context:
      * The --master option specifies the master URL for a distributed cluster,
      * or local to run locally with one thread,
      * or local[N] to run locally with N threads,
      * or local[*] to run locally with as many worker threads as logical cores on your machine.
      * You should start by using local for testing.
      */
    val sparkConf = new SparkConf().setAppName("Receiver").setMaster("local[*]")

    /** Whether elasticsearch-hadoop should create an index (if it is missing)
      * when writing data to Elasticsearch, or fail.
      * (default: yes, but specified anyway for the sake of completeness)
      */
    sparkConf.set("es.index.auto.create", "true")

    /** Use a context batch interval of 2 seconds. */
    //val ssc = new StreamingContext(sparkConf, Seconds(2)) // testing alternatives
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, Seconds(2))

    // Create a direct Kafka stream with the given brokers and topics.
    val topicsSet = topics.split(",").toSet // in case there are several
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)

    /** Get the lines.
      * Messages have the format:
      *   (null, {"key": "value", "key": "value", ...})
      * .map(_._2) takes the second tuple element.
      */
    val lines = messages.map(_._2)

    /** pairs elements print as: [Ljava.lang.String;@5922fbe4
      * which is what the default "toString" in Scala actually returns:
      *   def toString(): String = this.getClass.getName + "@" + this.hashCode.toHexString
      * [ means it's an array
      * L means it can contain references to objects
      * java.lang.String means all those objects should be instances of java.lang.String
      * ; is just because Java loves its semicolons
      *
      * Get rid of all the unnecessary characters and split the string by comma for further usage.
      */
    val pairs = lines.map(_.stripPrefix("{").stripSuffix("}").replaceAll("\"|\\s", "").split(","))

    /** Build key-value maps from the pairs, which look like:
      *   key:value, key:value, key:value, ...
      * and write one document per record to Elasticsearch.
      */
    pairs.foreachRDD { rdd =>
      rdd.map(arr => arr.map(kv => kv.split(":")(0) -> kv.split(":")(1)).toMap)
         .saveToEs("spark/json-test")
    }

    // Start the computation.
    ssc.start()
    ssc.awaitTermination()
  }
}
So I run "sudo docker-compose up -d", and I can test "localhost:9200" and "localhost:5601" in the browser; both work fine. But when I enter the container via "sudo docker exec -it spark bash" and try to submit receiver.jar via:
spark-submit --master yarn-client --driver-java-options "-Dlog4j.configuration=file:///app/receiver/log4j.properties" /app/receiver/building_jar/target/scala-2.10/receiver.jar kafka:9092 test
then I get this error message:
18/12/28 09:05:18 ERROR NetworkClient: Node [127.0.0.1:9200] failed (Connection refused); no other nodes left - aborting
followed by further messages as the process exits.
So I understand that the connection somehow fails, but I don't understand why :/
Can anyone help?
Answer: I'm not familiar with Spark, but in your configuration you are trying to connect from one container to another via localhost, and that doesn't work (it works outside of docker because there localhost is your machine, but when each service runs in its own container, localhost refers to each container's own localhost, not to the host). So when running in docker, change your configuration to reference Elasticsearch by the compose service name (in your case elasticsearch) instead of localhost, and it should all work. You need to add elasticsearch as a link in the compose file under the calling service so that it can be referenced by its service name (just like the link you made for kafka under spark).

Comment: Hello @Markoorn, thanks for the reply! I understand the problem now. I tried some variants of the docker-compose file (including the links you suggested), but I can't figure out how to "reference elasticsearch using elasticsearch instead of localhost". Feeling a bit dumb; can you give me a hint on how to do that?

Comment: I finally solved this error with your answer. I had to add the links as you said, and had to remove ".setMaster(local[*])" from my .scala file. Now I get a different error, but this problem is solved :) Thank you very much!

Comment: It seems that removing the "local[*]" master setting caused spark to no longer process anything (it took me a while to realize this, since I was dealing with syntax errors), even though I pass --master yarn-client in the spark-submit command. So the question remains the same and the answer is still correct; the one thing still missing (for me) is that I can't get it to work properly :( I'd be grateful if anyone can help!

Comment: Hi @Nin4ikP, sorry, I don't know how Spark is set up since I have never used it. Is there a way to create configuration files for different environments? Basically what you want to achieve is a setup that uses localhost as the base address when running locally, and a different configuration that uses the compose service name when running in docker.

Comment: Hey :) Thanks for the help! I'm a bit new to all of this. As far as I can tell, nothing I need runs locally (I ran a local script for Hello World, but of course I need the tools (Kafka, Spark, Elasticsearch, Kibana) to communicate, so I'm not running any code locally). But I think the problem is that I'm trying to run the code with the yarn client, since I read about people having problems with those settings. So I think I'll ask another question about that in a few minutes; maybe someone has had the same problem and can answer :) Anyway, thanks!
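Concretely, the elasticsearch-hadoop connector defaults to localhost:9200, which inside the spark container is the container itself. A sketch of pointing it at the compose service name via the connector's es.nodes setting (es.port shown for completeness; it already defaults to 9200):

```scala
val sparkConf = new SparkConf()
  .setAppName("Receiver")
  .set("es.index.auto.create", "true")
  // Inside docker, "localhost" resolves to the spark container itself,
  // so reference Elasticsearch by its compose service name instead:
  .set("es.nodes", "elasticsearch")
  .set("es.port", "9200")
```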
version: '3.7'
services:
  # kafka (zookeeper integrated)
  kafka:
    container_name: kafka
    build: ./kafka
    environment:
      - KAFKA=localhost:9092
      - ZOOKEEPER=localhost:2181
    expose:
      - 2181
      - 9092
    networks:
      - kaspelki-net

  # spark (contains all daemons)
  spark:
    container_name: spark
    build: ./spark
    command: bash
    links:
      - "kafka"
    ports:
      - 8080:8080
      - 7077:7077
      - 6066:6066
      - 4040:4040
    environment:
      - SPARK_MASTER_HOST=spark://localhost:7077
    env_file:
      - ./hadoop/hadoop.env
    tty: true
    expose:
      - 7077
      - 8080
      - 6066
      - 4040
    volumes:
      - ./scripts/spark:/app
    networks:
      - kaspelki-net

  # ELK
  elasticsearch:
    container_name: elasticsearch
    build: ./ELK/elasticsearch
    ports:
      - 9200:9200
    expose:
      - 9200
    networks:
      - kaspelki-net

  kibana:
    container_name: kibana
    build: ./ELK/kibana
    ports:
      - 5601:5601
    expose:
      - 5601
    networks:
      - kaspelki-net
    depends_on:
      - elasticsearch

### --- volumes --- ###
volumes:
  data:

networks:
  kaspelki-net:
    name: kaspelki-net
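Following the answer, the spark service additionally needs a link to elasticsearch so it can be referenced by service name; a sketch of the relevant fragment (the rest of the service definition stays as above). Note that on a user-defined compose network such as kaspelki-net, service names usually resolve via DNS even without explicit links, but the link matches what the answer suggests:

```yaml
  spark:
    # ... settings unchanged ...
    links:
      - "kafka"
      - "elasticsearch"
```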