Scala elastic4s akka streams sink with HttpClient
I am trying to stream data from a file into Elasticsearch using akka streams and elastic4s. I have a Movie object that can be inserted into Elasticsearch, and I can index objects of this type using the HttpClient:
import com.sksamuel.elastic4s.ElasticsearchClientUri
import com.sksamuel.elastic4s.http.HttpClient
import com.sksamuel.elastic4s.http.ElasticDsl._
import scala.concurrent.Await
import scala.concurrent.duration._

val httpClient = HttpClient(ElasticsearchClientUri("localhost", 9200))
val response = Await.result(httpClient.execute {
  indexInto("movies" / "movie").source(movie)
}, 10.seconds)
println(s"result: $response")
httpClient.close()
Now I am trying to index Movie objects with an akka stream. I have a function that creates the sink:
def toElasticSearch(client: HttpClient)(implicit actorSystem: ActorSystem): Sink[Movie, NotUsed] = {
  // client.subscriber comes from com.sksamuel.elastic4s.streams.ReactiveElastic._
  var count = 0
  implicit val movieImporter = new RequestBuilder[Movie] {
    import com.sksamuel.elastic4s.http.ElasticDsl._

    def request(movie: Movie): BulkCompatibleDefinition = {
      count = count + 1
      println(s"inserting ${movie.id} -> ${movie.title} - $count")
      index("movies", "movie").source[Movie](movie)
    }
  }

  val subscriber = client.subscriber[Movie](
    batchSize = 10,
    concurrentRequests = 2,
    completionFn = () => println(s"completion: all done"),
    errorFn = (t: Throwable) => println(s"error: $t")
  )
  Sink.fromSubscriber(subscriber)
}
And a test:
describe("a DataSinkService elasticsearch sink") {
  it("should write data to elasticsearch using an http client") {
    var count = 0
    val httpClient = HttpClient(ElasticsearchClientUri("localhost", 9200))
    val graph = GraphDSL.create(sinkService.toElasticSearch(httpClient)) { implicit builder: GraphDSL.Builder[NotUsed] => s =>
      val flow: Flow[JsValue, Movie, NotUsed] = Flow[JsValue].map[Movie] { j =>
        val m = Movie.fromMovieDbJson(j)
        count = count + 1
        println(s"parsed id:${m.id} - $count")
        m
      }
      sourceService.fromFile(3, 50) ~> flow ~> s
      ClosedShape
    }
    RunnableGraph.fromGraph(graph).run
    Thread.sleep(20.seconds.toMillis)
    println(s"\n*******************\ndone waiting...\n")
    httpClient.close()
    println(s"closed")
  }
}
I send 47 elements with sourceService.fromFile(3, 50). The output shows:

done waiting...
closed
completion: all done
The completion callback (completionFn) fires, but only 20 of the 47 elements are parsed and indexed. If I change the batchSize and concurrentRequests parameters to 12 and 3 respectively, I see 36 elements parsed and indexed. So the sink seems to stop accepting elements after batchSize * concurrentRequests.
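The stall point in both runs matches the product of the two subscriber parameters, which can be sanity-checked in plain Scala:

```scala
// The sink stalled after exactly batchSize * concurrentRequests elements
// in both configurations that were tried.
val firstRun = 10 * 2  // batchSize = 10, concurrentRequests = 2 -> 20 elements
val secondRun = 12 * 3 // batchSize = 12, concurrentRequests = 3 -> 36 elements
println(s"stall points: $firstRun and $secondRun")
```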
My question is: is the problem in my code, or in the way I am using the httpClient?
My first tip is to stop using Thread.sleep() and instead wait on the future directly, with scala.concurrent.Await or whatever the test framework provides for handling futures.

I changed the code so that it no longer uses Thread.sleep().
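The suggested change can be sketched with a plain Future standing in for the stream's materialized value (the Future here is a stand-in, not the actual stream code):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Block on the asynchronous result with an explicit timeout instead of
// sleeping for a fixed wall-clock interval.
val work: Future[Int] = Future { 21 * 2 } // stand-in for the stream's materialized Future
val result = Await.result(work, 10.seconds)
println(s"result: $result")
```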
,但接收器仍然退出(抛出java.net.ConnectException:连接被拒绝
)在batchSize*concurrentRequests之后,与elasticsearch的连接失败。端口不应该是9300吗?实际上与elasticsearch的连接很好。因为我使用的是http客户端,所以端口应该是9200。我能够编写batchSize*concurrentRequests
,但在多次写入elasticsearch后,流停止,即使源代码在周末可以释放更多元素,我放弃了使用elastic4s提供的流实现,编写了一个基于actor的接收器,它可以缓冲一些批量大小的元素,然后使用elastic4s提供的http客户端进行批量调用。我仍然需要加强错误处理,但它似乎会填充缓冲区,对它们进行索引,然后从流中接收更多元素。它能够重复这个过程,直到源没有更多的元素发射。示例代码可在
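A minimal sketch of the buffering idea behind that actor-based sink, in plain Scala with a println standing in for the elastic4s bulk call (BatchBuffer is a hypothetical name, not code from the project):

```scala
import scala.collection.mutable.ListBuffer

// Accumulate elements and invoke a bulk-flush callback whenever the
// buffer reaches batchSize; flush the remainder when the source completes.
final class BatchBuffer[T](batchSize: Int, flush: Seq[T] => Unit) {
  private val buf = ListBuffer.empty[T]
  def offer(elem: T): Unit = {
    buf += elem
    if (buf.size >= batchSize) drain()
  }
  def complete(): Unit = if (buf.nonEmpty) drain()
  private def drain(): Unit = { flush(buf.toList); buf.clear() }
}

var batches = 0
val buffer = new BatchBuffer[Int](10, batch => {
  batches += 1
  println(s"bulk indexing ${batch.size} elements (batch $batches)")
})
(1 to 47).foreach(buffer.offer)
buffer.complete()
println(s"total batches: $batches")
```

With 47 elements and a batch size of 10, this flushes four full batches and a final partial one, mirroring how the actor-based sink keeps draining until the source is exhausted.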