Scala: Fetching esJsonRDD from Elasticsearch with complex filtering in Spark
I currently fetch the Elasticsearch RDD in our Spark job, filtered by a one-line Elasticsearch query.
Now suppose our search query becomes more complex, like this:
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "default_operator": "AND",
          "query": "director.name:DAVID + \n movie.name:SEVEN"
        }
      },
      "filter": {
        "nested": {
          "path": "movieStatus.boxoffice.status",
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "movieStatus.boxoffice.status.rating": "A"
                  }
                },
                {
                  "match": {
                    "movieStatus.boxoffice.status.oscar": "false"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}
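Can I still convert this query into an inline Elasticsearch query so that it works with esJsonRDD, or can the query above be used with esJsonRDD as it is?
If not, what is a better way to fetch such an RDD in Spark? I ask because esJsonRDD seems to accept only inline (one-line) Elasticsearch queries.

Use triple quotes: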
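// Brings the esJsonRDD extension method into scope on SparkContext
import org.elasticsearch.spark._

// A triple-quoted string may span several lines; JSON ignores the extra whitespace
val query = """{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "default_operator": "AND",
          "query": "director.name:DAVID + \n movie.name:SEVEN"
        }
      },
      "filter": {
        "nested": {
          "path": "movieStatus.boxoffice.status",
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "movieStatus.boxoffice.status.rating": "A"
                  }
                },
                {
                  "match": {
                    "movieStatus.boxoffice.status.oscar": "false"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}"""
// sparkContext and esIndex are the values already defined in the job
val elasticRdds = sparkContext.esJsonRDD(esIndex, query)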
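If a genuinely single-line string is still wanted (for logging, say, or for a configuration value), the multi-line literal can be collapsed with the standard library alone. The following is only a sketch: the object name `QueryString` and the shortened query are hypothetical stand-ins, not part of the original job. It relies on the fact that JSON does not allow raw line breaks inside string values, so trimming and joining lines does not change the meaning of the query.

```scala
object QueryString {
  // Hypothetical, shortened query used only for illustration.
  // stripMargin removes the leading whitespace up to and including each '|'.
  val multiLine: String =
    """{
      |  "query": {
      |    "match": { "movieStatus.boxoffice.status.rating": "A" }
      |  }
      |}""".stripMargin

  // Trim every line and join with a single space to get a one-line query.
  val oneLine: String = multiLine.split("\n").map(_.trim).mkString(" ")
}
```

Either `QueryString.multiLine` or `QueryString.oneLine` could then be passed as the query argument in place of `query` above; both describe the same JSON document.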