
Apache Spark: reading an Elasticsearch index with PySpark


I am trying to read an Elasticsearch index with PySpark (v1.6.3), but I get the error shown below.

I am using the following snippet to read/load the index:

    # Parentheses let the chained .option() calls span several lines.
    es_reader = (
        sql_context.read.format("org.elasticsearch.spark.sql")
        .option("es.nodes", "x.x.x.x,y.y.y.y,z.z.z.z")
        .option("es.port", "9200")
        .option("es.net.ssl", "true")
        .option("es.net.http.auth.user", "*****")
        .option("es.net.http.auth.pass", "*****")
    )
    sde_df = es_reader.load("index_name/doc_type")
Error:

    22_0452/container_1547624497922_0452_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "/mnt/yarn/usercache/root/appcache/application_1547624497922_0452/container_1547624497922_0452_02_000001/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
      File "/mnt/yarn/usercache/root/appcache/application_1547624497922_0452/container_1547624497922_0452_02_000001/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o72.load.
    : java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/StreamSinkProvider

    Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.StreamSinkProvider
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 49 more
Environment:

Spark v1.6.3


Elasticsearch 5.4.3

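Your program complains that the class org.apache.spark.sql.sources.StreamSinkProvider is missing, so this is a compatibility problem: either the ES connector you are using is incompatible with your Spark version, or the ES jar is not present. StreamSinkProvider belongs to the Structured Streaming API introduced in Spark 2.0, so it does not exist in Spark 1.6.3, and a connector jar built against Spark 2.x will fail in exactly this way.

Thanks, mate.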
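
To make the version match concrete, here is a minimal sketch of picking connector coordinates from the Spark version. The helper name `es_spark_package` is hypothetical, and the artifact names are an assumption based on the usual elasticsearch-hadoop naming (`elasticsearch-spark-13` for Spark 1.3 to 1.6, `elasticsearch-spark-20` for Spark 2.x); check Maven Central for the exact artifact and Scala build before relying on it.

```python
def es_spark_package(spark_version, es_version):
    """Return Maven coordinates for an elasticsearch-spark connector.

    Assumption: artifacts follow the elasticsearch-hadoop naming scheme,
    with Scala 2.10 builds for Spark 1.x and Scala 2.11 builds for Spark 2.x.
    Adjust the Scala suffix if your Spark build uses a different one.
    """
    parts = spark_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major == 1 and minor >= 3:
        artifact = "elasticsearch-spark-13_2.10"
    elif major == 2:
        artifact = "elasticsearch-spark-20_2.11"
    else:
        # Other Spark versions are outside the scope of this sketch.
        raise ValueError("unsupported Spark version: " + spark_version)
    return "org.elasticsearch:{}:{}".format(artifact, es_version)


# Versions taken from the question: Spark 1.6.3 and Elasticsearch 5.4.3.
print(es_spark_package("1.6.3", "5.4.3"))
```

The resulting coordinates can then be passed to spark-submit, for example `spark-submit --packages org.elasticsearch:elasticsearch-spark-13_2.10:5.4.3 your_job.py` (the script name is a placeholder), so that the Spark 1.x build of the connector is on the classpath instead of a Spark 2.x build.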