Apache Spark: reading an Elasticsearch index with PySpark
I am trying to read an Elasticsearch index using PySpark (v1.6.3), but I get the error below. I am using the following snippet to read/load the index:
# sql_context is an existing SQLContext; the parentheses let the chained calls span lines
es_reader = (sql_context.read.format("org.elasticsearch.spark.sql")
    .option("es.nodes", "x.x.x.x,y.y.y.y,z.z.z.z")
    .option("es.port", "9200")
    .option("es.net.ssl", "true")
    .option("es.net.http.auth.user", "*****")
    .option("es.net.http.auth.pass", "*****"))
sde_df = es_reader.load("index_name/doc_type")
Error:
22_0452/container_1547624497922_0452_02_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1547624497922_0452/container_1547624497922_0452_02_000001/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
File "/mnt/yarn/usercache/root/appcache/application_1547624497922_0452/container_1547624497922_0452_02_000001/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o72.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/StreamSinkProvider
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.StreamSinkProvider
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 49 more
Environment:
Spark v1.6.3
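Elasticsearch 5.4.3

Your program complains that the class org.apache.spark.sql.sources.StreamSinkProvider is missing, so this is a compatibility problem: either the ES driver you are using is incompatible with your Spark version, or the ES jar is not on the classpath. StreamSinkProvider belongs to Structured Streaming, which was added in Spark 2.0, so a connector jar built against Spark 2.x cannot load on Spark 1.6.3.

Thanks, mate.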
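To make the version-matching advice concrete, here is a minimal sketch of how one might pick a connector artifact that fits the Spark version. The helper es_spark_artifact is hypothetical (it is not part of PySpark or ES-Hadoop), and the artifact naming scheme it encodes (elasticsearch-spark-13 for Spark 1.3 to 1.6, elasticsearch-spark-20 for Spark 2.x, with the default Scala versions shown) is an assumption that should be checked against the ES-Hadoop documentation for your release.

```python
# Hypothetical helper: maps a Spark version to the elasticsearch-spark
# Maven coordinate that is expected to match it. The naming scheme is an
# assumption; verify it against the ES-Hadoop docs before relying on it.

def es_spark_artifact(spark_version, es_hadoop_version, scala_version=None):
    """Return a Maven coordinate string for the ES Spark SQL connector."""
    major = int(spark_version.split(".")[0])
    if major == 1:
        # Assumed flavour for Spark 1.3 to 1.6
        flavour, default_scala = "13", "2.10"
    elif major == 2:
        # Assumed flavour for Spark 2.x
        flavour, default_scala = "20", "2.11"
    else:
        raise ValueError("unsupported Spark version: {0}".format(spark_version))
    scala = scala_version or default_scala
    return "org.elasticsearch:elasticsearch-spark-{0}_{1}:{2}".format(
        flavour, scala, es_hadoop_version)


if __name__ == "__main__":
    # Versions taken from the question: Spark 1.6.3, Elasticsearch 5.4.3
    print(es_spark_artifact("1.6.3", "5.4.3"))
```

The resulting coordinate could then be passed to spark-submit through its --packages option (or the corresponding jar through --jars) in place of the jar that triggered the NoClassDefFoundError, so that the connector on the classpath matches Spark 1.6.3.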