Apache Spark: PySpark Cassandra database connection problem


I want to use Cassandra with PySpark. I can connect to the remote Spark server without any problem, but I run into trouble at the stage of reading a Cassandra table. I tried all of the DataStax connectors and changed the Spark configuration (cores, memory, etc.), but I could not get it to work. (The commented-out lines in the code below are my attempts.)

Here is my Python code:

import os

# Point PySpark at the local JDK and Spark distribution; raw strings
# avoid accidental backslash escapes in the Windows paths.
os.environ['JAVA_HOME'] = r"C:\Program Files\Java\jdk1.8.0_271"
os.environ['HADOOP_HOME'] = r"E:\etc\spark-3.0.1-bin-hadoop2.7"
os.environ['PYSPARK_DRIVER_PYTHON'] = "/usr/local/bin/python3.7"
os.environ['PYSPARK_PYTHON'] = "/usr/local/bin/python3.7"

# os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 --conf spark.cassandra.connection.host=XX.XX.XX.XX spark.cassandra.auth.username=username spark.cassandra.auth.password=passwd pyspark-shell'
# os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars .ivy2\jars\spark-cassandra-connector-driver_2.12-3.0.0-alpha2.jar pyspark-shell'
# os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0-alpha2 pyspark-shell'

from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import Row
from pyspark.sql import SQLContext
conf = SparkConf()
conf.setMaster("spark://YY.YY.YY:7077").setAppName("My app")
conf.set("spark.shuffle.service.enabled", "false")
conf.set("spark.dynamicAllocation.enabled","false")
conf.set("spark.executor.cores", "2")
conf.set("spark.executor.memory", "5g")
conf.set("spark.executor.instances", "1")
conf.set("spark.jars", "C:\\Users\\verianalizi\\.ivy2\\jars\\spark-cassandra-connector_2.12-3.0.0-beta.jar")

conf.set("spark.cassandra.connection.host","XX.XX.XX.XX")
conf.set("spark.cassandra.auth.username","username")
conf.set("spark.cassandra.auth.password","passwd")
conf.set("spark.cassandra.connection.port", "9042")
# conf.set("spark.sql.catalog.myCatalog", "com.datastax.spark.connector.datasource.CassandraCatalog")

sc = SparkContext(conf=conf)
# sc.setLogLevel("ERROR")
sqlContext = SQLContext(sc)
list_p = [('John',19),('Smith',29),('Adam',35),('Henry',50)]
rdd = sc.parallelize(list_p)
ppl = rdd.map(lambda x: Row(name=x[0], age=int(x[1])))
DF_ppl = sqlContext.createDataFrame(ppl)

# Everything works fine up to this point

def load_and_get_table_df(keys_space_name, table_name):
    table_df = sqlContext.read\
        .format("org.apache.spark.sql.cassandra")\
        .option("keyspace",keys_space_name)\
        .option("table",table_name)\
        .load()
    return table_df

movies = load_and_get_table_df("weather", "currentweatherconditions")
The error I get is:

Does anyone have any idea?

This happens because you specified only the spark.jars property, pointing to a single jar, while the Spark Cassandra Connector depends on a number of other jars that are not included in that list. I recommend either using spark.jars.packages with the coordinates com.datastax.spark:spark-cassandra-connector_2.12:3.0.0, or specifying in spark.jars the path to an assembly jar that contains all of the necessary dependencies.
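
As a minimal sketch of the first suggestion (the master URL, host, and credentials are the placeholders from the question), the single-jar spark.jars line would be replaced with spark.jars.packages so Spark resolves the connector and its transitive dependencies from Maven:

from pyspark.conf import SparkConf
from pyspark.context import SparkContext

conf = SparkConf()
conf.setMaster("spark://YY.YY.YY:7077").setAppName("My app")
# Resolve the connector and all of its transitive dependencies
# from Maven instead of pointing at a single local jar.
conf.set("spark.jars.packages",
         "com.datastax.spark:spark-cassandra-connector_2.12:3.0.0")
conf.set("spark.cassandra.connection.host", "XX.XX.XX.XX")
conf.set("spark.cassandra.auth.username", "username")
conf.set("spark.cassandra.auth.password", "passwd")

sc = SparkContext(conf=conf)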


By the way, 3.0 was released a few months ago, so why are you still using a beta build?

Thanks for answering my question. I tried the code below and similar variants:

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 pyspark-shell'

but the result is the same when running from Jupyter.

I found the solution. Using the information above, of course:

conf.set("spark.executor.jars", "C:\\Users\\verianalizi\\.ivy2\\jars\\spark-cassandra-connector-assembly_2.12-3.0.0.jar")
conf.set("spark.driver.extraClassPath", "C:\\Users\\verianalizi\\.ivy2\\jars\\spark-cassandra-connector-assembly_2.12-3.0.0.jar")
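
Consolidated into a runnable sketch, the accepted fix points both the executors and the driver classpath at the assembly jar, which bundles every dependency in one file. Note the comment above says spark.executor.jars; the documented Spark property for shipping jars to executors is spark.jars, which this sketch uses alongside spark.driver.extraClassPath:

from pyspark.conf import SparkConf
from pyspark.context import SparkContext

# The assembly jar bundles the connector together with all of its
# dependencies, so one file is enough for the driver and executors.
assembly_jar = r"C:\Users\verianalizi\.ivy2\jars\spark-cassandra-connector-assembly_2.12-3.0.0.jar"

conf = SparkConf()
conf.setMaster("spark://YY.YY.YY:7077").setAppName("My app")
conf.set("spark.jars", assembly_jar)                    # ship the jar to the executors
conf.set("spark.driver.extraClassPath", assembly_jar)   # put it on the driver classpath
conf.set("spark.cassandra.connection.host", "XX.XX.XX.XX")
conf.set("spark.cassandra.auth.username", "username")
conf.set("spark.cassandra.auth.password", "passwd")

sc = SparkContext(conf=conf)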