Apache Spark: unable to connect to Cassandra from Spark

I have some test data in my Cassandra instance. I am trying to fetch this data from Spark, but I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.

java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042
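
This is what I have done so far:

  • Started /bin/cassandra
  • Created test data using cql, with keyspace="testkeyspace2" and table="emp", plus some keys and corresponding values
  • Wrote standalone.py
  • Ran the following pyspark shell command: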

    sudo ./bin/spark-submit --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar \
    --packages TargetHolding:pyspark-cassandra:0.2.4 \
    examples/src/main/python/standalone.py
    
  • Got the error mentioned above


  • standalone.py:

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext
    
    conf = SparkConf().setAppName("Stand Alone Python Script")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    loading=sqlContext.read.format("org.apache.spark.sql.cassandra")\
                            .options(table="emp", keyspace = "testkeyspace2")\
                            .load()\
                            .show()
    
    
I also tried using --packages datastax:spark-cassandra-connector:1.5.0-RC1-s_2.11, but I got the same error.


Debugging:

I checked with

    netstat -tulpn | grep -i listen | grep <cassandra_pid>

and saw that it is listening on port 9042.
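
The netstat check shows that something is listening on port 9042, but the error is about a specific address (127.0.1.1). A minimal sketch, using only Python's standard library, that probes each candidate address directly; the helper names `can_connect` and `probe` and the default address list are illustrative and not part of the original post:

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable addresses.
        return False


def probe(port=9042, hosts=("127.0.0.1", "127.0.1.1")):
    """Map each candidate address to whether the port accepts connections there."""
    return {host: can_connect(host, port) for host in hosts}
```

Running `print(probe())` on the Cassandra machine shows which of the two loopback addresses actually accepts connections on the native transport port.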


Full log trace:

    Traceback (most recent call last):
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/examples/src/main/python/standalone.py", line 8, in <module>
        .options(table="emp", keyspace = "testkeyspace2")\
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 139, in load
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.
    : java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042
        at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:164)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
        at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
        at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
        at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
        at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTokenFactory(CassandraRDDPartitioner.scala:176)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:203)
        at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.1.1:9042 (com.datastax.driver.core.TransportException: [/127.0.1.1:9042] Cannot connect))
        at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
        at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82)
        at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307)
        at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:339)
        at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:157)
        ... 22 more
    
Am I doing something wrong?

I am very new to all of this, so I could use some advice. Thanks.

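Per our conversation in the question comments, the problem is that "localhost" was used for rpc_address in the cassandra.yaml file. Cassandra used the operating system to resolve "localhost" to 127.0.0.1 and listened explicitly on that interface.

To fix this, either update rpc_address in cassandra.yaml to 127.0.1.1 and restart Cassandra, or update your SparkConf to reference 127.0.0.1, i.e.:

    conf = SparkConf().setAppName("Stand Alone Python Script") \
        .set("spark.cassandra.connection.host", "127.0.0.1")

One thing still seems odd to me: spark.cassandra.connection.host also defaults to "localhost", yet the Spark Cassandra connector resolved "localhost" to 127.0.1.1 while Cassandra resolved it to 127.0.0.1.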
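
For reference, here is a sketch of how the SparkConf fix might be applied to the script from the question. It assumes the Spark 1.6-era `SQLContext` API used there and that the connector is already on the classpath via --packages. The helper `connector_settings` is a hypothetical name introduced for illustration, and setting `spark.cassandra.connection.port` explicitly is optional since 9042 is the default:

```python
def connector_settings(host="127.0.0.1", port=9042):
    """Spark properties that tell the connector where Cassandra listens."""
    return [
        ("spark.cassandra.connection.host", host),
        ("spark.cassandra.connection.port", str(port)),
    ]


def main():
    # pyspark is imported here so the helper above works without Spark installed.
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName("Stand Alone Python Script")
    conf.setAll(connector_settings())  # point the connector at 127.0.0.1
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    # Same keyspace and table as in the question.
    df = (sqlContext.read.format("org.apache.spark.sql.cassandra")
          .options(table="emp", keyspace="testkeyspace2")
          .load())
    df.show()
    sc.stop()
```

Call `main()` at the bottom of a script that is launched with spark-submit; nothing here runs until that call is made.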

I checked my Linux hosts file at /etc/hosts, and its contents were as follows:

    127.0.0.1       localhost
    127.0.1.1       <my hostname>

I changed it to:

    127.0.0.1       localhost
    127.0.0.1       <my hostname>

and it works fine.
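
To see at a glance which address each name maps to, the hosts file can be parsed with a few lines of Python. This is only a sketch: `parse_hosts` is a hypothetical helper, and `myhost` in the sample stands in for the `<my hostname>` placeholder above.

```python
def parse_hosts(text):
    """Parse /etc/hosts-style text into a {name: address} dict (first entry wins)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)
    return mapping


# Sample mirroring the original file; "myhost" is a placeholder hostname.
SAMPLE = """\
127.0.0.1       localhost
127.0.1.1       myhost
"""
```

`parse_hosts(SAMPLE)` returns `{"localhost": "127.0.0.1", "myhost": "127.0.1.1"}`, which makes the mismatch between the two names visible. On a real machine, pass the contents of /etc/hosts instead of `SAMPLE`.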

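As you can see at line number 58 of your own log file, it mentions "Your hostname, ganguly resolves to a loopback address: 127.0.1.1; using 192.168.1.32 instead (on interface wlan0)", and I think this applies to your case as well.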
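
The condition behind that log message can be checked outside Spark. The sketch below uses only the standard library, and the function names are illustrative. It reports whether the machine's hostname resolves to a loopback address, which covers the whole 127.0.0.0/8 range and therefore includes 127.0.1.1:

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the given IP address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def hostname_resolution():
    """Return (hostname, resolved address, whether that address is loopback)."""
    name = socket.gethostname()
    address = socket.gethostbyname(name)  # may raise if the name cannot be resolved
    return name, address, is_loopback(address)
```

If `hostname_resolution()` reports a loopback address other than 127.0.0.1, a service bound only to 127.0.0.1 will not be reachable through the hostname.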

Add this next to your --packages dependency, it