Apache Spark job reading from a Cassandra table pauses at startup (Spark 1.3.1)

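We have been hitting an intermittent issue with Spark 1.3.1 and the DataStax Cassandra connector that causes jobs to pause indefinitely at startup.

Edit: I also tried the same thing with Spark 1.2.1 and the packaged 1.2.1 spark-cassandra-connector_2.10, and saw the same symptoms.

We are using the following dependency: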

var sparkCas = "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.3.0-SNAPSHOT"
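For context, here is a minimal sketch of how that dependency might sit in a build.sbt. Only the connector coordinates come from the line above; the Scala version, the spark-core entry and its "provided" scope are assumptions for illustration, not the actual build. A SNAPSHOT version also needs a resolver that serves it (or a locally published artifact), which is not shown.

```scala
// build.sbt (sketch; everything except the connector coordinates is an assumption)
scalaVersion := "2.10.4"

// Connector dependency as given in the question.
val sparkCas = "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.3.0-SNAPSHOT"

libraryDependencies ++= Seq(
  // Assumed: Spark itself is supplied by spark-submit at runtime, hence "provided".
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  sparkCas
)
```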
Our job code:

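import com.datastax.spark.connector._ // adds cassandraTable to SparkContext
import org.apache.spark.{SparkConf, SparkContext}
import org.joda.time.DateTime // assumed: DateTime is Joda-Time

object ConnTransform {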

  private val AppName = "ConnTransformCassandra"

  def main(args: Array[String]) {

    val start = new DateTime(2015, 5, 27, 1, 0, 0)

    val master = if (args.length >= 1) args(0) else "local[*]"

    // Create the spark context.
    val sc = {
      val conf = new SparkConf()
        .setAppName(AppName)
        .setMaster(master)
        .set("spark.cassandra.connection.host", "10.10.101.202,10.10.102.139,10.10.103.74")

      new SparkContext(conf)
    }

    sc.cassandraTable("alpha_dev", "conn")
      .select("data")
      .where("timep = ?", start)
      .where("sensorid IN ?", Utils.sensors)
      .map(Utils.deserializeRow)
      .saveAsTextFile("output/raw_data")
  }
}
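As you can see, the code is very simple (it used to be more complex, but we have been trying to narrow down the root cause of this problem).

Now, this job worked earlier today: the data was successfully written to the specified directory. However, when it runs now, we see the job start, get to the point just before it begins processing blocks, and then sit there indefinitely.

The job output below shows the log messages seen so far; at the time of writing, the job has been paused for almost an hour. If we set the logging level to DEBUG, the only thing you see in the job after that point is heartbeat pings between the akka workers.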

ubuntu@ip-10-10-102-53:~/projects/icespark$ /home/ubuntu/spark/spark-1.3.1/bin/spark-submit --class com.splee.spark.ConnTransform splee-analytics-assembly-0.1.0.jar
15/05/27 21:15:21 INFO SparkContext: Running Spark version 1.3.1
15/05/27 21:15:21 INFO SecurityManager: Changing view acls to: ubuntu
15/05/27 21:15:21 INFO SecurityManager: Changing modify acls to: ubuntu
15/05/27 21:15:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
15/05/27 21:15:22 INFO Slf4jLogger: Slf4jLogger started
15/05/27 21:15:22 INFO Remoting: Starting remoting
15/05/27 21:15:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ip-10-10-102-53.us-west-2.compute.internal:51977]
15/05/27 21:15:22 INFO Utils: Successfully started service 'sparkDriver' on port 51977.
15/05/27 21:15:22 INFO SparkEnv: Registering MapOutputTracker
15/05/27 21:15:22 INFO SparkEnv: Registering BlockManagerMaster
15/05/27 21:15:22 INFO DiskBlockManager: Created local directory at /tmp/spark-2466ff66-bb50-4d52-9d34-1801d69889b9/blockmgr-60e75214-1ba6-410c-a564-361263636e5c
15/05/27 21:15:22 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/05/27 21:15:22 INFO HttpFileServer: HTTP File server directory is /tmp/spark-72f1e849-c298-49ee-936c-e94c462f3df2/httpd-f81c2326-e5f1-4f33-9557-074f2789c4ee
15/05/27 21:15:22 INFO HttpServer: Starting HTTP Server
15/05/27 21:15:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/27 21:15:22 INFO AbstractConnector: Started SocketConnector@0.0.0.0:55357
15/05/27 21:15:22 INFO Utils: Successfully started service 'HTTP file server' on port 55357.
15/05/27 21:15:22 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/27 21:15:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/27 21:15:22 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/27 21:15:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/27 21:15:22 INFO SparkUI: Started SparkUI at http://ip-10-10-102-53.us-west-2.compute.internal:4040
15/05/27 21:15:22 INFO SparkContext: Added JAR file:/home/ubuntu/projects/icespark/splee-analytics-assembly-0.1.0.jar at http://10.10.102.53:55357/jars/splee-analytics-assembly-0.1.0.jar with timestamp 1432761322942
15/05/27 21:15:23 INFO Executor: Starting executor ID <driver> on host localhost
15/05/27 21:15:23 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@ip-10-10-102-53.us-west-2.compute.internal:51977/user/HeartbeatReceiver
15/05/27 21:15:23 INFO NettyBlockTransferService: Server created on 58479
15/05/27 21:15:23 INFO BlockManagerMaster: Trying to register BlockManager
15/05/27 21:15:23 INFO BlockManagerMasterActor: Registering block manager localhost:58479 with 265.1 MB RAM, BlockManagerId(<driver>, localhost, 58479)
15/05/27 21:15:23 INFO BlockManagerMaster: Registered BlockManager
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.28:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.28 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.60:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.60 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.154:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.154 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.145:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.145 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.78:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.78 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.200:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.200 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.73:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.73 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.205:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.205 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.205:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.205 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.74:9042 added
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.202:9042 added
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.139:9042 added
15/05/27 21:15:24 INFO CassandraConnector: Connected to Cassandra cluster: Splee Dev
15/05/27 21:15:25 INFO CassandraConnector: Disconnected from Cassandra cluster: Splee Dev
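Since the log goes quiet after the connector disconnects, one way to see what the driver is actually waiting on is a thread dump. Running `jstack <pid>` against the driver process from another shell gives one; the sketch below does roughly the same from inside the JVM using only standard JDK calls. `ThreadDump`, `render` and `startWatchdog` are hypothetical names made up for this illustration and are not part of Spark or the connector.

```scala
import scala.collection.JavaConverters._

// Hypothetical helper (not part of Spark or the Cassandra connector).
object ThreadDump {

  /** Renders the current stack trace of every live JVM thread as text. */
  def render(): String =
    Thread.getAllStackTraces.asScala.toSeq
      .sortBy { case (thread, _) => thread.getName }
      .map { case (thread, frames) =>
        val header = "\"" + thread.getName + "\" state=" + thread.getState
        (header +: frames.map(frame => "    at " + frame)).mkString("\n")
      }
      .mkString("\n\n")

  /** Starts a daemon thread that prints a dump to stderr every `intervalMs` milliseconds. */
  def startWatchdog(intervalMs: Long): Thread = {
    val watchdog = new Thread(new Runnable {
      def run(): Unit = while (true) {
        Thread.sleep(intervalMs)
        System.err.println(render())
      }
    }, "thread-dump-watchdog")
    watchdog.setDaemon(true) // must not keep the JVM alive once the job finishes
    watchdog.start()
    watchdog
  }
}
```

Calling something like `ThreadDump.startWatchdog(60000L)` at the top of `main` would print a dump once a minute. If the job stalls, the frames under the `main` thread should show which call it is blocked in.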
If anyone