Apache Spark: unable to compute an RDD when reading data from Aerospike using Stratio and Spark
I have written a program whose purpose is to read data from Aerospike and convert it into an RDD in Spark:
public void sparkTest() throws UnsupportedDataTypeException {
    log.debug("TESTING SPARK WITH AEROSPIKE");
    String host = "localhost";
    int port = 3000;
    String namespace = "mynamespace";
    String inputSet = "myset";
    AerospikeDeepJobConfig inputConfigCell = AerospikeConfigFactory.createAerospike().host(host).port(port)
            .namespace(namespace)
            .set(inputSet);
    log.debug("Print inputConfigCell ......");
    log.debug(inputConfigCell.getNamespace());
    log.debug(inputConfigCell.getSet());
    log.debug(inputConfigCell.getAerospikePort());
    log.debug(inputConfigCell.getHost());
    JavaRDD inputRDDCell = sparkContext.createJavaRDD(inputConfigCell);
    log.debug("Print RDD .............");
    log.debug(inputRDDCell);
}
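I know there are many records in my Aerospike set, but I cannot use inputRDDCell as an actual RDD. The logged namespace, set, port, and host are all correct. I tried calling inputRDDCell.first(), but it throws an exception, whereas simply printing the RDD object produces perfectly fine output.
Please guide me on how to properly create a usable, working RDD from this data. I used this link as a guide:
I have tried both RDD and JavaRDD, but got the same output.
The log output is: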
[2016-03-10 15:58:05.812] boot - 13535 DEBUG [main] --- PushAnalysisService: TESTING SPARK WITH AEROSPIKE
[2016-03-10 15:58:05.825] boot - 13535 DEBUG [main] --- PushAnalysisService: Print inputConfigCell ......
[2016-03-10 15:58:05.827] boot - 13535 DEBUG [main] --- PushAnalysisService: mynamespace
[2016-03-10 15:58:05.829] boot - 13535 DEBUG [main] --- PushAnalysisService: myset
[2016-03-10 15:58:05.831] boot - 13535 DEBUG [main] --- PushAnalysisService: 3000
[2016-03-10 15:58:05.832] boot - 13535 DEBUG [main] --- PushAnalysisService: localhost
[2016-03-10 15:58:06.025] boot - 13535 INFO [main] --- MemoryStore: ensureFreeSpace(552) called with curMem=0, maxMem=539724349
[2016-03-10 15:58:06.035] boot - 13535 INFO [main] --- MemoryStore: Block broadcast_0 stored as values in memory (estimated size 552.0 B, free 514.7 MB)
[2016-03-10 15:58:06.161] boot - 13535 INFO [main] --- MemoryStore: ensureFreeSpace(901) called with curMem=552, maxMem=539724349
[2016-03-10 15:58:06.165] boot - 13535 INFO [main] --- MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 901.0 B, free 514.7 MB)
[2016-03-10 15:58:06.196] boot - 13535 INFO [sparkDriver-akka.actor.default-dispatcher-5] --- BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:49368 (size: 901.0 B, free: 514.7 MB)
[2016-03-10 15:58:06.205] boot - 13535 INFO [main] --- SparkContext: Created broadcast 0 from broadcast at DeepRDD.java:65
[2016-03-10 15:58:06.294] boot - 13535 DEBUG [main] --- PushAnalysisService: Print RDD .............
[2016-03-10 15:58:06.302] boot - 13535 DEBUG [main] --- PushAnalysisService: DeepRDD[0] at RDD at DeepRDD.java:62
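There are significant differences between the community project and the fork supported by Aerospike. The community one is dormant and offers only basic RDD support. The Aerospike-supported one supports RDDs, DataFrames, and SparkSQL. You should try it with your existing code.
You say it throws an exception; seeing some of the traceback would be very helpful.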
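On why printing the RDD succeeds while first() fails: Spark RDDs are evaluated lazily, so creating and printing one only describes the computation, and nothing is read from the source until an action such as first() runs. The sketch below is a plain-Java analogy using java.util.stream. It does not use Spark or the Stratio API, and the class name, readRecord, and the simulated failure are all hypothetical. It also shows one way to capture the full stack trace as text so it can be logged or posted.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.stream.Stream;

public class LazyFailureDemo {

    // Hypothetical stand-in for a data source that cannot be read.
    static String readRecord() {
        throw new IllegalStateException("simulated connection failure");
    }

    // Builds a lazy pipeline; no record is read until a terminal operation runs.
    static Stream<String> buildPipeline() {
        return Stream.generate(LazyFailureDemo::readRecord).map(String::toUpperCase);
    }

    // Renders the full stack trace of a throwable as text.
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    // Runs the terminal operation; returns the captured trace, or "" on success.
    static String runAndCapture() {
        Stream<String> pipeline = buildPipeline();
        // Printing the pipeline object at this point would succeed,
        // because it has not tried to read anything yet.
        try {
            pipeline.findFirst(); // comparable to an action: forces the first read
            return "";
        } catch (RuntimeException e) {
            return stackTraceOf(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture());
    }
}
```

In the question's code, the equivalent step would be to wrap inputRDDCell.first() in a try/catch and log the complete trace, since the root cause is usually in the nested "Caused by" entries.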