Accumulo scan/write not running from a standalone Java main program on an AWS EC2 master with Cloudera CDH 5.8.2
We are trying to run a simple write/scan against Accumulo, using the Accumulo client jar 1.5.0, from a standalone Java main program (a Maven shade executable), as described below, on the AWS EC2 master (connected via PuTTY):
import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.TableExistsException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.security.Authorizations;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AccumuloQueryApp {

  private static final Logger logger = LoggerFactory.getLogger(AccumuloQueryApp.class);

  public static final String INSTANCE = "accumulo"; // miniInstance
  public static final String ZOOKEEPERS = "ip-x-x-x-100:2181"; // localhost:28076

  private static Connector conn;

  static {
    // Connect to the Accumulo instance through ZooKeeper
    Instance instance = new ZooKeeperInstance(INSTANCE, ZOOKEEPERS);
    try {
      conn = instance.getConnector("root", new PasswordToken("xxx"));
    } catch (Exception e) {
      logger.error("Connection", e);
    }
  }

  public static void main(String[] args) throws TableNotFoundException, AccumuloException,
      AccumuloSecurityException, TableExistsException {
    System.out.println("connection with : " + conn.whoami());

    // Write ten sample rows into the "test" table
    BatchWriter writer = conn.createBatchWriter("test", ofBatchWriter());
    for (int i = 0; i < 10; i++) {
      Mutation m1 = new Mutation(String.valueOf(i));
      m1.put("personal_info", "first_name", String.valueOf(i));
      m1.put("personal_info", "last_name", String.valueOf(i));
      m1.put("personal_info", "phone", "983065281" + i % 2);
      m1.put("personal_info", "email", String.valueOf(i));
      m1.put("personal_info", "date_of_birth", String.valueOf(i));
      m1.put("department_info", "id", String.valueOf(i));
      m1.put("department_info", "short_name", String.valueOf(i));
      m1.put("department_info", "full_name", String.valueOf(i));
      m1.put("organization_info", "id", String.valueOf(i));
      m1.put("organization_info", "short_name", String.valueOf(i));
      m1.put("organization_info", "full_name", String.valueOf(i));
      writer.addMutation(m1);
    }
    writer.close();
    System.out.println("Writing complete ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");

    // Scan back a slice of the rows just written
    Scanner scanner = conn.createScanner("test", new Authorizations());
    System.out.println("Step 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");
    scanner.setRange(new Range("3", "7"));
    System.out.println("Step 2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");
    scanner.forEach(e -> System.out.println("Key: " + e.getKey() + ", Value: " + e.getValue()));
    System.out.println("Step 3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");
    scanner.close();
  }

  public static BatchWriterConfig ofBatchWriter() {
    // Batch writer properties
    final int MAX_LATENCY = 1;
    final int MAX_MEMORY = 10000000;
    final int MAX_WRITE_THREADS = 10;
    final int TIMEOUT = 10;

    BatchWriterConfig config = new BatchWriterConfig();
    config.setMaxLatency(MAX_LATENCY, TimeUnit.MINUTES);
    config.setMaxMemory(MAX_MEMORY);
    config.setMaxWriteThreads(MAX_WRITE_THREADS);
    config.setTimeout(TIMEOUT, TimeUnit.MINUTES);
    return config;
  }
}
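Since the client is packaged as a Maven shade executable, one thing worth ruling out is the shade configuration itself: merging jars can drop or clobber META-INF/services entries that Hadoop and Thrift rely on at runtime. A minimal sketch of the relevant plugin section (the plugin version and transformer choices here are illustrative assumptions, not taken from the original build):

```xml
<!-- pom.xml (sketch): keep ServiceLoader metadata intact when shading -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merge META-INF/services files instead of letting one jar
               overwrite another's (needed e.g. for Hadoop FileSystem lookups) -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- entry point of the executable jar; class name from the question -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>AccumuloQueryApp</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```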
When we run the same code (writing to and reading from Accumulo) inside a Spark job and submit it to the YARN cluster, it runs perfectly. We are trying to figure this out but haven't gotten any clue. Please see the environment described below:
Cloudera CDH 5.8.2 on an AWS environment, with 4 EC2 instances as one master and 3 child nodes.
Consider the private IPs to be like:
Master: x.x.x.100
Child1: x.x.x.101
Child2: x.x.x.102
Child3: x.x.x.103
We have the following installed in CDH:
Cluster (CDH 5.8.2)
Accumulo 1.6 (Tracer not installed, Garbage Collector in Child2, Master in Master, Monitor in Child3, Tablet Server in Master)
HBase
HDFS (master node as NameNode, all 3 children as DataNodes)
Kafka
Spark
YARN (including MR2)
ZooKeeper
Hrm, that's very strange that it runs as Spark-on-YARN but not as a regular Java application. Usually it's the other way around :) I would verify that the JARs on the classpath of the standalone Java application match the ones used by the Spark-on-YARN job, as well as the Accumulo server classpath.
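The classpath comparison suggested above can be automated from inside the application itself. A small diagnostic sketch (the class names in the array are illustrative; substitute whichever dependencies you need to verify) that prints which jar each critical class is actually loaded from:

```java
import java.security.CodeSource;

public class ClasspathCheck {

    // Report the jar (or directory) a class is loaded from, or a marker
    // when it is missing or comes from the JDK itself.
    static String locate(String className) {
        try {
            CodeSource src = Class.forName(className)
                    .getProtectionDomain().getCodeSource();
            return (src == null) ? "JDK/bootstrap" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Illustrative probes: one class per dependency whose version matters here.
        String[] probes = {
            "org.apache.accumulo.core.client.Connector",
            "org.apache.thrift.transport.TTransport",
            "org.apache.zookeeper.ZooKeeper",
            "org.apache.hadoop.conf.Configuration"
        };
        for (String name : probes) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this once under spark-submit and once from the shaded jar makes any dependency mismatch visible: any probe line that differs between the two runs points at the offending jar.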
If that doesn't help, try increasing the log4j level to DEBUG or TRACE and see if anything jumps out at you. If you have a hard time understanding what the logs are saying, feel free to email user@accumulo.apache.org and you will definitely get more eyes on the problem.

Thanks for the tip! I was able to run it once I placed the fat jar in the specific folder mentioned in the Accumulo documentation:
[impl.ThriftScanner] DEBUG: Error getting transport to ip-x-x-x-100:10011 : NotServingTabletException(extent:TKeyExtent(table:21 30, endRow:21 30 3C, prevEndRow:null))
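For reference, the DEBUG line above is the kind of output produced by turning the client log level up as suggested. A log4j 1.x properties sketch that does this (the appender name and pattern are illustrative, not taken from the thread):

```properties
# log4j.properties on the client classpath (sketch)
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=[%c{2}] %p: %m%n
# Turn the Accumulo client internals up to TRACE
log4j.logger.org.apache.accumulo=TRACE
```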