How to insert and delete values in HBase from an Apache Storm bolt
I have a Storm topology running on Hadoop configured in pseudo-distributed mode. The topology contains a bolt which has to write data to HBase. For testing purposes, my first approach was to create (and close) the connection and write the data inside the bolt's execute method. However, it seems my local machine doesn't have enough resources to handle all of the requests to HBase. After roughly 30 successfully handled requests, I see the following in the Storm workers' logs:
o.a.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
o.a.z.ClientCnxn [INFO] Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
o.a.h.h.z.RecoverableZooKeeper [WARN] Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
My idea was to reduce the number of connections to HBase by creating a single connection per instance of my bolt in the prepare method and closing it in cleanup. However, according to the documentation, cleanup is not guaranteed to be called in distributed mode.

After that I found Storm's framework for working with HBase, storm-hbase. Unfortunately, there is almost no information about it, just the README in its GitHub repo.

Thanks in advance.

Comment: For example, could you use a "publisher" thread? That is: have a separate class that runs as a thread and performs the requests to hbase/mysql/elasticsearch/hdfs/etc... for you. For performance reasons it should do the writes in batches, roughly like this:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// execute() only enqueues; a separate publisher thread drains the queue and writes in batches
private transient BlockingQueue<Tuple> insertQueue;
private transient ExecutorService theExecutor;
private transient Future<?> publisherFuture;

@Override
public void execute(final Tuple _tuple) {
    insertQueue.add(_tuple);
}
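For completeness, here is a minimal sketch of what the publisher side of that suggestion could look like. Everything beyond the fields above (publishLoop, flushBatch, the batch size) is my own illustration, not part of the original comment:

import java.util.ArrayList;
import java.util.List;

// started from prepare(), e.g.:
//   insertQueue = new LinkedBlockingQueue<>();
//   theExecutor = Executors.newSingleThreadExecutor();
//   publisherFuture = theExecutor.submit(this::publishLoop);
private void publishLoop() {
    final int batchSize = 100; // illustrative batch size
    final List<Tuple> batch = new ArrayList<>(batchSize);
    try {
        while (!Thread.currentThread().isInterrupted()) {
            batch.add(insertQueue.take());             // block until at least one tuple arrives
            insertQueue.drainTo(batch, batchSize - 1); // then drain whatever else is queued
            flushBatch(batch);                         // write the whole batch to hbase/mysql/etc.
            batch.clear();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit cleanly when the executor is shut down
    }
}

// hypothetical sink method: this is where the actual batched writes would happen
private void flushBatch(List<Tuple> batch) {
    // e.g. build a List<Put> from the tuples and hand it to an open HBase table
}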
Oh boy, do I get to shine! I've had to do a ton of optimization of writes from Storm to HBase, so I hope this helps you.

If you are just getting started, storm-hbase is a great way to start streaming data into hbase. You can just clone the project, do a maven install, and then reference it in your topology. However, if you start needing more complicated logic, then creating your own classes to talk to HBase is probably the way to go. That's what I'm going to show in this answer.

Project setup

I'm assuming you're using maven and the maven-shade plugin. You'll need to reference hbase-client:
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
</dependency>
To get the HBase connection working, I also package my cluster configuration files (core-site.xml, hbase-site.xml, hdfs-site.xml) into the shaded jar. Here is the maven-shade plugin configuration I use:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4</version>
    <configuration>
        <createDependencyReducedPom>true</createDependencyReducedPom>
        <artifactSet>
            <excludes>
                <exclude>classworlds:classworlds</exclude>
                <exclude>junit:junit</exclude>
                <exclude>jmock:*</exclude>
                <exclude>*:xml-apis</exclude>
                <exclude>org.apache.maven:lib:tests</exclude>
                <exclude>log4j:log4j:jar:</exclude>
                <exclude>org.testng:testng</exclude>
            </excludes>
        </artifactSet>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
                        <resource>core-site.xml</resource>
                        <file>src/main/resources/core-site.xml</file>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
                        <resource>hbase-site.xml</resource>
                        <file>src/main/resources/hbase-site.xml</file>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
                        <resource>hdfs-site.xml</resource>
                        <file>src/main/resources/hdfs-site.xml</file>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass></mainClass>
                    </transformer>
                </transformers>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                            <exclude>junit/*</exclude>
                            <exclude>webapps/</exclude>
                            <exclude>testng*</exclude>
                            <exclude>*.js</exclude>
                            <exclude>*.png</exclude>
                            <exclude>*.css</exclude>
                            <exclude>*.json</exclude>
                            <exclude>*.csv</exclude>
                        </excludes>
                    </filter>
                </filters>
            </configuration>
        </execution>
    </executions>
</plugin>

Note: there are lines in there for other configurations I use, so remove them if you don't need them. By the way, I really don't like packaging configs like this but... it makes establishing the HBase connection much easier and fixes a bunch of weird connection errors.
Managing HBase connections in Storm

UPDATE March 19, 2018: the HBase API has changed significantly since I wrote this answer, but the concepts are the same.

The most important thing is to create one HConnection per instance of your bolt in the prepare method, and then re-use that connection for the whole lifetime of the bolt:
Configuration config = HBaseConfiguration.create();
connection = HConnectionManager.createConnection(config);
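Since the update note above mentions the API change: with the post-1.0 HBase client, the equivalent setup looks roughly like this sketch. It is my own illustration, not part of the original answer, and the "fruit" table name is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

// Connection replaces HConnection; ConnectionFactory replaces HConnectionManager
Configuration config = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(config);

// Table replaces HTableInterface; get one per operation and close it when done
Table table = connection.getTable(TableName.valueOf("fruit"));

// for the batched approach further down, connection.getBufferedMutator(TableName)
// replaces the old setAutoFlush(false) table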
To start with, you can do single puts to HBase. This way you open and close the table on every call:
// single put method
private HConnection connection;

@SuppressWarnings("rawtypes")
@Override
public void prepare(java.util.Map stormConf, backtype.storm.task.TopologyContext context) {
    Configuration config = HBaseConfiguration.create();
    connection = HConnectionManager.createConnection(config);
}

@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
    try {
        // do stuff
        // call putFruit
    } catch (Exception e) {
        LOG.error("bolt error", e);
        collector.reportError(e);
    }
}

// example put method you'd call from within execute somewhere
private void putFruit(String key, FruitResult data) throws IOException {
    // open the table for this one put; it is closed again in the finally block below
    HTableInterface table = connection.getTable(Constants.TABLE_FRUIT);
    try {
        Put p = new Put(key.getBytes());
        long ts = data.getTimestamp();
        p.add(Constants.FRUIT_FAMILY, Constants.COLOR, ts, data.getColor().getBytes());
        p.add(Constants.FRUIT_FAMILY, Constants.SIZE, ts, data.getSize().getBytes());
        p.add(Constants.FRUIT_FAMILY, Constants.WEIGHT, ts, Bytes.toBytes(data.getWeight()));
        table.put(p);
    } finally {
        table.close();
    }
}
Note that I'm re-using the connection here. I suggest starting this way because it's easier to get working and to debug. Eventually it won't scale due to the number of requests you're trying to send across the network, and you'll need to start batching multiple puts together.
In order to batch puts, you need to open a table with your HConnection and keep it open. You also need to set auto flush to false. This means the table will automatically buffer requests until it reaches the "hbase.client.write.buffer" size (the default is 2097152, i.e. 2 MB):

// batch put method
private static boolean AUTO_FLUSH = false;
private static boolean CLEAR_BUFFER_ON_FAIL = false;
private HConnection connection;
private HTableInterface fruitTable;

@SuppressWarnings("rawtypes")
@Override
public void prepare(java.util.Map stormConf, backtype.storm.task.TopologyContext context) {
    Configuration config = HBaseConfiguration.create();
    connection = HConnectionManager.createConnection(config);
    // open the table once and keep it open; with auto flush off, puts are buffered client-side
    fruitTable = connection.getTable(Constants.TABLE_FRUIT);
    fruitTable.setAutoFlush(AUTO_FLUSH, CLEAR_BUFFER_ON_FAIL);
}

@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
    try {
        // do stuff
        // call putFruit
    } catch (Exception e) {
        LOG.error("bolt error", e);
        collector.reportError(e);
    }
}

// example put method you'd call from within execute somewhere
private void putFruit(String key, FruitResult data) throws IOException {
    Put p = new Put(key.getBytes());
    long ts = data.getTimestamp();
    p.add(Constants.FRUIT_FAMILY, Constants.COLOR, ts, data.getColor().getBytes());
    p.add(Constants.FRUIT_FAMILY, Constants.SIZE, ts, data.getSize().getBytes());
    p.add(Constants.FRUIT_FAMILY, Constants.WEIGHT, ts, Bytes.toBytes(data.getWeight()));
    // no table.close() here -- the put just goes into the write buffer
    fruitTable.put(p);
}
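If the 2 MB default is too small for your throughput, the buffer size can be raised in the client configuration before the connection is created. The 8 MB below is purely an illustrative value, not from the original answer:

Configuration config = HBaseConfiguration.create();
// buffer up to 8 MB of puts client-side before an automatic flush (illustrative value)
config.setLong("hbase.client.write.buffer", 8L * 1024 * 1024);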
With either approach, it's still a good idea to try to close your HBase connection in cleanup. Just be aware that it may not get called before your worker is killed.
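For example, with the batched fields above, a best-effort cleanup could look like this sketch (my own, using the same old-style API as the rest of this answer):

@Override
public void cleanup() {
    try {
        if (fruitTable != null) {
            fruitTable.flushCommits(); // push any puts still sitting in the client-side buffer
            fruitTable.close();
        }
        if (connection != null) {
            connection.close();
        }
    } catch (IOException e) {
        LOG.error("error closing HBase resources", e);
    }
}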
Other stuff
- To do a delete, just use a new Delete(key) instead of a Put, as in the sketch below.
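For instance, a deleteFruit counterpart to putFruit above might look like this (my sketch, same old-style API):

// example delete method, analogous to putFruit
private void deleteFruit(String key) throws IOException {
    Delete d = new Delete(key.getBytes()); // row key built the same way as for the puts
    fruitTable.delete(d);                  // deletes the row
}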
Let me know if you have more questions.

Comment: Have you installed hbase in standalone mode on your machine? Can you run "hbase shell" on your machine?
Reply: @AnilGupta hbase is also configured in pseudo-distributed mode, and the hbase shell works.