Inserting into HBase from Spark on Hadoop


I have a Spark Streaming job in which I do some aggregation, and now I want to insert those records into HBase. However, it is not a typical insert; I want an UPSERT: if the row key already exists, the column value should become sum(newValue + oldValue). Can anyone share Java pseudo-code showing how I can achieve this?

Something like this:

byte[] rowKey = null; // Provided
Table table = null; // Provided
long newValue = 1000; // Provided
byte[] FAMILY = new byte[]{0}; // Defined
byte[] QUALIFIER = new byte[]{1}; // Defined

try {
    Get get = new Get(rowKey);
    Result result = table.get(get);
    if (!result.isEmpty()) {
        Cell cell = result.getColumnLatestCell(FAMILY, QUALIFIER);
        newValue += Bytes.toLong(cell.getValueArray(), cell.getValueOffset());
    }
    Put put = new Put(rowKey);
    put.addColumn(FAMILY,QUALIFIER,Bytes.toBytes(newValue));
    table.put(put);
} catch (Exception e) {
    // Handle Exceptions...
}
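
One thing worth noting about the Get-then-Put sketch above: when the stored value is a plain 8-byte long counter (as with Bytes.toBytes(newValue) here), HBase can also do the read-add-write atomically on the server with an Increment, which avoids the extra round trip. A minimal sketch, reusing the same rowKey, FAMILY, QUALIFIER and table as above:

// Server-side atomic add of newValue to the stored long counter
Increment inc = new Increment(rowKey);
inc.addColumn(FAMILY, QUALIFIER, newValue);
table.increment(inc);
// or equivalently:
// table.incrementColumnValue(rowKey, FAMILY, QUALIFIER, newValue);

This only works if every write to that cell goes through Increment, since HBase interprets the stored bytes as a long.
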
We (Splice Machine [open source]) have some pretty cool tutorials on using Spark Streaming to store data in HBase.


Check them out. They might be interesting.

Here is the pseudo-code I came up with:

===========For UPSERT(Update and Insert)===========

public void HbaseUpsert(JavaRDD<Row> javaRDD) throws IOException, ServiceException {

    // Map each Row to an (ImmutableBytesWritable, Put) pair. For every record we
    // read the existing value (if any) for the row key and write back the sum,
    // which gives UPSERT semantics on top of a plain Put.
    JavaPairRDD<ImmutableBytesWritable, Put> hbasePuts1 = javaRDD.mapToPair(

        new PairFunction<Row, ImmutableBytesWritable, Put>() {

            private static final long serialVersionUID = 1L;

            public Tuple2<ImmutableBytesWritable, Put> call(Row row) throws Exception {

                if (HbaseConfigurationReader.getInstance() != null) {
                    HTable table = new HTable(HbaseConfigurationReader.getInstance().initializeHbaseConfiguration(), "TEST");
                    try {
                        String Column1 = row.getString(1);
                        long Column2 = row.getLong(2);

                        // Look up the current value for this row key.
                        Get get = new Get(Bytes.toBytes(row.getString(0)));
                        Result result = table.get(get);
                        if (!result.isEmpty()) {
                            Cell cell = result.getColumnLatestCell(Bytes.toBytes("cf1"), Bytes.toBytes("Column2"));
                            Column2 += Bytes.toLong(cell.getValueArray(), cell.getValueOffset());
                        }

                        // Write back Column1 and the summed Column2.
                        Put put = new Put(Bytes.toBytes(row.getString(0)));
                        put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("Column1"), Bytes.toBytes(Column1));
                        put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("Column2"), Bytes.toBytes(Column2));
                        return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);

                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        table.close();
                    }
                }
                return null;
            }
        });

    hbasePuts1.saveAsNewAPIHadoopDataset(HbaseConfigurationReader.initializeHbaseConfiguration());
}
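
Since the method takes a JavaRDD<Row>, on the streaming side it would typically be invoked once per micro-batch. A minimal sketch of that wiring, assuming (my assumptions, not from the original answer) a JavaDStream<Row> named rowStream and that this code lives in the same class as HbaseUpsert:

// org.apache.spark.api.java.function.VoidFunction, called once per micro-batch RDD
rowStream.foreachRDD(new VoidFunction<JavaRDD<Row>>() {
    @Override
    public void call(JavaRDD<Row> rdd) throws Exception {
        if (!rdd.isEmpty()) {
            HbaseUpsert(rdd); // the upsert method shown above
        }
    }
});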

==============For Configuration===============
public class HbaseConfigurationReader implements Serializable {

    static Job newAPIJobConfiguration1 = null;
    private static Configuration conf = null;
    private static HTable table = null;
    private static HbaseConfigurationReader instance = null;

    private static Log logger = LogFactory.getLog(HbaseConfigurationReader.class);

    HbaseConfigurationReader() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
        initializeHbaseConfiguration();
    }

    public static HbaseConfigurationReader getInstance() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
        if (instance == null) {
            instance = new HbaseConfigurationReader();
        }
        return instance;
    }

    public static Configuration initializeHbaseConfiguration() throws MasterNotRunningException, ZooKeeperConnectionException, ServiceException, IOException {
        if (conf == null) {
            // Build the HBase client configuration and verify the cluster is reachable.
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "localhost");
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            HBaseAdmin.checkHBaseAvailable(conf);
            table = new HTable(conf, "TEST");
            conf.set(org.apache.hadoop.hbase.mapreduce.TableInputFormat.INPUT_TABLE, "TEST");
            try {
                // Job configuration used by saveAsNewAPIHadoopDataset to write into the "TEST" table.
                newAPIJobConfiguration1 = Job.getInstance(conf);
                newAPIJobConfiguration1.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "TEST");
                newAPIJobConfiguration1.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            logger.info("Configuration already initialized");
        }

        return newAPIJobConfiguration1.getConfiguration();
    }
}
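
A side note on the client API: HTable and HBaseAdmin.checkHBaseAvailable are deprecated in newer HBase client versions. If the cluster runs an HBase 1.0+ client, the same connection handling could be done with the Connection API; a rough sketch of the equivalent (my own substitution, not part of the original answer):

// Connection-based equivalent of the HTable usage above (HBase 1.0+).
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("hbase.zookeeper.property.clientPort", "2181");

Connection connection = ConnectionFactory.createConnection(conf);
Table table = connection.getTable(TableName.valueOf("TEST"));
// ... table.get(...) / table.put(...) as in the upsert above ...
table.close();
connection.close();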

You might also want to consider an HBase Increment, depending on the type of value you are adding... Hi John, thanks for the reply. It worked, but it took about 1 hr for 500 MB in Splice; do you know of anything for large bulk updates? I am exploring Splice Machine, attended the "LabbdAdbox" webinar, and sent an email to "Thomas Ryan"; I am waiting for his reply. I appreciate the work Splice Machine is doing. Could you tell me which tutorial I should look at? I checked many but could not find a specific one. Thanks again... Here is the video... Here is the code
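
Regarding the bulk-update question in the comments: one common way to cut down the per-row RPC cost is to collect the mutations for a whole partition and submit them in one call with Table.batch. A minimal sketch, assuming a hypothetical list named increments holding one Increment per aggregated row key:

// Group the per-row mutations and send them together; HBase batches RPCs per region server.
List<org.apache.hadoop.hbase.client.Row> actions = new ArrayList<org.apache.hadoop.hbase.client.Row>();
for (Increment inc : increments) { // 'increments' is assumed, built per partition
    actions.add(inc);
}
Object[] results = new Object[actions.size()];
table.batch(actions, results);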