HBase: How to add a column family to a target HBase table using data from a source HBase table


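Hi, I have a huge HBase table (source) with 5 column families. I want to add one column family from the source, together with its data, to a target HBase table that already has 5 column families of its own.

Example:

create 'source', '1', '2', '3', '4', '5'

(assume all column families contain data)

create 'target', '10', '20', '30', '40', '50'

(assume all column families contain data)

Output:

The target table should contain one column family from the source, along with its data.

describe 'target'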

{NAME=>'10'}{NAME=>'20'}{NAME=>'30'}{NAME=>'40'}{NAME=>'50'}{NAME=>'5'}

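You can use CopyTable for this. Its families option takes a comma-separated list of column families to copy, as the usage text in the CopyTable source shows: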

private static void printUsage(final String errorMsg) {
  if (errorMsg != null && errorMsg.length() > 0) {
    System.err.println("ERROR: " + errorMsg);
  }
  System.err.println("Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] " +
      "[--new.name=NEW] [--peer.adr=ADR] <tablename>");
  System.err.println();
  System.err.println("Options:");
  System.err.println(" rs.class     hbase.regionserver.class of the peer cluster");
  System.err.println("              specify if different from current cluster");
  System.err.println(" rs.impl      hbase.regionserver.impl of the peer cluster");
  System.err.println(" startrow     the start row");
  System.err.println(" stoprow      the stop row");
  System.err.println(" starttime    beginning of the time range (unixtime in millis)");
  System.err.println("              without endtime means from starttime to forever");
  System.err.println(" endtime      end of the time range.  Ignored if no starttime specified.");
  System.err.println(" versions     number of cell versions to copy");
  System.err.println(" new.name     new table's name");
  System.err.println(" peer.adr     Address of the peer cluster given in the format");
  System.err.println("              hbase.zookeeper.quorum:hbase.zookeeper.client"
      + ".port:zookeeper.znode.parent");
  System.err.println(" families     comma-separated list of families to copy");
  System.err.println("              To copy from cf1 to cf2, give sourceCfName:destCfName. ");
  System.err.println("              To keep the same name, just give \"cfName\"");
  System.err.println(" all.cells    also copy delete markers and deleted cells");
  System.err.println(" bulkload     Write input into HFiles and bulk load to the destination "
      + "table");
  System.err.println();
  System.err.println("Args:");
  System.err.println(" tablename    Name of the table to copy");
  System.err.println();
  System.err.println("Examples:");
  System.err.println(" To copy 'TestTable' to a cluster that uses replication for a 1 hour window:");
  System.err.println(" $ hbase " +
      "org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 " +
      "--peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable ");
  System.err.println("For performance consider the following general option:\n"
      + "  It is recommended that you set the following to >=100. A higher value uses more memory but\n"
      + "  decreases the round trip time to the server and may increase performance.\n"
      + "    -Dhbase.client.scanner.caching=100\n"
      + "  The following should always be set to false, to prevent writing data twice, which may produce \n"
      + "  inaccurate results.\n"
      + "    -Dmapreduce.map.speculative=false");
}
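
Applied to the tables in the question, the usage text above suggests passing --new.name=target and --families=5 with source as the table name. The sketch below only assembles that argument list and prints the resulting command; it does not call HBase. The class CopyTableArgs and its build method are hypothetical helpers written for this illustration, not part of HBase. As far as I know, CopyTable does not create column families in the destination, so family '5' would have to be added to the target first (for example with alter 'target', NAME => '5' in the HBase shell).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of HBase): assembles CopyTable arguments
// following the option syntax shown in the usage text.
public class CopyTableArgs {

    // families maps each source family name to its name in the destination table.
    public static List<String> build(String sourceTable, String targetTable,
                                     Map<String, String> families) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, String> e : families.entrySet()) {
            String src = e.getKey();
            String dst = e.getValue();
            // Same name on both sides: "cfName"; renamed: "sourceCfName:destCfName".
            joined.add(src.equals(dst) ? src : src + ":" + dst);
        }
        List<String> args = new ArrayList<>();
        args.add("--new.name=" + targetTable); // destination table name
        args.add("--families=" + joined);      // families to copy
        args.add(sourceTable);                 // positional <tablename> argument
        return args;
    }

    public static void main(String[] argv) {
        // Table and family names taken from the question.
        Map<String, String> families = new LinkedHashMap<>();
        families.put("5", "5");
        List<String> args = build("source", "target", families);
        System.out.println("hbase org.apache.hadoop.hbase.mapreduce.CopyTable "
                + String.join(" ", args));
    }
}
```

Running main prints the command line to try against a test cluster first. Since the table is described as huge, the performance hints from the usage text (-Dhbase.client.scanner.caching and -Dmapreduce.map.speculative=false) would go before the other options as general options.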

