Hadoop HBase: get(…) vs. scan and in-memory tables
I am running MapReduce (MR) over HBase. The business logic in the reducer heavily accesses two tables, say T1 (40k rows) and T2 (90k rows). Currently, I perform the following steps:
1. In the constructor of the reducer class, I do the following:
HBaseCRUD hbaseCRUD = new HBaseCRUD();
HTableInterface t1= hbaseCRUD.getTable("T1",
"CF1", null, "C1", "C2");
HTableInterface t2= hbaseCRUD.getTable("T2",
"CF1", null, "C1", "C2");
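In reduce(…)
Although I am not sure the code above is sensible in the first place, I have a question: does get(…) offer any performance advantage over a scan?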
// Point lookup: fetch the single row whose row key is `lowercase`
Get get = new Get(lowercase.getBytes());
Result getResult = t1.get(get);
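Since T1 and T2 will be (mostly) read-only, I think performance would improve if they were kept in memory. According to the HBase documentation, I will have to re-create tables T1 and T2. Please verify that my understanding is correct: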
public void createTables(String tableName, boolean readOnly,
        boolean blockCacheEnabled, boolean inMemory,
        String... columnFamilyNames) throws IOException {
    // TODO Auto-generated method stub
    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    /* not sure !!! */
    tableDesc.setReadOnly(readOnly);
    HColumnDescriptor columnFamily = null;
    if (!(columnFamilyNames == null || columnFamilyNames.length == 0)) {
        for (String columnFamilyName : columnFamilyNames) {
            columnFamily = new HColumnDescriptor(columnFamilyName);
            /*
             * Start : Do these steps ensure that the column
             * family(actually, the column data) is in-memory???
             */
            columnFamily.setBlockCacheEnabled(blockCacheEnabled);
            columnFamily.setInMemory(inMemory);
            /*
             * End : Do these steps ensure that the column family(actually,
             * the column data) is in-memory???
             */
            tableDesc.addFamily(columnFamily);
        }
    }
    hbaseAdmin.createTable(tableDesc);
    hbaseAdmin.close();
}
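Once done:
Yes. It is completely transparent. You don't have to do anything extra.
So can I simply replace the scan code with a plain get(…) without worrying about a performance difference?
A get lets you select specific rows; a scan lets you select a range of rows. That's it.
Good question. I wonder why no one else has upvoted it. +1
Good answer: it correctly spells out the difference between get and scan, as well as the in-memory behavior.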
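To make the get-versus-scan distinction concrete, here is a minimal sketch that models a table as a sorted in-memory map. It deliberately does not use the HBase client API: the class `GetVsScanSketch`, the helper `sampleTable`, and the row keys are all hypothetical, and a `TreeMap` merely stands in for HBase's row-key-ordered storage. The point it illustrates is only the access pattern: a get addresses one row by its exact key, while a scan walks every row in a key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class GetVsScanSketch {

    // Hypothetical stand-in for a table: row key -> value, kept sorted by key.
    // (HBase orders rows by raw bytes; for these ASCII keys String order matches.)
    public static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("apple", "v1");
        table.put("banana", "v2");
        table.put("cherry", "v3");
        table.put("date", "v4");
        return table;
    }

    // get-style access: exactly one row, addressed by its full row key.
    public static String get(NavigableMap<String, String> table, String rowKey) {
        return table.get(rowKey);
    }

    // scan-style access: every row with startRow <= key < stopRow, in key order.
    public static List<String> scan(NavigableMap<String, String> table,
                                    String startRow, String stopRow) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e
                : table.subMap(startRow, true, stopRow, false).entrySet()) {
            rows.add(e.getKey() + "=" + e.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> t1 = sampleTable();
        System.out.println(get(t1, "banana"));              // v2
        System.out.println(scan(t1, "banana", "date"));     // [banana=v2, cherry=v3]
        // A scan narrowed to a single key returns the same row a get would.
        System.out.println(scan(t1, "banana", "banana\0")); // [banana=v2]
    }
}
```

The last call shows why the two are interchangeable for a known key: a range that covers exactly one row key yields the same row as the point lookup, so the choice comes down to whether the reducer needs one row or a range.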