Hadoop HBase: get(…) vs scan and in-memory tables

hadoop mapreduce hbase

I'm running an HBase MR (MapReduce) job.

The business logic in the reducer accesses two tables heavily, say T1 (40k rows) and T2 (90k rows). Currently, I'm doing the following:

1. In the constructor of the reducer class, I do something like this:

HBaseCRUD hbaseCRUD = new HBaseCRUD();

HTableInterface t1= hbaseCRUD.getTable("T1",
                            "CF1", null, "C1", "C2");
HTableInterface t2= hbaseCRUD.getTable("T2",
                            "CF1", null, "C1", "C2");

2. In reduce(…), I query these tables.

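I'm not sure the above code is sensible in the first place, but I have a question: does a get(…) like the one below have any performance advantage over a scan?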

Get get = new Get(lowercase.getBytes());
Result getResult = t1.get(get);

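Since T1 and T2 will be (mostly) read-only, I think performance would improve if they were kept in memory. According to the HBase documentation, I have to re-create tables T1 and T2 for this. Please check whether my understanding is correct: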

public void createTables(String tableName, boolean readOnly,
        boolean blockCacheEnabled, boolean inMemory,
        String... columnFamilyNames) throws IOException {

    HTableDescriptor tableDesc = new HTableDescriptor(tableName);
    /* not sure !!! */
    tableDesc.setReadOnly(readOnly);

    HColumnDescriptor columnFamily = null;

    if (!(columnFamilyNames == null || columnFamilyNames.length == 0)) {

        for (String columnFamilyName : columnFamilyNames) {

            columnFamily = new HColumnDescriptor(columnFamilyName);
            /*
             * Start: Do these steps ensure that the column
             * family (actually, the column data) is in memory???
             */
            columnFamily.setBlockCacheEnabled(blockCacheEnabled);
            columnFamily.setInMemory(inMemory);
            /*
             * End: Do these steps ensure that the column family (actually,
             * the column data) is in memory???
             */

            tableDesc.addFamily(columnFamily);
        }
    }

    // hbaseAdmin is an HBaseAdmin field of the enclosing class (declaration not shown)
    hbaseAdmin.createTable(tableDesc);
    hbaseAdmin.close();
}

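Once that's done:

How can I verify that the columns are in memory (of course, the describe statement and the browser UI reflect the setting) and that they are actually read from memory rather than from disk?

Is whether data is read from memory or from disk transparent to the client? Put simply, do I need to change the HTable access code in my reducer class? If so, what changes?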

Implementation-wise, there is no substantial difference between the two; to the client they look the same.

Does a get(…) have any performance advantage over a scan?

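A Get operates directly on the particular row identified by the rowkey passed to the Get instance. A Scan, on the other hand, operates on all rows unless you turn it into a range query by giving the Scan instance start and stop rowkeys. So if you know beforehand which row you need, a Get is clearly more efficient: you go straight to that row and perform the desired operation.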
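To make the difference concrete, here is a toy in-memory model in Java. It is not HBase client code: it assumes rowkeys can be compared as ASCII strings (HBase really compares raw bytes) and mirrors the HBase Scan convention of an inclusive start row and an exclusive stop row. The class name `GetVsScanModel` and its `rowsVisited` counter are hypothetical, used only to count how many rows each kind of access reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model, NOT HBase code: rows kept sorted by rowkey, as inside an HBase region.
class GetVsScanModel {
    private final NavigableMap<String, String> rows = new TreeMap<>();
    private int rowsVisited = 0; // rows read so far, a stand-in for read cost

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Point lookup by exact rowkey, analogous to a Get: reads a single row.
    String get(String rowKey) {
        rowsVisited++;
        return rows.get(rowKey);
    }

    // Range read, analogous to a Scan. A null bound means "unbounded";
    // startRow is inclusive and stopRow exclusive (assumed HBase-like semantics).
    List<String> scan(String startRow, String stopRow) {
        NavigableMap<String, String> view = rows;
        if (startRow != null && stopRow != null) {
            view = rows.subMap(startRow, true, stopRow, false);
        } else if (startRow != null) {
            view = rows.tailMap(startRow, true);
        } else if (stopRow != null) {
            view = rows.headMap(stopRow, false);
        }
        List<String> result = new ArrayList<>();
        for (String value : view.values()) {
            rowsVisited++; // every row inside the bounds is read
            result.add(value);
        }
        return result;
    }

    int rowsVisited() { return rowsVisited; }

    void resetCounter() { rowsVisited = 0; }

    public static void main(String[] args) {
        GetVsScanModel t1 = new GetVsScanModel();
        for (int i = 0; i < 1000; i++) {
            t1.put(String.format("row%04d", i), "v" + i);
        }
        t1.get("row0500");
        System.out.println("get visited " + t1.rowsVisited());        // get visited 1
        t1.resetCounter();
        t1.scan(null, null);
        System.out.println("full scan visited " + t1.rowsVisited());  // full scan visited 1000
        t1.resetCounter();
        t1.scan("row0500", "row0510");
        System.out.println("range scan visited " + t1.rowsVisited()); // range scan visited 10
    }
}
```

In this model, seeking to the start of a range is cheap in both cases; what the Get or a bounded scan saves is reading rows you don't need.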

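How can I verify that the columns are in memory (of course, the describe statement and the browser UI reflect the setting) and that they are actually read from memory rather than from disk?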

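You can check whether a particular CF is flagged as in-memory with the isInMemory() method of HColumnDescriptor. However, you cannot find out whether the whole table is actually in memory, nor whether a given read was served from disk or from memory. In-memory blocks get the highest cache priority, but there is no 100% guarantee that everything stays in memory all the time. One important point: even for an in-memory CF, the data is still persisted to disk.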
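The three properties described above can be illustrated with a small model: the in-memory flag raises cache priority without guaranteeing residency, the data always stays on disk, and the caller sees the same result whether a read hits the cache or not. This is a hypothetical sketch, not HBase's actual block cache (LruBlockCache uses a more elaborate priority scheme). Here a bounded LRU cache sits in front of a "disk" map, evicts ordinary entries before in-memory ones, and evicts an in-memory entry only when nothing else is left.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model, NOT HBase's LruBlockCache: a bounded read cache in front of a "disk".
class PriorityCacheModel {
    private final int capacity;
    private final Map<String, String> disk = new HashMap<>();          // durable copy of every value
    private final Map<String, Boolean> inMemoryFlag = new HashMap<>(); // hypothetical per-key IN_MEMORY flag
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, String> cache = new LinkedHashMap<>(16, 0.75f, true);
    private int hits = 0;
    private int misses = 0;

    PriorityCacheModel(int capacity) {
        this.capacity = capacity;
    }

    void write(String key, String value, boolean inMemory) {
        disk.put(key, value); // data is always persisted, in-memory or not
        inMemoryFlag.put(key, inMemory);
        cache.remove(key);    // drop any stale cached copy
    }

    // The caller gets the same value whether it came from the cache or from "disk".
    String read(String key) {
        String value = cache.get(key); // get() also marks the entry as recently used
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = disk.get(key);
        if (value != null) {
            cache.put(key, value);
            if (cache.size() > capacity) {
                evictOne();
            }
        }
        return value;
    }

    // Evict the least recently used ordinary entry; if every cached entry is
    // flagged in-memory, evict the least recently used one anyway.
    private void evictOne() {
        String victim = null;
        for (String k : cache.keySet()) {
            if (!inMemoryFlag.getOrDefault(k, false)) {
                victim = k;
                break;
            }
        }
        if (victim == null) {
            victim = cache.keySet().iterator().next();
        }
        cache.remove(victim);
    }

    boolean isCached(String key) { return cache.containsKey(key); }

    int hits() { return hits; }

    int misses() { return misses; }

    public static void main(String[] args) {
        PriorityCacheModel model = new PriorityCacheModel(2);
        model.write("T1/row1", "a", true); // row of an in-memory family
        model.write("T2/row1", "b", false);
        model.write("T2/row2", "c", false);
        model.read("T1/row1");
        model.read("T2/row1");
        model.read("T2/row2"); // over capacity: T2/row1 is evicted, T1/row1 is kept
        System.out.println(model.isCached("T1/row1")); // true
        System.out.println(model.isCached("T2/row1")); // false
        System.out.println(model.read("T2/row1"));     // b (re-read from "disk")
    }
}
```

Note that the caller's code path is identical for a hit and a miss, which is the sense in which the cache is transparent to the client.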

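Is whether data is read from memory or from disk transparent to the client? Put simply, do I need to change the HTable access code in my reducer class? If so, what changes?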

Yes, it is completely transparent. You don't have to do anything extra.

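So I can simply replace the scan code with a plain get(…) without worrying about performance differences?

A get lets you select a specific row; a scan lets you select a range of rows. That's it.

Good question; I wonder why no one else has upvoted it.

+1. Good answer: it correctly explains the difference between get and scan, as well as the in-memory behavior.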
