NoSQL: Scan with a filter using the HBase shell


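Does anyone know how to scan records based on some scan filter, for example:

column:something = "somevalue"

Something like this, but from the HBase shell?

Use the FILTER parameter of scan, as shown in the usage help: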

hbase(main):002:0> scan

ERROR: wrong number of arguments (0 for 1)

Here is some help for this command:
Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS. If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.

Some examples:

  hbase> scan '.META.'
  hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}

For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Try this. It's a bit ugly, but it works for me:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
    SingleColumnValueFilter.new(
        Bytes.toBytes('family'),
        Bytes.toBytes('qualifier'),
        CompareFilter::CompareOp.valueOf('EQUAL'),
        SubstringComparator.new('somevalue'))
}
The HBase shell loads everything in ~/.irbrc, so you can put something like this in there (I'm not a Ruby expert, improvements are welcome):

Then you can say in the shell:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'
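
The helper definition that followed the sentence about ~/.irbrc did not survive in this copy. As a stand-in, here is a minimal sketch of what such a helper could look like. It is not the original author's code: instead of constructing Java filter objects it builds a filter-language string, which the shell's scan accepts in HBase versions that ship the filter parser. The name substring_filter is hypothetical, and scan_substr can only be called inside the HBase shell, where scan, COLUMNS and FILTER are defined.

```ruby
# Hypothetical helper (not from the original answer): builds a
# filter-language string that matches a substring in one column.
# Single quotes inside an argument are escaped by doubling them.
def substring_filter(family, qualifier, substr)
  quote = ->(s) { "'" + s.to_s.gsub("'", "''") + "'" }
  "SingleColumnValueFilter(#{quote.call(family)}, #{quote.call(qualifier)}, " \
    "=, #{quote.call("substring:#{substr}")})"
end

# Meant for ~/.irbrc. Assumes the HBase shell's scan command and its
# COLUMNS / FILTER constants, so it fails if called from plain Ruby.
def scan_substr(table, family, qualifier, substr, *cols)
  scan table, { COLUMNS => cols, FILTER => substring_filter(family, qualifier, substr) }
end

# The string builder can be checked on its own, outside the shell:
puts substring_filter('family', 'qualifier', 'somevalue')
# prints: SingleColumnValueFilter('family', 'qualifier', =, 'substring:somevalue')
```

Keeping the string construction separate from the scan call makes it possible to inspect the generated filter expression before running it against a table.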

More information can be found here. Note that the attached Filter Language.docx file contains several examples.

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan();

// With multiple SingleColumnValueFilters, choose whether a row must pass
// all conditions (MUST_PASS_ALL) or just one of them (MUST_PASS_ONE).
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter(
                   Bytes.toBytes("SOME COLUMN FAMILY"),
                   Bytes.toBytes("SOME COLUMN NAME"),
                   CompareOp.EQUAL,
                   Bytes.toBytes("SOME VALUE"));

// Skip rows that do not have the column at all. Adding the column filter
// alone does not exclude them: without this call, rows that lack the
// column are still returned in the result set.
filter_by_name.setFilterIfMissing(true);

list.addFilter(filter_by_name);
scan.setFilter(list);

One of the filters is ValueFilter, which can be used to filter on all column values:

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=, 'binary:2016-01-26')"}

binary is one of the comparators that can be used in a filter. You can use different comparators in the filter depending on what you need.
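
To illustrate how the comparator part of such a filter string varies, here is a small sketch that assembles ValueFilter expressions. The helper name value_filter and the two constant names are hypothetical; the comparator prefixes and compare operators listed are the ones the HBase filter language documents. Note that the regexstring and substring comparators are only meaningful with = and !=.

```ruby
# Comparator prefixes and compare operators of the HBase filter language.
# The constant names are made up for this sketch.
FILTER_COMPARATORS = %w[binary binaryprefix regexstring substring].freeze
FILTER_OPERATORS = %w[< <= = != > >=].freeze

# Hypothetical helper: returns a ValueFilter expression string that can be
# passed as FILTER => "..." to scan in the HBase shell.
def value_filter(operator, comparator, value)
  raise ArgumentError, "unknown operator: #{operator}" unless FILTER_OPERATORS.include?(operator)
  raise ArgumentError, "unknown comparator: #{comparator}" unless FILTER_COMPARATORS.include?(comparator)

  escaped = value.to_s.gsub("'", "''") # single quotes are escaped by doubling
  "ValueFilter(#{operator}, '#{comparator}:#{escaped}')"
end

puts value_filter('=', 'binary', '2016-01-26')   # exact match on the value
puts value_filter('=', 'substring', '2016-01')   # value contains the text
puts value_filter('>=', 'binaryprefix', '2016')  # compares against a prefix
```

The first line printed is the same expression used in the shell example above: ValueFilter(=, 'binary:2016-01-26').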

You can refer to the following URL: http:// It provides good examples of how to use different filters in the HBase shell.

Call setFilterIfMissing(true) on the filter before running the query. In the Java API setFilterIfMissing returns void, so chaining it onto the constructor inside the scan hash would pass nil as the FILTER. Build the filter first, then pass it to scan:

import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.BinaryComparator
import org.apache.hadoop.hbase.filter.CompareFilter

filter = SingleColumnValueFilter.new(Bytes.toBytes('account'),
      Bytes.toBytes('ACCOUNT_NUMBER'), CompareFilter::CompareOp.valueOf('EQUAL'),
      BinaryComparator.new(Bytes.toBytes('0003000587')))
filter.setFilterIfMissing(true)   # drop rows that lack the column entirely

scan 'test:test8', { FILTER => filter }

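Comments:

This is really ugly. Thanks though, I couldn't find any example like this in the HBase docs, the book, or the O'Reilly book.

I think this filter parsing language only works in later versions of HBase. In 0.90.6 (cdh 3u6) I couldn't get anywhere with it; here is the javadoc for 0.94:

Link-only answers are not good answers. Post some code and explain it to be helpful.

The link no longer works. It takes you to a spam site.

This code is written in Java; the question is about the HBase shell.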