Algorithm 按位非字节比较
假设我得到这样一个字节:00010011(打开2位) 我希望将它与这些字节进行比较:{0000 0110,0000 0011,0011 0000,0000 1100} 其思想是获取不匹配的字节;其中(byteA和byteX)==0 例如,我应该获取/查找:{0000 0110,0000 1100} 如果我们编写一个循环字节数组的代码,这可能很容易。 这里有一个例子:Algorithm 按位非字节比较,algorithm,operators,bit-manipulation,Algorithm,Operators,Bit Manipulation,假设我得到这样一个字节:00010011(打开2位) 我希望将它与这些字节进行比较:{0000 0110,0000 0011,0011 0000,0000 1100} 其思想是获取不匹配的字节;其中(byteA和byteX)==0 例如,我应该获取/查找:{0000 0110,0000 1100} 如果我们编写一个循环字节数组的代码,这可能很容易。 这里有一个例子: byte seek = 17; byte[] pool = {6, 3, 48, 12 }; for(int p=0; p<
byte seek = 17;
byte[] pool = {6, 3, 48, 12 };
for(int p=0; p<pool.Length; p++)
{
if((pool[p] & seek)==0)
{
//Usefull
}
}
字节搜索=17;
字节[]池={6,3,48,12};
对于(int p=0;p而言,字节最大的优点是只有256种可能性
您可以先创建一个二维数组256x256,然后使用两个值查找该数组
您可以先创建数组,然后将结果作为静态实例存储在主程序中
static bool[256,256] LookUp = {
{true, true, true ... },
...
{false, false, false ...}
};
static bool IsUsefule(byte a, byte b) {
return LookUp[a][b];
}
- 编辑*
或者使用应答数组的数组
内部数组将只包含“有用”的字节
static List<<byte[]> LookUp = new List<byte[]>(256);
static byte[] IsUseful(byte a) {
return LookUp[a];
}
静态列表我唯一能想到的就是减少测试的数量:
for(int p1=1; p1<pool.Length; p1++)
{
for(int p2=0; p2<p1; p1++)
{
if((pool[p1] & pool[p2])==0)
{
//byte at p1 works with byte at p2.
//byte at p2 works with byte at p1.
}
}
}
for(int p1=1;p1这里有一个完全不同的想法,可能有效,也可能无效,这取决于您的池中的内容
将整个池放入零抑制二进制决策图中。pool
中的项将是集合,其中位为1的索引是该集合的元素。ZDD是所有这些集合的族。
要进行查询,请形成另一个ZDD-不包括seek
中为1的位的所有集合的族(这将是一个小的ZDD,以节点为单位),然后枚举这些ZDD的交集中的所有集合
从交集中枚举所有这些集合是一种输出敏感算法,但计算交集需要时间,这取决于ZDD的大小,因此它是否工作良好取决于池
是否是一个好的ZDD(查询ZDD肯定很好)。当然,您必须准备ZDD,因此在任何情况下,只有您计划经常查询同一个池时,它才有帮助。一个相当普遍的解决方案是“位转置”你的数据,例如,一个包含所有高阶数据位的词块,一个包含从那里往下一个位置的所有位的词块,等等。然后在两位查询中,你或将两个这样的词块放在一起,然后查找0位-因此,如果结果词是-1,你可以完全跳过它。找到所有这些词的位置0位在字x中,看看popcnt(x^(x+1)):如果x=…10111,那么x+1=…11000所以x^(x+1)=000..01111-然后popcnt将告诉您最低阶0的位置。实际上,最大的成功可能是当您的大多数数据不满足查询时,您可以跳过整句话:当您有很多匹配项时,与您计划对匹配项执行的任何操作相比,任何方案下的查询成本都可能很小。在数据库中e、 这里有很多信息和指向源代码的指针
Knuth第2卷第6.5节“二进制属性”中有许多查询0/1数据的方法。其中大多数要求您对数据的分布有一些了解,以识别它们的适用位置。其中一个想法通常是适用的-如果您有任何类型的树结构或索引结构,您可以在树的节点中保留或/或其下的所有信息。然后,您可以根据您有时可能会发现,该节点下的任何内容都不可能与您的查询匹配,在这种情况下,您可以将其全部跳过。如果位之间存在连接,这可能是最有用的,例如,如果您仅通过对池进行排序并将其切割成块来分割池,甚至是不影响分割成块的位NK总是在某些块中设置,从不在其他块中设置。首先,我的英语很差,但我希望你能理解。另外,我知道我的答案有点晚,但我认为仍然有用。
正如有人指出的,最好的解决方案是生成一个查找表。
为了达到这个目的,您必须对每个循环迭代案例进行硬编码
数组。
幸运的是,我们使用的是字节,因此只能使用256个字节
例如,如果我们拿你的模式列表{3,6,12,48},我们
获取此表:
0: { 3, 6, 12, 48 }
1: { 6, 12, 48 }
2: { 12, 48 }
3: { 12, 48 }
3: { 12, 48 }
...
252: { 3 }
253: -
254: -
255: -
我们使用输入字节作为查找表中的索引,以获取与输入字节不匹配的列表模式值。
实施:
我使用Python生成了两个头文件,一个是带有查找表的头文件
定义,另一个具有所需的模式列表值。然后,我将此文件包含在新的C项目中,仅此而已
Python代码
结论:
在我看来,使用查找表可以通过
增加代码大小,但这种改进并不过分
重要。要开始注意差异,输入字节的数量应
你应该为你使用的语言添加一个标签。我可以不知道你是用C还是C++还是??拼写错误。谢谢你快速重放。我使用C语言,但是我也很酷C++。我真的需要一个算法……如果你看到我的例子,我在寻找一个索引字节数组的方法。e返回所有可能使用“00010001”的字节的搜索。我不知道是否可以使用字典或其他方法实现这一点,但我希望避免循环数组以查找匹配的字节。匹配意味着它们没有共同点:(字节1和字节2)==0不使用xor是否有特殊原因?xor将返回您“差异”,您可以在以下情况下轻松检查单个位:necessary@jancha为什么?xor很有用,但在这种情况下,它只是额外的一步-可能还需要一些睡眠。误解了问题否。执行a&b
总是比查找[a][b]快得多
。不使用内存比使用内存快。然后只需预先剔除内部列表,使其仅包含这些字节
0: { 3, 6, 12, 48 }
1: { 6, 12, 48 }
2: { 12, 48 }
3: { 12, 48 }
3: { 12, 48 }
...
252: { 3 }
253: -
254: -
255: -
#! /usr/bin/env python
from copy import copy
from time import *
import getopt
import sys
class LUPattern:
__LU_SZ = 256
BASIC_TYPE_CODE = "const uint8_t"
BASIC_ID_CODE = "p"
LU_ID_CODE = "lu"
LU_HEADER_CODE = "__LUPATTERN_H__"
LU_SZ_PATTLIST_ID_CODE = "SZ"
PAT_HEADER_CODE = "__PATTERNLIST_H__"
PAT_ID_CODE = "patList"
def __init__(self, patList):
self.patList = copy(patList)
def genLU(self):
lu = []
pl = list( set(self.patList) )
pl.sort()
for i in xrange(LUPattern.__LU_SZ):
e = []
for p in pl:
if (i & p) == 0:
e.append(p)
lu.append(e)
return lu
def begCode(self):
code = "// " + asctime() + "\n\n" \
+ "#ifndef " + LUPattern.LU_HEADER_CODE + "\n" \
+ "#define " + LUPattern.LU_HEADER_CODE + "\n" \
+ "\n#include <stdint.h>\n\n" \
return code
def luCode(self):
lu = self.genLU()
pDict = {}
luSrc = LUPattern.BASIC_TYPE_CODE \
+ " * const " \
+ LUPattern.LU_ID_CODE \
+ "[%i] = { \n\t" % LUPattern.__LU_SZ
for i in xrange(LUPattern.__LU_SZ):
if lu[i]:
pId = "_%i" * len(lu[i])
pId = pId % tuple(lu[i])
pId = LUPattern.BASIC_ID_CODE + pId
pBody = "{" + "%3i, " * len(lu[i]) + " 0 }"
pBody = pBody % tuple(lu[i])
pDict[pId] = pBody
luSrc += pId
else:
luSrc += "0"
luSrc += (i & 3) == 3 and (",\n\t") or ", "
luSrc += "\n};"
pCode = ""
for pId in pDict.keys():
pCode += "static " + \
LUPattern.BASIC_TYPE_CODE + \
" " + pId + "[] = " + \
pDict[pId] + ";\n"
return (pCode, luSrc)
def genCode(self):
(pCode, luSrc) = self.luCode()
code = self.begCode() \
+ pCode + "\n\n" \
+ luSrc + "\n\n#endif\n\n"
return code
def patCode(self):
code = "// " + asctime() + "\n\n" \
+ "#ifndef " + LUPattern.PAT_HEADER_CODE + "\n" \
+ "#define " + LUPattern.PAT_HEADER_CODE + "\n" \
+ "\n#include <stdint.h>\n\n"
code += "enum { " \
+ LUPattern.LU_SZ_PATTLIST_ID_CODE \
+ " = %i, " % len(self.patList) \
+ "};\n\n"
code += "%s %s[] = { " % ( LUPattern.BASIC_TYPE_CODE,
LUPattern.PAT_ID_CODE )
for p in self.patList:
code += "%i, " % p
code += "};\n\n#endif\n\n"
return code
#########################################################
def msg():
hmsg = "Usage: "
hmsg += "%s %s %s" % (
sys.argv[0],
"-p",
"\"{pattern0, pattern1, ... , patternN}\"\n\n")
hmsg += "Options:"
fmt = "\n%5s, %" + "%is" % ( len("input pattern list") + 3 )
hmsg += fmt % ("-p", "input pattern list")
fmt = "\n%5s, %" + "%is" % ( len("output look up header file") + 3 )
hmsg += fmt % ("-l", "output look up header file")
fmt = "\n%5s, %" + "%is" % ( len("output pattern list header file") + 3 )
hmsg += fmt % ("-f", "output pattern list header file")
fmt = "\n%5s, %" + "%is" % ( len("print this message") + 3 )
hmsg += fmt % ("-h", "print this message")
print hmsg
exit(0)
def getPatternList(patStr):
pl = (patStr.strip("{}")).split(',')
return [ int(i) & 255 for i in pl ]
def parseOpt():
patList = [ 255 ] # Default pattern
luFile = sys.stdout
patFile = sys.stdout
try:
opts, args = getopt.getopt(sys.argv[1:], "hp:l:f:", ["help", "patterns="])
except getopt.GetoptError:
msg()
for op in opts:
if op[0] == '-p':
patList = getPatternList(op[1])
elif op[0] == '-l':
luFile = open(op[1], 'w')
elif op[0] == '-f':
patFile = open(op[1], 'w')
elif op[0] == '-h':
msg()
return (patList, luFile, patFile)
def main():
(patList, luFile, patFile) = parseOpt()
lug = LUPattern(patList)
print >> luFile , lug.genCode()
print >> patFile, lug.patCode()
patFile.close()
luFile.close()
if __name__ == "__main__":
main()
#include "pl.h"
#include "lu.h"
#include <stdio.h>
int main(void)
{
uint32_t stats[SZ + 1] = { 0 };
uint8_t b;
while( fscanf(stdin, "%c", &b) != EOF )
{
(void)lu[b];
// lu[b] has bytes that don't match with b
}
return 0;
}
#include "pl.h"
#include "lu.h"
#include <stdio.h>
void doStats(const uint8_t * const, uint32_t * const);
void printStats(const uint32_t * const);
int main(void)
{
uint32_t stats[SZ + 1] = { 0 };
uint8_t b;
while( fscanf(stdin, "%c", &b) != EOF )
{
/* lu[b] has pattern values that not match with input b */
doStats(lu[b], stats);
}
printStats(stats);
return 0;
}
void doStats(const uint8_t * const noBitMatch, uint32_t * const stats)
{
uint8_t i, j = 0;
if(noBitMatch)
{
for(i = 0; noBitMatch[i] != 0; i++)
for(; j < SZ; j++)
if( noBitMatch[i] == patList[j] )
{
stats[j]++;
break;
}
}
else
stats[SZ]++;
}
void printStats(const uint32_t * const stats)
{
const uint8_t * const patList = lu[0];
uint8_t i;
printf("Stats: \n");
for(i = 0; i < SZ; i++)
printf(" %3i%-3c%9i\n", patList[i], ':', stats[i]);
printf(" ---%-3c%9i\n", ':', stats[SZ]);
}
#include "pl.h"
#include <stdio.h>
#include <stdint.h>
#include <string.h>
void getNoBitMatch(const uint8_t, uint8_t * const);
void doStats(const uint8_t * const, uint32_t * const);
void printStats(const uint32_t * const);
int main(void)
{
uint8_t b;
uint8_t noBitMatch[SZ];
uint32_t stats[SZ + 1] = { 0 };
while( fscanf(stdin, "%c", &b ) != EOF )
{
getNoBitMatch(b, noBitMatch);
doStats(noBitMatch, stats);
}
printStats(stats);
return 0;
}
void doStats(const uint8_t * const noBitMatch, uint32_t * const stats)
{
uint32_t i;
uint8_t f;
for(i = 0, f = 0; i < SZ; i++)
{
f = ( (noBitMatch[i]) ? 1 : f );
stats[i] += noBitMatch[i];
}
stats[SZ] += (f) ? 0 : 1;
}
void getNoBitMatch(const uint8_t b, uint8_t * const noBitMatch)
{
uint8_t i;
for(i = 0; i < SZ; i++)
noBitMatch[i] = ( (b & patList[i]) == 0 ) ? 1 : 0;
}
void printStats(const uint32_t * const stats)
{
uint8_t i;
printf("Stats: \n");
for(i = 0; i < SZ; i++)
printf(" %3i%-3c%9i\n", patList[i], ':', stats[i]);
printf(" ---%-3c%9i\n", ':', stats[SZ]);
}
###
CC = gcc
CFLAGS = -c -Wall
SPDUP = -O3
DEBUG = -ggdb3 -O0
EXECUTABLE = noloop
AUXEXEC = loop
LU_SCRIPT = ./lup.py
LU_HEADER = lu.h
LU_PATLIST_HEADER = pl.h
#LU_PATLIST = -p "{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }"
#LU_PATLIST = -p "{ 3, 6, 12, 15, 32, 48, 69, 254 }"
LU_PATLIST = -p "{ 3, 6, 12, 48 }"
#LU_PATLIST = -p "{ 1, 2 }"
#LU_PATLIST = -p "{ 1 }"
LU_FILE = -l $(LU_HEADER)
LU_PAT_FILE = -f $(LU_PATLIST_HEADER)
SRC= noloop.c loop.c
SOURCE = $(EXECUTABLE).c
OBJECTS = $(SOURCE:.c=.o)
AUXSRC = $(AUXEXEC).c
AUXOBJ = $(AUXSRC:.c=.o)
all: $(EXECUTABLE) $(AUXEXEC)
lookup:
$(LU_SCRIPT) $(LU_PATLIST) $(LU_FILE) $(LU_PAT_FILE)
touch $(SRC)
$(EXECUTABLE): lookup $(OBJECTS)
$(CC) $(OBJECTS) -o $@
$(AUXEXEC): $(AUXOBJ)
$(CC) $(AUXOBJ) -o $@
.c.o:
$(CC) $(CFLAGS) $(SPDUP) -c $<
debug: lookup dbg
$(CC) $(OBJECTS) -o $(EXECUTABLE)
$(CC) $(AUXOBJ) -o $(AUXEXEC)
dbg: *.c
$(CC) $(CFLAGS) $(DEBUG) -c $<
clean:
rm -f $(EXECUTABLE) $(AUXEXEC) *.o &> /dev/null
.PHONY: clean
Sat Sep 24 15:03:18 CEST 2011
Test1: test/gpl.txt (size: 35147)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 18917
2: 22014
3: 12423
4: 19015
5: 11111
6: 12647
7: 7791
8: 23498
9: 13637
10: 16032
11: 9997
12: 14059
13: 9225
14: 8609
15: 6629
16: 25610
---: 0
real 0m0.016s
user 0m0.008s
sys 0m0.016s
Loop version:
------------------------
Stats:
1: 18917
2: 22014
3: 12423
4: 19015
5: 11111
6: 12647
7: 7791
8: 23498
9: 13637
10: 16032
11: 9997
12: 14059
13: 9225
14: 8609
15: 6629
16: 25610
---: 0
real 0m0.020s
user 0m0.020s
sys 0m0.008s
Test2: test/HolyBible.txt (size: 5918239)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 3392095
2: 3970343
3: 2325421
4: 3102869
5: 1973137
6: 2177366
7: 1434363
8: 3749010
9: 2179167
10: 2751134
11: 1709076
12: 2137823
13: 1386038
14: 1466132
15: 1072405
16: 4445367
---: 3310
real 0m1.048s
user 0m1.044s
sys 0m0.012s
Loop version:
------------------------
Stats:
1: 3392095
2: 3970343
3: 2325421
4: 3102869
5: 1973137
6: 2177366
7: 1434363
8: 3749010
9: 2179167
10: 2751134
11: 1709076
12: 2137823
13: 1386038
14: 1466132
15: 1072405
16: 4445367
---: 3310
real 0m0.926s
user 0m0.924s
sys 0m0.016s
Test3: test/linux-kernel-3.0.4 (size: 434042620)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 222678565
2: 254789058
3: 137364784
4: 239010012
5: 133131414
6: 146334792
7: 83232971
8: 246531446
9: 145867949
10: 161728907
11: 103142808
12: 147836792
13: 93927370
14: 87122985
15: 66624721
16: 275921653
---: 16845505
real 2m22.900s
user 3m43.686s
sys 1m14.613s
Loop version:
------------------------
Stats:
1: 222678565
2: 254789058
3: 137364784
4: 239010012
5: 133131414
6: 146334792
7: 83232971
8: 246531446
9: 145867949
10: 161728907
11: 103142808
12: 147836792
13: 93927370
14: 87122985
15: 66624721
16: 275921653
---: 16845505
real 2m42.560s
user 3m56.011s
sys 1m26.037s
Test4: test/gpl.txt (size: 35147)
---------------------------------------------------
Look up table version:
------------------------
Stats:
3: 12423
6: 12647
12: 14059
15: 6629
32: 2338
48: 1730
69: 6676
254: 0
---: 11170
real 0m0.011s
user 0m0.004s
sys 0m0.016s
Loop version:
------------------------
Stats:
3: 12423
6: 12647
12: 14059
15: 6629
32: 2338
48: 1730
69: 6676
254: 0
---: 11170
real 0m0.021s
user 0m0.020s
sys 0m0.008s
Test5: test/HolyBible.txt (size: 5918239)
---------------------------------------------------
Look up table version:
------------------------
Stats:
3: 2325421
6: 2177366
12: 2137823
15: 1072405
32: 425404
48: 397564
69: 1251668
254: 0
---: 1781959
real 0m0.969s
user 0m0.936s
sys 0m0.048s
Loop version:
------------------------
Stats:
3: 2325421
6: 2177366
12: 2137823
15: 1072405
32: 425404
48: 397564
69: 1251668
254: 0
---: 1781959
real 0m1.447s
user 0m1.424s
sys 0m0.032s
Test6: test/linux-kernel-3.0.4 (size: 434042620)
---------------------------------------------------
Look up table version:
------------------------
Stats:
3: 137364784
6: 146334792
12: 147836792
15: 66624721
32: 99994388
48: 64451562
69: 89249942
254: 5712
---: 105210728
real 2m38.851s
user 3m37.510s
sys 1m26.653s
Loop version:
------------------------
Stats:
3: 137364784
6: 146334792
12: 147836792
15: 66624721
32: 99994388
48: 64451562
69: 89249942
254: 5712
---: 105210728
real 2m32.041s
user 3m36.022s
sys 1m27.393s
Test7: test/gpl.txt (size: 35147)
---------------------------------------------------
Look up table version:
------------------------
Stats:
3: 12423
6: 12647
12: 14059
48: 1730
---: 11277
real 0m0.013s
user 0m0.016s
sys 0m0.004s
Loop version:
------------------------
Stats:
3: 12423
6: 12647
12: 14059
48: 1730
---: 11277
real 0m0.014s
user 0m0.020s
sys 0m0.000s
Test8: test/HolyBible.txt (size: 5918239)
---------------------------------------------------
Look up table version:
------------------------
Stats:
3: 2325421
6: 2177366
12: 2137823
48: 397564
---: 1850018
real 0m0.933s
user 0m0.916s
sys 0m0.036s
Loop version:
------------------------
Stats:
3: 2325421
6: 2177366
12: 2137823
48: 397564
---: 1850018
real 0m0.892s
user 0m0.860s
sys 0m0.052s
Test9: test/linux-kernel-3.0.4 (size: 434042620)
---------------------------------------------------
Look up table version:
------------------------
Stats:
3: 137364784
6: 146334792
12: 147836792
48: 64451562
---: 132949214
real 2m31.187s
user 3m31.289s
sys 1m25.909s
Loop version:
------------------------
Stats:
3: 137364784
6: 146334792
12: 147836792
48: 64451562
---: 132949214
real 2m34.942s
user 3m33.081s
sys 1m24.381s
Test10: test/gpl.txt (size: 35147)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 18917
2: 22014
---: 6639
real 0m0.014s
user 0m0.016s
sys 0m0.008s
Loop version:
------------------------
Stats:
1: 18917
2: 22014
---: 6639
real 0m0.017s
user 0m0.016s
sys 0m0.008s
Test11: test/HolyBible.txt (size: 5918239)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 3392095
2: 3970343
---: 881222
real 0m0.861s
user 0m0.848s
sys 0m0.032s
Loop version:
------------------------
Stats:
1: 3392095
2: 3970343
---: 881222
real 0m0.781s
user 0m0.760s
sys 0m0.044s
Test12: test/linux-kernel-3.0.4 (size: 434042620)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 222678565
2: 254789058
---: 84476465
real 2m29.894s
user 3m30.449s
sys 1m23.177s
Loop version:
------------------------
Stats:
1: 222678565
2: 254789058
---: 84476465
real 2m21.103s
user 3m22.321s
sys 1m24.001s
Test13: test/gpl.txt (size: 35147)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 18917
---: 16230
real 0m0.015s
user 0m0.020s
sys 0m0.008s
Loop version:
------------------------
Stats:
1: 18917
---: 16230
real 0m0.016s
user 0m0.016s
sys 0m0.008s
Test14: test/HolyBible.txt (size: 5918239)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 3392095
---: 2526144
real 0m0.811s
user 0m0.808s
sys 0m0.024s
Loop version:
------------------------
Stats:
1: 3392095
---: 2526144
real 0m0.709s
user 0m0.688s
sys 0m0.040s
Test15: test/linux-kernel-3.0.4 (size: 434042620)
---------------------------------------------------
Look up table version:
------------------------
Stats:
1: 222678565
---: 201900739
real 2m21.510s
user 3m23.009s
sys 1m23.861s
Loop version:
------------------------
Stats:
1: 222678565
---: 201900739
real 2m22.677s
user 3m26.477s
sys 1m23.385s
Sat Sep 24 15:28:28 CEST 2011