Python: comparing a dictionary with a dictionary of conditions


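I have an iterator (millions of rows) that yields a dictionary, and I need to compare each one with a dictionary of conditions to find the matches.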

Here is my code:

import json

conditions={"port":"0-20", "ip":"1.2.3.4", "protocol":"1,7",
            "timestamp":">143990000", "server":"mario"}

for rec in imiterator(): # Very large number of rows
    # rec example: {"ip":"1.7.1.1", "timestamp":1434000,
    #               "port":129, "server":("mario","bruno"),
    #               "protocol":"1"}

    if check_conditions(rec, conditions):
        print(json.dumps(rec))
Note that the fields in rec can be of type int, long, string, or tuple.
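Because a field may hold either a scalar or a tuple, even a plain equality condition has to decide which kind of comparison to make. A minimal sketch, assuming that a tuple field matches when any of its elements equals the wanted value (the helper name `value_matches` is hypothetical, not part of the question):

```python
def value_matches(value, wanted):
    """Compare one record field against one wanted value.

    Assumption: a tuple field such as ("mario", "bruno") matches when any
    of its elements equals the wanted value; scalars use plain ==.
    """
    if isinstance(value, tuple):
        return wanted in value
    return value == wanted


print(value_matches(("mario", "bruno"), "mario"))  # True
print(value_matches("1.7.1.1", "1.2.3.4"))         # False
print(value_matches(19, 19))                       # True
```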

I need a really high-performance way to do the matching. Any ideas?


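I am thinking of using map to turn the conditions into lambda functions that should match, and then ANDing all the conditions together. Would that be faster?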
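The idea in the question can be sketched without map: build the predicates once, before the loop, and combine them with the built-in all(), which stops at the first predicate that returns False, just as a chain of `and` would. In this sketch the predicates are hypothetical and written by hand for the example conditions; turning the condition strings into predicates automatically is a separate step.

```python
import json

# Hypothetical predicates for the example conditions, built once up front
# so that each record only pays for the function calls.
PREDICATES = [
    ("port", lambda p: 0 <= p <= 20),          # "0-20"
    ("protocol", lambda p: int(p) in (1, 7)),  # "1,7"
    ("ip", lambda ip: ip == "1.2.3.4"),        # exact string match
]


def check_conditions(rec, predicates):
    # all() short-circuits on the first False; a missing field fails the match.
    return all(field in rec and pred(rec[field]) for field, pred in predicates)


records = [
    {"ip": "1.2.3.4", "port": 19, "protocol": "7"},
    {"ip": "1.7.1.1", "port": 19, "protocol": "1"},
]
for rec in records:
    if check_conditions(rec, PREDICATES):
        print(json.dumps(rec))  # prints only the first record
```

Because all() stops early, putting the most selective predicate first lets most non-matching records be rejected after a single call.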

If the goal is to check for a 1:1 correspondence, why not convert all the entries of rec to strings and then simply do a

return conditions == rec

If the goal is to process and interpret data within ranges, that may call for different handling, but for a task this trivial, using map only adds overhead.
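For the pure 1:1 case, a whole-dictionary `==` fails as soon as rec carries fields that are not in conditions, so a sketch of this answer's idea compares only the condition keys. It assumes every condition value is a plain string to be matched exactly against `str()` of the field; the function name `exact_match` is hypothetical. The second call shows why this cannot handle the range and list conditions discussed in the comments.

```python
def exact_match(rec, conditions):
    """1:1 check: each condition key must exist in rec and str() of it must equal the condition."""
    # Only the condition keys are compared, so extra fields in rec are ignored.
    return all(key in rec and str(rec[key]) == wanted
               for key, wanted in conditions.items())


rec = {"ip": "1.2.3.4", "port": 19, "server": "mario"}
print(exact_match(rec, {"ip": "1.2.3.4", "port": "19"}))  # True
print(exact_match(rec, {"port": "0-20"}))                 # False: "0-20" is not parsed as a range
```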

It looks like the OP wants to be able to use some logic, e.g. "timestamp":">14399000"

Sorry for not explaining. The conditions are all logical, not just 1:1, so in the example the port range 0-20 means the port can be anywhere between 0 and 20, and protocol 1,7 means the protocol can be 1 or 7. In this case the IP is treated as a string, so a 1:1 comparison is fine; this is not partial string matching. Once we have a good approach I can consider subnets, partial strings, etc., but that is not part of the current search requirements.

You will have to use or design a DSL to express these conditions; there is no simple answer.

Here is what I did: I converted my conditions into a dictionary of lambda functions, using the first record object to decide which kind of function I want: a basic string match, a numeric match, or a range (0-100 is a range, -100 means less than or equal to 100, and 100- means greater than or equal to 100). A leading "!" negates a condition.
import itertools
import json
import numbers

conditions={"port":"0-20","ip":"1.2.3.4","protocol":"1,7","timestamp":"143990000-","server":"!mario"}

def check_conditions(rec, val):
    """Return True only if every compiled condition function accepts the record's value."""
    for cond, test in val.items():
        # rec is a dict, so the field is looked up by key; a missing field fails the match
        if cond not in rec or not test(rec[cond]):
            return False  # the first failing condition rejects the whole record
    return True

def number_logic_to_lambda(invar, switcharoo, conv=int):
    """
    Check whether the request is a range query ("0-20", "100-" or "-100"),
    a comma-delimited list ("1,7") or a single number ("19") and return the
    matching test. switcharoo=True negates the result; conv parses the
    numbers (int by default, float for float fields).
    """
    z = invar.split('-')  # range queries
    if len(z) == 1:
        y = [conv(v) for v in invar.split(',')]  # one or more allowed values
        if len(y) == 1:  # a single match
            return lambda x: (x == y[0]) ^ switcharoo
        allowed = frozenset(y)  # comma-delimited list: match any of its values
        return lambda x: (x in allowed) ^ switcharoo
    elif len(z) == 2:  # this is a query with "-"
        if z[1] == '':  # "100-": greater than or equal to
            lo = conv(z[0])
            return lambda x: (lo <= x) ^ switcharoo
        elif z[0] == '':  # "-100": less than or equal to
            hi = conv(z[1])
            return lambda x: (x <= hi) ^ switcharoo
        else:  # "0-100": inclusive range
            lo, hi = conv(z[0]), conv(z[1])
            return lambda x: (lo <= x <= hi) ^ switcharoo
    raise ValueError("cannot parse numeric condition: %r" % invar)

records = imiterator()  # named records so the builtin iter is not shadowed
first_rec = next(records)
nvars = {}  # the conditions turned into functions
for svar in conditions:
    qvar = conditions[svar]
    switcharoo = False
    if qvar.startswith("!"):  # a leading bang makes it a negative condition
        qvar = qvar[1:]
        switcharoo = True
    cattr = first_rec[svar]  # the first record's value decides the kind of test
    if isinstance(cattr, float):  # floats are parsed with float()
        mapf = number_logic_to_lambda(qvar, switcharoo, conv=float)
    elif isinstance(cattr, numbers.Integral):  # int (and long on Python 2) are numeric
        mapf = number_logic_to_lambda(qvar, switcharoo)
    elif isinstance(cattr, tuple):  # tuples use a subset test
        wanted = frozenset(qvar.split(","))
        # default arguments bind the current values; a plain closure would see the loop's last ones
        mapf = lambda x, w=wanted, s=switcharoo: w.issubset(x) ^ s
    else:  # default mapping function: full string match
        mapf = lambda x, q=qvar, s=switcharoo: (x == q) ^ s
    nvars[svar] = mapf  # update the dictionary of mapped functions

# first_rec was consumed above, so chain it back in front of the remaining rows
for rec in itertools.chain([first_rec], records):  # very large number of rows
    # rec example: {"ip":"1.7.1.1","timestamp":1434000,"port":129,"server":("mario","bruno"),"protocol":"1"}
    if check_conditions(rec, nvars):
        print(json.dumps(rec))
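Following the comment that these conditions amount to a small DSL, one alternative to inspecting the first record is to infer the kind of test from the condition string itself. This is a minimal sketch under stated assumptions, not the approach above: the syntax is assumed to be an optional leading "!" for negation, "lo-hi", "lo-" and "-hi" for inclusive numeric ranges, "a,b" for a list of allowed values, and anything else for an exact match. Because "-" always means a range here, negative numbers and strings containing "-" cannot be expressed. The names `compile_condition` and `compile_conditions` are hypothetical.

```python
def compile_condition(spec):
    """Compile one condition string into a predicate (hypothetical mini-DSL)."""
    negate = spec.startswith("!")
    if negate:
        spec = spec[1:]

    if "-" in spec:  # inclusive numeric range, compared as floats
        lo, _, hi = spec.partition("-")
        lo = float(lo) if lo else float("-inf")
        hi = float(hi) if hi else float("inf")

        def test(v):
            return lo <= float(v) <= hi
    elif "," in spec:  # list of allowed values, compared as strings
        allowed = frozenset(spec.split(","))

        def test(v):
            return str(v) in allowed
    else:  # exact string match
        def test(v):
            return str(v) == spec

    def pred(value):
        # Assumption: a tuple field matches if any of its elements matches.
        values = value if isinstance(value, tuple) else (value,)
        return any(test(v) for v in values) != negate  # != on bools is XOR

    return pred


def compile_conditions(conditions):
    # Parse every condition once, before touching any records.
    return [(field, compile_condition(spec)) for field, spec in conditions.items()]


def check_conditions(rec, compiled):
    return all(field in rec and pred(rec[field]) for field, pred in compiled)


conditions = {"port": "0-20", "ip": "1.2.3.4", "protocol": "1,7",
              "timestamp": "143990000-", "server": "!mario"}
compiled = compile_conditions(conditions)

rec = {"ip": "1.2.3.4", "timestamp": 1439900001, "port": 19,
       "server": ("luigi", "bruno"), "protocol": "7"}
print(check_conditions(rec, compiled))  # True
```

With the question's iterator, the loop body would stay the same: call check_conditions(rec, compiled) for each rec from imiterator() and print the matches. Since parsing happens once, the per-record cost is one predicate call per condition, with all() stopping at the first failure.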