Python: comparing a dictionary against a dictionary of conditions


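I have an iterator (millions of rows) that yields dictionaries, and I need to compare each one against a dictionary of conditions to find the matches.

Here is my code: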

import json

conditions={"port":"0-20", "ip":"1.2.3.4", "protocol":"1,7",
            "timestamp":">143990000", "server":"mario"}

for rec in imiterator(): # Very large number of rows
    # rec examples {"ip":"1.7.1.1", "timestamp":1434000,
    #              "port":129,"server":("mario","bruno"),
    #              "protocol":"1","port":19}

    if check_conditions(rec, conditions):
        print(json.dumps(rec))

Note that the columns in rec can be int, long, string, or tuple.

I need to find a really high-performance way to do the matching. Any ideas?

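I considered using map and converting the conditions into lambda functions that each test one field, then AND-ing all of the results together. Would that be faster?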

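If the goal is to check for a 1:1 correspondence, why not convert all entries of rec to strings and then do a

return conditions == rec

If the goal is to interpret the data as ranges, different handling may be needed, but for a task this trivial, using map only adds overhead.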
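The exact-match suggestion above can be sketched as follows. This is only an illustration: the helper name exact_match and the sample values are made up here, and the approach assumes every condition is a plain string to be compared for equality, so it does not cover ranges or lists.

```python
def exact_match(rec, conditions):
    """Return True when rec equals conditions after converting values to str."""
    # Hypothetical helper: stringify every value, then compare the dicts.
    as_strings = {key: str(value) for key, value in rec.items()}
    return as_strings == conditions


conditions = {"ip": "1.2.3.4", "port": "19"}
print(exact_match({"ip": "1.2.3.4", "port": 19}, conditions))   # True
print(exact_match({"ip": "1.2.3.4", "port": 129}, conditions))  # False
```

Note that str() on a tuple column produces text such as "('mario', 'bruno')", so tuple values would need separate handling.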

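Here is what I did: I converted my conditions into a dictionary of lambda functions, and I use the first record object to decide which kind of function each field needs: a basic string match, a numeric match, or a range (0-100 is a range, -100 means less than or equal to 100, and 100- means greater than or equal to 100).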

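It looks like the OP wants to be able to use some logic, e.g. "timestamp": ">14399000".

Sorry for not explaining. The conditions are all logical conditions, not just 1:1 comparisons, so the port range 0-20 in the example means the port can be anywhere between 0 and 20. Protocol 1,7 means the protocol can be 1 or 7. In this case the IP is treated as a string, so a plain 1:1 comparison is fine; it is not a partial string match. Once we have a good approach I can think about subnets, partial strings and so on, but that is not the current search requirement.

You will have to use or design a DSL to express these conditions; there is no easy answer.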
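As a rough illustration of the "small DSL" idea for a condition such as "timestamp": ">14399000", the sketch below turns an operator-prefixed string into a predicate. The function name compare_to_predicate and the supported prefixes are assumptions made for this example, not something defined in the thread.

```python
import operator

# Hypothetical mini-syntax: an optional comparison prefix followed by an integer.
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def compare_to_predicate(spec):
    """Turn a string such as ">143990000" into a one-argument predicate."""
    for symbol in (">=", "<=", ">", "<"):  # try two-character operators first
        if spec.startswith(symbol):
            bound = int(spec[len(symbol):])
            op = _OPS[symbol]
            # Bind op and bound as defaults so the predicate is self-contained.
            return lambda x, op=op, bound=bound: op(x, bound)
    value = int(spec)  # no prefix: plain equality
    return lambda x: x == value


newer = compare_to_predicate(">143990000")
print(newer(143990001))  # True
print(newer(143990000))  # False
```

Parsing each condition string once, before the loop over records, keeps the per-row cost down to a single function call per field.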
def check_conditions(rec, val):
    # val maps each field name to a predicate function; all of them must pass
    for cond in val:
        if not val[cond](rec[cond]):  # first failing condition rejects the record
            return False
    return True  # by default return True if there are no conditions



conditions={"port":"0-20","ip":"1.2.3.4","protocol":"1,7","timestamp":"143990000-","server":"!mario"}
def number_logic_to_lambda(invar, switcharoo):
    """
    Turn a numeric condition string into a predicate. The string is either
    a range query ("0-20", "100-", "-100"), a comma-delimited list ("1,7"),
    or a single number ("5"). switcharoo=True negates the result.
    """
    z = invar.split('-')  # range queries
    if len(z) == 1:
        y = [int(v) for v in invar.split(',')]  # one or more allowed values
        if len(y) == 1:  # a single match
            return lambda x: (x == y[0]) ^ switcharoo
        else:  # comma-delimited list: match any of the listed values
            return lambda x: (x in y) ^ switcharoo
    elif len(z) == 2:  # this is a query with "-"
        if z[1] == '':  # "N-": greater than or equal to N
            low = int(z[0])
            return lambda x: (low <= x) ^ switcharoo
        elif z[0] == '':  # "-N": less than or equal to N
            high = int(z[1])
            return lambda x: (high >= x) ^ switcharoo
        else:  # "A-B": inclusive range query
            low, high = int(z[0]), int(z[1])
            return lambda x: (low <= x <= high) ^ switcharoo
    raise ValueError("unsupported numeric condition: %r" % invar)
import itertools
import json

records = imiterator()  # renamed to avoid shadowing the built-in iter()
first_rec = next(records)  # used to infer the type of each column
nvars = {}  # the conditions changed into functions
for svar in conditions:
    qvar = conditions[svar]
    switcharoo = False
    if qvar.startswith("!"):  # starts with a bang: it is a negative condition
        qvar = qvar[1:]
        switcharoo = True
    cattr = first_rec[svar]  # sample value that decides the kind of matcher
    if isinstance(cattr, (int, float)):  # numeric column (Python 3 int covers long)
        mapf = number_logic_to_lambda(qvar, switcharoo)
    elif isinstance(cattr, tuple):  # tuples: every requested value must be present
        wanted = frozenset(qvar.split(","))
        mapf = lambda x, wanted=wanted, neg=switcharoo: wanted.issubset(x) ^ neg
    else:  # default mapping function: full string match
        # bind qvar and switcharoo now; a bare closure would see the last loop values
        mapf = lambda x, q=qvar, neg=switcharoo: (x == q) ^ neg
    nvars[svar] = mapf  # update the dictionary of mapped functions

# put the first record back in front so that it is checked as well
for rec in itertools.chain([first_rec], records):  # very large number of rows
    # rec examples {"ip":"1.7.1.1","timestamp":1434000,"port":129,"server":("mario","bruno"),"protocol":"1"}
    if check_conditions(rec, nvars):
        print(json.dumps(rec))
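To sanity-check the idea on a small scale, the following self-contained sketch filters a few hand-written records. The predicates, the sample records and the generator name matching_records are all hypothetical stand-ins for what the conversion step in the answer would produce; real records would come from imiterator().

```python
# Hand-written stand-ins for the generated predicate functions (assumed values).
predicates = {
    "port": lambda x: 0 <= x <= 20,         # "0-20"
    "protocol": lambda x: x in (1, 7),      # "1,7"
    "timestamp": lambda x: x >= 143990000,  # "143990000-"
    "server": lambda x: "mario" not in x,   # "!mario" on a tuple column
}


def matching_records(records, predicates):
    """Yield only the records for which every predicate holds."""
    checks = tuple(predicates.items())  # built once, outside the hot loop
    for rec in records:
        # all() stops at the first failing predicate
        if all(test(rec[field]) for field, test in checks):
            yield rec


sample = [
    {"port": 19, "protocol": 1, "timestamp": 143990001, "server": ("luigi",)},
    {"port": 129, "protocol": 1, "timestamp": 143990001, "server": ("luigi",)},
    {"port": 5, "protocol": 7, "timestamp": 143990001, "server": ("mario", "bruno")},
]
result = list(matching_records(sample, predicates))
print(len(result))  # 1
```

Only the first sample record passes: the second fails the port range and the third is rejected by the negated server condition. Ordering the predicates so that the most selective one runs first may reduce the work per row, though that depends on the data.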