Python: comparing a dictionary against a dictionary of conditions
I have an iterator (millions of rows) that yields one dictionary per row, and I need to compare each of them against a dictionary of conditions to find the matches. Here is my code:
conditions = {"port": "0-20", "ip": "1.2.3.4", "protocol": "1,7",
              "timestamp": ">143990000", "server": "mario"}
for rec in imiterator():  # Very large number of rows
    # rec examples {"ip":"1.7.1.1", "timestamp":1434000,
    #               "port":129, "server":("mario","bruno"),
    #               "protocol":"1", "port":19}
    if check_conditions(rec, conditions):
        print(json.dumps(rec))
Note that the columns in rec can be int, long, string, or tuple.
I need a really high-performance way to do the matching. Any ideas?
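I was thinking of using map and converting each condition into a lambda function that must match, then AND-ing the results of all conditions. Would that be faster?

If the goal is to check for a 1:1 correspondence, why not convert all entries of rec to strings and then simply do a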
return conditions == rec
If the goal is to process and interpret ranges of data, then different handling may be needed, but for such a trivial task using map only adds overhead.

It looks like the OP wants to be able to use some logic, e.g. "timestamp": ">14399000"

Sorry for not explaining. The conditions are all logical conditions, not just 1:1. The port range 0-20 in the example means the port can be anywhere between 0 and 20. Protocol 1,7 means the protocol can be either 1 or 7. The IP is treated as a string in this case, so a 1:1 comparison is fine there; it is not a partial string match. Once we have a good approach I can think about subnets, partial strings, and so on, but that is not the current search requirement.

You will have to use or design a DSL to express these conditions; there is no simple answer.

Here is what I did: I converted my conditions into a dictionary of lambda functions, using the first record object to determine which kind of function I want for each column: a basic string match, a numeric match, or a range (an inclusive range written as 0-100, less than or equal to 100 written as -100, or greater than or equal to 100 written as 100-).
def check_conditions(rec, val):
    # rec is one record (a dict); val maps each column name to a predicate.
    # With no conditions at all, every record matches.
    for cond in val:
        if not val[cond](rec[cond]):
            return False  # stop at the first condition that fails
    return True
conditions={"port":"0-20","ip":"1.2.3.4","protocol":"1,7","timestamp":"143990000-","server":"!mario"}
import itertools
import json

def number_logic_to_lambda(invar, switcharoo, conv=int):
    """
    Build a predicate from a numeric condition string: a range ("0-20"),
    an open-ended range ("100-" or "-100"), a comma-delimited list ("1,7"),
    or a single number ("5"). switcharoo=True negates the result.
    conv converts the parsed pieces (int by default, float for float columns).
    """
    z = invar.split('-')  # range queries
    if len(z) == 1:
        y = [conv(v) for v in invar.split(',')]  # one or more values
        if len(y) == 1:  # a single match
            return lambda x: (x == y[0]) ^ switcharoo
        else:  # comma-delimited list: match any of the values
            return lambda x: (x in y) ^ switcharoo
    elif len(z) == 2:  # a query with "-"
        if z[1] == '':  # "N-": greater than or equal to N
            lo = conv(z[0])
            return lambda x: (lo <= x) ^ switcharoo
        elif z[0] == '':  # "-N": less than or equal to N
            hi = conv(z[1])
            return lambda x: (hi >= x) ^ switcharoo
        else:  # "A-B": inclusive range
            lo, hi = conv(z[0]), conv(z[1])
            return lambda x: (lo <= x <= hi) ^ switcharoo
    raise ValueError("unsupported numeric condition: %r" % invar)

it = imiterator()
first_rec = next(it)  # used only to detect the type of each column
nvars = {}  # the conditions, converted into functions
for svar in conditions:
    qvar = conditions[svar]
    switcharoo = False
    if qvar.startswith("!"):  # a leading bang marks a negative condition
        qvar = qvar[1:]
        switcharoo = True
    cattr = first_rec[svar]
    # Default: full string match. Default arguments bind the current values;
    # a plain closure would see only the last qvar/switcharoo of the loop.
    mapf = lambda x, q=qvar, s=switcharoo: (x == q) ^ s
    if isinstance(cattr, int):  # on Python 2, test (int, long) here
        mapf = number_logic_to_lambda(qvar, switcharoo)
    elif isinstance(cattr, float):  # floats are parsed with float()
        mapf = number_logic_to_lambda(qvar, switcharoo, float)
    elif isinstance(cattr, tuple):  # tuples: every requested value must be present
        wanted = set(qvar.split(","))
        mapf = lambda x, w=wanted, s=switcharoo: w.issubset(x) ^ s
    nvars[svar] = mapf  # update the dictionary of mapped functions

# Put first_rec back in front so the first row is checked as well.
for rec in itertools.chain([first_rec], it):  # very large number of rows
    # rec examples {"ip":"1.7.1.1","timestamp":1434000,"port":129,"server":("mario","bruno"), "protocol":"1","port":19}
    if check_conditions(rec, nvars):
        print(json.dumps(rec))
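
Since imiterator() is not shown in the thread, here is a small self-contained sketch of the same idea that can be run as is: compile the conditions once into (key, predicate) pairs, then test each record with all(), which stops at the first failing condition. The names make_numeric_predicate, compile_conditions, matches and sample_records are made up for this illustration, the sample rows are invented, and protocol is assumed to be an int here. It is not a benchmark and makes no claim about actual speed.

```python
def make_numeric_predicate(spec):
    """Turn "0-20", "143990000-", "-100", "1,7" or "5" into a predicate.

    Assumption: bounds are non-negative ints, so "-" always means a range.
    """
    if "-" in spec:
        lo, _, hi = spec.partition("-")
        lo = int(lo) if lo else None  # "N-" has no upper bound
        hi = int(hi) if hi else None  # "-N" has no lower bound
        return lambda x: (lo is None or lo <= x) and (hi is None or x <= hi)
    allowed = frozenset(int(v) for v in spec.split(","))
    return lambda x: x in allowed


def compile_conditions(conditions, sample):
    """Build (key, predicate) pairs once; sample supplies the column types."""
    compiled = []
    for key, spec in conditions.items():
        negate = spec.startswith("!")
        if negate:
            spec = spec[1:]
        value = sample[key]
        if isinstance(value, int):
            pred = make_numeric_predicate(spec)
        elif isinstance(value, tuple):
            wanted = frozenset(spec.split(","))
            pred = lambda x, w=wanted: w.issubset(x)
        else:
            pred = lambda x, s=spec: x == s  # plain full-string match
        if negate:
            pred = (lambda p: lambda x: not p(x))(pred)
        compiled.append((key, pred))
    return compiled


def matches(rec, compiled):
    # all() short-circuits on the first predicate that returns False.
    return all(pred(rec[key]) for key, pred in compiled)


conditions = {"port": "0-20", "ip": "1.2.3.4", "protocol": "1,7",
              "timestamp": "143990000-", "server": "!mario"}

# Invented sample rows standing in for imiterator().
sample_records = [
    {"ip": "1.2.3.4", "timestamp": 143990500, "port": 19,
     "protocol": 7, "server": ("luigi", "bruno")},
    {"ip": "1.2.3.4", "timestamp": 143990500, "port": 129,
     "protocol": 7, "server": ("luigi",)},
    {"ip": "1.2.3.4", "timestamp": 143990500, "port": 19,
     "protocol": 1, "server": ("mario", "bruno")},
    {"ip": "1.7.1.1", "timestamp": 1434000, "port": 19,
     "protocol": 1, "server": ("luigi",)},
]

compiled = compile_conditions(conditions, sample_records[0])
matched = [rec for rec in sample_records if matches(rec, compiled)]
for rec in matched:
    print(rec)
```

Only the first sample row passes: the second fails the port range, the third contains "mario" (which "!mario" excludes), and the fourth fails the ip and timestamp checks. Binding spec and wanted through default arguments avoids the usual pitfall of lambdas created in a loop all seeing the last value of the loop variable.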