Python: filter a DataFrame by IP address range
I need to filter a DataFrame by a range of IP addresses. Is this possible without using regular expressions?
For example, from 61.245.160.0 to 61.245.175.255.
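Strings are orderable in Python, so you should be able to do the following (note that this is a lexicographic comparison, which does not always agree with the numeric order of IP addresses):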
In [11]: '61.245.160.0' < '61.245.175.255'
Out[11]: True
Suppose you have the following DataFrame:
In [48]: df
Out[48]:
ip
0 61.245.160.1
1 61.245.160.100
2 61.245.160.200
3 61.245.160.254
Let's find all IPs strictly between 61.245.160.99 and 61.245.160.254 (both endpoints excluded):
In [49]: ip_from = '61.245.160.99'
In [50]: ip_to = '61.245.160.254'
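If we compare the IPs as strings, they are compared lexicographically, so the filter does not work (see In [51] and In [52] below, which both return an empty DataFrame).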
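Demo: the intermediate steps are shown in the In [66] to In [68] transcripts below.
A vectorized NumPy conversion, ip_to_int, is also defined below, together with a check (In [80]) that it produces the same integers as the IPAddress-based conversion.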
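I have an approach that uses the ipaddress module.
For example, I want to know whether host0 = 10.2.23.5 belongs to any of the following networks: NETS = ['10.2.48.0/25', '10.2.23.0/25', '10.2.154.0/24']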
>>> import ipaddress as ip
>>> host0 = ip.IPv4Address('10.2.23.5')
>>> NETS = ['10.2.48.0/25','10.2.23.0/25','10.2.154.0/24']
>>> nets = [ip.IPv4Network(x) for x in NETS]
>>> [x for x in nets if (host0 >= x.network_address and host0 <= x.broadcast_address)]
[IPv4Network('10.2.23.0/25')]
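The same membership test can also be written with the `in` operator that `ipaddress` network objects support. This is a minimal sketch, not the answer author's code; the helper name `matching_networks` is hypothetical.

```python
import ipaddress

NETS = ["10.2.48.0/25", "10.2.23.0/25", "10.2.154.0/24"]
nets = [ipaddress.ip_network(n) for n in NETS]

def matching_networks(host, networks):
    """Return the networks that contain the given host address string."""
    addr = ipaddress.ip_address(host)
    # `addr in net` is True when addr lies between the network and broadcast addresses
    return [net for net in networks if addr in net]

print(matching_networks("10.2.23.5", nets))    # [IPv4Network('10.2.23.0/25')]
print(matching_networks("10.2.23.200", nets))  # [] (a /25 only covers .0 to .127)
```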
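Applying the function fnc (listed at the end of this page) to the IP column creates a new column newCol in which each row is 1 or -1, depending on whether the IP address belongs to one of the networks you are interested in.
@Andy Hayden: OK, using regular expressions is fine, but I have a lot of ranges to match, which is why I would rather not use regex.
I love Python! The simplest solution usually works for programmers!
Be careful with this method, because it does not sort correctly. For example, '61.245.160.99' < '61.245.160.102' gives Out[161]: False
@adele Good point. A quick way to get the correct behaviour is to build a representative large number for the IP address, e.g. 61*100000000+245*1000000+160*1000+255, and sort on that.
I agree with @ade1e. Why not use import ipaddress? It works well: [x for x in nets if (host2 >= x.network_address and host2 ... (the rest of this snippet is cut off in the source)
Just wanted to add: I ran into a problem with def ip_to_int(ip_ser). When my IP is, for example, 240.42.123.100, I get the int -265651356. This is caused by integer overflow in np.left_shift(ips, mults). In this case, I found a workaround: use np.left_shift(ips.astype(object), mults).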
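The overflow reported in the last comment can be reproduced in plain Python. The sketch below is only an illustration under stated assumptions: the helper names `ipv4_to_int` and `as_signed_int32` are hypothetical, and the wraparound is simulated arithmetically rather than with NumPy. Whether NumPy actually wraps depends on the integer dtype the shift is computed in, which can differ between platforms and versions.

```python
def ipv4_to_int(ip):
    """Pack a dotted-quad IPv4 string into an unsigned 32-bit integer."""
    a, b, c, d = (int(part) for part in ip.split("."))
    # Python ints have arbitrary precision, so this never overflows
    return (a << 24) | (b << 16) | (c << 8) | d

def as_signed_int32(n):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit one."""
    return n - (1 << 32) if n >= (1 << 31) else n

value = ipv4_to_int("240.42.123.100")
print(value)                   # 4029315940
print(as_signed_int32(value))  # -265651356, the value from the comment
```

Any address whose first octet is 128 or higher exceeds the signed 32-bit range, which is why a 64-bit (or object) dtype avoids the problem.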
In [51]: df.query("'61.245.160.99' < ip < '61.245.160.254'")
Out[51]:
Empty DataFrame
Columns: [ip]
Index: []
In [52]: df.query('@ip_from < ip < @ip_to')
Out[52]:
Empty DataFrame
Columns: [ip]
Index: []
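The two empty results above follow from character-by-character string comparison. As a minimal sketch using only the standard-library `ipaddress` module (not the library used in the answer), the same four addresses can be filtered both ways; the variable names are illustrative.

```python
import ipaddress

ips = ["61.245.160.1", "61.245.160.100", "61.245.160.200", "61.245.160.254"]
ip_from, ip_to = "61.245.160.99", "61.245.160.254"

# Strings compare character by character: after the shared prefix '61.245.160.',
# '1' and '2' sort before '9', so no address is "greater" than ip_from.
string_hits = [s for s in ips if ip_from < s < ip_to]

# ipaddress objects compare by numeric value instead
lo, hi = ipaddress.ip_address(ip_from), ipaddress.ip_address(ip_to)
numeric_hits = [s for s in ips if lo < ipaddress.ip_address(s) < hi]

print(string_hits)   # []
print(numeric_hits)  # ['61.245.160.100', '61.245.160.200']
```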
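Comparing the integer values of the addresses works instead (IPAddress is not imported anywhere on this page; it is presumably netaddr.IPAddress):
In [53]: df[df.ip.apply(lambda x: int(IPAddress(x)))
   ....:      .to_frame('ip')
   ....:      .eval('{} < ip < {}'.format(int(IPAddress(ip_from)),
   ....:                                  int(IPAddress(ip_to)))
   ....:      )
   ....:   ]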
Out[53]:
ip
1 61.245.160.100
2 61.245.160.200
In [66]: df.ip.apply(lambda x: int(IPAddress(x)))
Out[66]:
0 1039507457
1 1039507556
2 1039507656
3 1039507710
Name: ip, dtype: int64
In [67]: df.ip.apply(lambda x: int(IPAddress(x))).to_frame('ip')
Out[67]:
ip
0 1039507457
1 1039507556
2 1039507656
3 1039507710
In [68]: (df.ip.apply(lambda x: int(IPAddress(x)))
....: .to_frame('ip')
....: .eval('{} < ip < {}'.format(int(IPAddress(ip_from)),
....: int(IPAddress(ip_to))))
....: )
Out[68]:
0 False
1 True
2 True
3 False
dtype: bool
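import numpy as np

def ip_to_int(ip_ser):
    # Split each dotted-quad string into four octets; int64 avoids the overflow
    # reported in the comments for addresses such as 240.42.123.100
    ips = ip_ser.str.split('.', expand=True).astype(np.int64).values
    mults = np.tile(np.array([24, 16, 8, 0], dtype=np.int64), len(ip_ser)).reshape(ips.shape)
    return np.sum(np.left_shift(ips, mults), axis=1)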
In [78]: df['int_ip'] = ip_to_int(df.ip)
In [79]: df
Out[79]:
ip int_ip
0 61.245.160.1 1039507457
1 61.245.160.100 1039507556
2 61.245.160.200 1039507656
3 61.245.160.254 1039507710
In [80]: (df.ip.apply(lambda x: int(IPAddress(x))) == ip_to_int(df.ip)).all()
Out[80]: True
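def fnc(row):
    # row is a single IP address string taken from the 'IP' column
    host = ip.IPv4Address(row)
    # networks from the nets list (built above) that contain this host
    vec = [x for x in nets if (host >= x.network_address and host <= x.broadcast_address)]
    if len(vec) == 0:
        return '1'    # the host is in none of the networks
    else:
        return '-1'   # the host is in at least one network

df['newCol'] = df['IP'].apply(fnc)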
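To connect the network-based approach back to the range in the question, the start and end addresses can be converted into CIDR networks with the standard-library function `ipaddress.summarize_address_range`. This is a sketch under assumptions: the helper `in_range` and the sample addresses are made up for illustration.

```python
import ipaddress

first = ipaddress.ip_address("61.245.160.0")
last = ipaddress.ip_address("61.245.175.255")

# Collapse the inclusive range into the smallest list of CIDR networks
range_nets = list(ipaddress.summarize_address_range(first, last))
print(range_nets)  # [IPv4Network('61.245.160.0/20')]

def in_range(ip_str):
    """True if the address string falls inside any of the range's networks."""
    addr = ipaddress.ip_address(ip_str)
    return any(addr in net for net in range_nets)

sample = ["61.245.159.255", "61.245.160.1", "61.245.175.255", "61.245.176.0"]
print([s for s in sample if in_range(s)])  # ['61.245.160.1', '61.245.175.255']
```

With a DataFrame, the same predicate could be used as a boolean mask, for example df[df['ip'].map(in_range)], assuming the column holds address strings.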