Pandas: match IP addresses to IP networks and return the associated columns
I have two pandas DataFrames:
import pandas as pd

inp1 = [{'network': '1.0.0.0/24', 'A': 1, 'B': 2}, {'network': '5.46.8.0/23', 'A': 3, 'B': 4}, {'network': '78.212.13.0/24', 'A': 5, 'B': 6}]
df1 = pd.DataFrame(inp1)
print("df1", df1)

inp2 = [{'ip': '1.0.0.10'}, {'ip': 'blahblahblah'}, {'ip': '78.212.13.249'}]
df2 = pd.DataFrame(inp2)
print("df2", df2)
Output:
network A B
0 1.0.0.0/24 1 2
1 5.46.8.0/23 3 4
2 78.212.13.0/24 5 6
ip
0 1.0.0.10
1 blahblahblah
2 78.212.13.249
The final output I want looks like this:
ip A B
0 1.0.0.10 1 2
1 blahblahblah NaN NaN
2 78.212.13.249 5 6
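I want to go through each cell in df2['ip'] and check whether it belongs to one of the networks in df1['network']. If it does, I want to return the A and B columns of that network for the IP address. I have looked at netaddr's IPNetwork and IPAddress, but couldn't quite work it out.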
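The core check the question asks about is "is this address inside this network?". A minimal sketch of that check, using Python's standard-library ipaddress module as a swapped-in alternative to netaddr (used here only because it needs no extra install; the variable names are illustrative):

```python
import ipaddress

# Standard-library stand-in for netaddr's IPNetwork / IPAddress.
network = ipaddress.ip_network("1.0.0.0/24")
inside = ipaddress.ip_address("1.0.0.10")
outside = ipaddress.ip_address("78.212.13.249")

print(inside in network)   # True: 1.0.0.10 lies in 1.0.0.0/24
print(outside in network)  # False

# A string that is not an IP address raises ValueError instead of matching.
try:
    ipaddress.ip_address("blahblahblah")
except ValueError as exc:
    print("invalid:", exc)
```

The `in` operator is all the per-row logic needs; the remaining work is applying it to every IP and attaching A and B.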
Thanks for your help.

You can use netaddr + apply(). Here is an example:
import pandas as pd
from netaddr import IPNetwork, IPAddress, AddrFormatError

network_df = pd.DataFrame([
    {'network': '1.0.0.0/24', 'A': 1, 'B': 2},
    {'network': '5.46.8.0/23', 'A': 3, 'B': 4},
    {'network': '78.212.13.0/24', 'A': 5, 'B': 6}
])
ip_df = pd.DataFrame([{'ip': '1.0.0.10'}, {'ip': 'blahblahblah'}, {'ip': '78.212.13.249'}])

# create all networks using netaddr; use a list (not a generator) so it can
# be scanned again for every IP address
networks = [IPNetwork(n) for n in network_df.network.to_list()]

def find_network(ip):
    # return an empty string for a bad/invalid IP
    try:
        ip_address = IPAddress(ip)
    except AddrFormatError:
        return ''
    # return the network as a CIDR string if the IP belongs to it
    for network in networks:
        if ip_address in network:
            return str(network.cidr)
    return ''

# add a network column, computed from the ip column
ip_df['network'] = ip_df['ip'].apply(find_network)
# merge on the network column (a string in both DataFrames)
result = pd.merge(ip_df, network_df, how='left', on='network')
# the network column is not needed in the expected output
result = result.drop(columns=['network'])
print(result)
# ip A B
# 0 1.0.0.10 1.0 2.0
# 1 blahblahblah NaN NaN
# 2 78.212.13.249 5.0 6.0
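The same map-then-merge idea can be sketched without netaddr, using the standard-library ipaddress module plus pandas. Two deliberate choices here: the parsed networks are kept in a list so they can be re-scanned for every IP, and `find_network` returns the original network string from df1 (not a re-rendered CIDR), so the merge key always matches exactly. `parsed` and `find_network` are illustrative names, not part of any library.

```python
import ipaddress

import pandas as pd

network_df = pd.DataFrame([
    {'network': '1.0.0.0/24', 'A': 1, 'B': 2},
    {'network': '5.46.8.0/23', 'A': 3, 'B': 4},
    {'network': '78.212.13.0/24', 'A': 5, 'B': 6},
])
ip_df = pd.DataFrame({'ip': ['1.0.0.10', 'blahblahblah', '78.212.13.249']})

# Parse every network once; keep (original text, parsed network) pairs.
parsed = [(text, ipaddress.ip_network(text)) for text in network_df['network']]

def find_network(ip):
    """Return the original network string containing ip, or None."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:  # not a valid IPv4/IPv6 address
        return None
    for text, net in parsed:
        if addr in net:
            return text
    return None

# Label each IP with its network, left-join A/B, then drop the helper column.
result = (
    ip_df.assign(network=ip_df['ip'].map(find_network))
         .merge(network_df, how='left', on='network')
         .drop(columns='network')
)
print(result)
```

Unmatched or invalid IPs map to None, so the left merge leaves A and B as NaN for those rows, which is also why A and B come back as floats.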
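See the comments. Hope this helps.

If you are willing to use R instead of Python, I have written a package that solves this problem. There is still an underlying loop, but it is implemented in C++ (so it is faster)!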
library(tibble)
library(ipaddress)
library(fuzzyjoin)
…
#> Warning: Problem on row 2: blahblahblah
nets …
#> # A tibble: 3 x 4
#>   address       network            A     B
#> 1 1.0.0.10      1.0.0.0/24         1     2
#> 2 NA            NA                NA    NA
#> 3 78.212.13.249 78.212.13.0/24     5     6
Created on 2020-09-02 by the reprex package (v0.3.0)

df1.merge(df2, left…?

@QuangHoang Note that the 'network' and 'ip' columns are not the same. Here is a link with more explanation:
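The R answer's point is that the per-address loop is cheaper when it runs in compiled code. In pure Python, a different way to cut the cost is to stop scanning every network for every address: convert each network to an integer [first, last] range, sort the ranges, and binary-search them with bisect. This is a sketch under loud assumptions: IPv4 only, and no overlapping networks (with nested networks, only one candidate range is checked). It does not describe how the ipaddress R package or fuzzyjoin work internally; `nets`, `starts` and `lookup` are hypothetical names.

```python
import bisect
import ipaddress

import pandas as pd

network_df = pd.DataFrame({
    'network': ['1.0.0.0/24', '5.46.8.0/23', '78.212.13.0/24'],
    'A': [1, 3, 5],
    'B': [2, 4, 6],
})
ip_df = pd.DataFrame({'ip': ['1.0.0.10', 'blahblahblah', '78.212.13.249']})

# Assumption: IPv4 networks that do not overlap, so every address falls
# into at most one integer range [first, last].
parsed = [(ipaddress.IPv4Network(t), t) for t in network_df['network']]
nets = sorted((int(n.network_address), int(n.broadcast_address), t) for n, t in parsed)
starts = [first for first, _, _ in nets]

def lookup(ip):
    """Binary-search the sorted ranges; return the matching network string or None."""
    try:
        value = int(ipaddress.IPv4Address(ip))
    except ValueError:  # AddressValueError is a subclass of ValueError
        return None
    i = bisect.bisect_right(starts, value) - 1  # last range starting <= value
    if i >= 0 and value <= nets[i][1]:
        return nets[i][2]
    return None

result = (
    ip_df.assign(network=ip_df['ip'].map(lookup))
         .merge(network_df, how='left', on='network')
         .drop(columns='network')
)
print(result)
```

Each lookup costs O(log n) in the number of networks instead of O(n), which matters once df1 holds thousands of networks.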