Python 2.7: Deobfuscate IP addresses in a Python dictionary
I need to parse a file that contains plain text and extract both valid IP addresses and obfuscated ones, e.g. 192.168.1[.]1, 192.168.1.1, 192.168.1[dot]1, 192.168.1dot1 or 192 . 168 . 1 . 1. After extracting the data, I need to convert all of them to the valid format and remove duplicates. My current code puts the IP addresses into a string; should that be a dict? I know I need some kind of recursion to set the key values, but I feel there is a more efficient, more modular way to do this.
import json, ordereddict, re
# define the pattern of valid and obfuscated ips
pattern = r"((([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])[ (\[]?(\.|dot)[ )\]]?){3}([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5]))"
# open data file that contains ip addresses and other text
with open ("sample.txt", "r") as myfile:
    text=myfile.read().replace('\n', '')
# put non normalized ip addresses in a dictionary
ips = {"data": [{"key1": match[0] for match in re.findall(pattern, text) }]}
# normalized ip addresses
for name, datalist in ips.iteritems():
    for datadict in datalist:
        for key, value in datadict.items():
            if value == "(dot)":
                datadict[key] = "."
            if value == "[dot]":
                datadict[key] = "."
            if value == " . ":
                datadict[key] = "."
            if value == " .":
                datadict[key] = "."
            if value == ". ":
                datadict[key] = "."
# write valid ip address to json file
with open('test.json', 'w') as outfile:
    json.dump(ips, outfile)
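The code above cannot produce the expected result as written: the dict comprehension stores every match under the same key `"key1"`, so only the last address survives; the `if value == "(dot)"` checks compare the whole address (e.g. `192.168.1(dot)1`) against a bare separator, so they never fire; and `.replace('\n', '')` can fuse an address at the end of one line with one at the start of the next. A flatter alternative, sketched here for Python 2.7 and 3 and not checked against inputs beyond the sample, captures the four octets as separate groups, joins them with `.`, and deduplicates with the standard-library `collections.OrderedDict` (rather than the third-party `ordereddict` module). Unlike the question's pattern, this one rejects leading zeros such as `010`. The names `extract_ips`, `deobfuscate_file`, and the `{"data": [...]}` output shape are assumptions.

```python
import json
import re
from collections import OrderedDict

# One octet, 0-255, without leading zeros; longer alternatives are tried first.
OCTET = r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# One separator: "." or "dot", optionally wrapped in [] or () and padded with spaces.
SEP = r" *[\[(]? *(?:\.|dot) *[\])]? *"
# OCTET SEP OCTET SEP OCTET SEP OCTET; the digit lookarounds stop a match from
# starting or ending inside a longer number such as 256 or 1000.
IP_RE = re.compile(r"(?<!\d)" + SEP.join([OCTET] * 4) + r"(?!\d)")


def extract_ips(text):
    """Return normalized IPv4 strings in first-seen order, without duplicates."""
    # findall yields one 4-tuple of octet strings per match (one per OCTET group).
    normalized = (".".join(octets) for octets in IP_RE.findall(text))
    # OrderedDict.fromkeys drops repeats while keeping the first occurrence's position.
    return list(OrderedDict.fromkeys(normalized))


def deobfuscate_file(in_path, out_path):
    """Hypothetical wrapper: read in_path, write {"data": [...]} JSON to out_path."""
    with open(in_path) as infile:
        # Newlines are left in place, so addresses on adjacent lines cannot fuse.
        ips = extract_ips(infile.read())
    with open(out_path, "w") as outfile:
        json.dump({"data": ips}, outfile, indent=2)
    return ips
```

`deobfuscate_file("sample.txt", "test.json")` mirrors the file names used in the question. On the sample file below, `extract_ips` returns 192.168.1.1, 8.8.8.8, 192.168.2.1, 192.168.3.1 and 192.168.4.0: 256.1.1.1 and 500.1.500.1 are rejected, but 192.168.4.0 is a syntactically valid address, so dropping it needs a separate host rule.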
Sample data file
These are valid ip addresses 192.168.1.1, 8.8.8.8
These are obfuscated 192.168.2[.]1 or 192.168.3(.)1 or 192.168.1[dot]1
192.168.1[dot]1 or 192.168.1(dot)1 or 192 .168 .1 .1 or 192. 168. 1. 1. or 192 . 168 . 1 . 1
This is what an invalid ip address looks like, they should be excluded 256.1.1.1 or 500.1.500.1 or 192.168.4.0
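The last sample line is a useful check for the extraction pattern. The pattern in the question has nothing anchoring either end of a match, so `re.findall` can start inside a longer number: since 256 is not a valid octet, no match can begin at the `2` of `256.1.1.1`, so the engine moves one character right and matches `56.1.1.1`. A sketch of one fix, assuming "not preceded or followed by another digit" is a good enough boundary for this input, wraps the same pattern in a negative lookbehind and lookahead. `GUARDED_PATTERN` and `full_matches` are illustrative names.

```python
import re

# The pattern from the question, copied verbatim.
QUESTION_PATTERN = r"((([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])[ (\[]?(\.|dot)[ )\]]?){3}([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5]))"
# Illustrative fix: refuse to start right after a digit or stop right before one.
GUARDED_PATTERN = r"(?<!\d)" + QUESTION_PATTERN + r"(?!\d)"


def full_matches(pattern, text):
    """Return the whole matched text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]


line = "256.1.1.1 or 500.1.500.1 or 192.168.4.0"
print(full_matches(QUESTION_PATTERN, line))  # ['56.1.1.1', '192.168.4.0']
print(full_matches(GUARDED_PATTERN, line))   # ['192.168.4.0']
```

The lookahead also makes the engine backtrack into the octet alternation, so a trailing `255` is still matched whole instead of being cut to `25`.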
Expected result
192.168.1.1, 192.168.2.1, 192.168.3.1 , 8.8.8.8
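Turning an already-extracted candidate into the expected dotted form is a pure string rewrite, so one regex substitution can replace the chain of `if value == ...` checks. A minimal sketch, assuming the candidate was already isolated by the extraction step (run over free text, it would also rewrite the "dot" inside words such as "dotted"); `normalize_ip` and `SEPARATOR` are hypothetical names.

```python
import re

# Hypothetical separator pattern: "." or "dot", optionally wrapped in [] or ()
# and padded with spaces, e.g. "[.]", "(dot)", " . ".
SEPARATOR = re.compile(r" *[\[(]? *(?:\.|dot) *[\])]? *")


def normalize_ip(candidate):
    """Rewrite every obfuscated separator in an extracted candidate to '.'."""
    return SEPARATOR.sub(".", candidate.strip())


for raw in ["192.168.2[.]1", "192.168.3(.)1", "192.168.1dot1", "192 . 168 . 1 . 1"]:
    print(normalize_ip(raw))
```

Because every variant collapses to the same string, duplicates such as 192.168.1[dot]1 and 192 . 168 . 1 . 1 become identical and can be removed by plain string comparison.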
FYI, 192.168.4.0 is a perfectly valid IP address.
Correct, I only need actual hosts, i.e. 1-254.
192.168.4.0 is a valid host address:
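If "actual hosts" means a last octet from 1 to 254, as the asker's reply puts it, the rule can be applied after normalization. This is a sketch under that assumption and only a heuristic: it ignores the network mask, and in a network larger than /24 (for example 192.168.0.0/16) 192.168.4.0 is a usable host address, which is the point the final comment makes. `is_host_candidate` is a hypothetical name, and the input list is the normalized form of the addresses in the sample file.

```python
def is_host_candidate(ip):
    """True if the last octet of a dotted-quad string is 1-254 (ignores the netmask)."""
    last_octet = int(ip.rsplit(".", 1)[1])
    return 1 <= last_octet <= 254


addresses = ["192.168.1.1", "8.8.8.8", "192.168.2.1", "192.168.3.1", "192.168.4.0"]
hosts = [ip for ip in addresses if is_host_candidate(ip)]
print(hosts)  # ['192.168.1.1', '8.8.8.8', '192.168.2.1', '192.168.3.1']
```

Checking only the last octet keeps the filter independent of any subnet information, which the sample file does not provide; if the masks were known, the decision would have to be made per network instead.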