Python: converting multi-line script output to a dictionary using regular expressions
I have the following script output:
***************************************************
[g4u2680c]: searching for domains
---------------------------------------------------
host = g4u2680c.houston.example.com
ipaddr = [16.208.16.72]
VLAN = [352]
Gateway= [16.208.16.1]
Subnet = [255.255.248.0]
Subnet = [255.255.248.0]
Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]

host = g4u2680c.houston.example.com
ipaddr = [16.208.16.72]
VLAN = [352]
Gateway= [16.208.16.1]
Subnet = [255.255.248.0]
Subnet = [255.255.248.0]
Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]
* script completed Mon Jun 15 06:13:14 UTC 2015 **
* sleeping 30 to avoid DOS on dns via a loop **
I need to extract the two host blocks into dictionaries, without the brackets. Here is my code:
#!/bin/env python
import re
text="""***************************************************
[g4u2680c]: searching for domains
---------------------------------------------------
host = g4u2680c.houston.example.com
ipaddr = [16.208.16.72]
VLAN = [352]
Gateway= [16.208.16.1]
Subnet = [255.255.248.0]
Subnet = [255.255.248.0]
Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]

host = g4u2680c.houston.example.com
ipaddr = [16.208.16.72]
VLAN = [352]
Gateway= [16.208.16.1]
Subnet = [255.255.248.0]
Subnet = [255.255.248.0]
Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]
* script completed Mon Jun 15 06:13:14 UTC 2015 **
* sleeping 30 to avoid DOS on dns via a loop **
***************************************************
"""
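# Grab each host block: everything from "host" up to the first blank line
seq = re.compile(r"host.+?\n\n", re.DOTALL)
a = seq.findall(text)
# Collect the "key = value" lines of the first host block
matches = re.findall(r'\w.+=.+', a[0])
# Split each line on the first "=" only, then normalize whitespace and case
matches = [m.split('=', 1) for m in matches]
matches = [[m[0].strip().lower(), m[1].strip().lower()] for m in matches]
# should have a function with a regular expression to remove the brackets here
d = dict(matches)
print(d)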
This is what I get so far for the first host:
{'subnet': '[255.255.248.0]', 'vlan': '[352]', 'ipaddr': '[16.208.16.72]', 'cluster': '[g4u2679c g4u2680c g9u1484c g9u1485c]', 'host': 'g4u2680c.houston.example.com', 'gateway': '[16.208.16.1]'}
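I need help finding a regular expression that removes the brackets, because the values in the dictionary contain data both with and without brackets.
Alternatively, is there a better, simpler way to convert the raw script output into a dictionary?
You can simply use re.findall and dict: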
>>> dict([(i,j.strip('[]')) for i,j in re.findall(r'(\w+)\s*=\s*(.+)',text)])
{'Subnet': '255.255.248.0', 'VLAN': '352', 'ipaddr': '16.208.16.72', 'Cluster': 'g4u2679c g4u2680c g9u1484c g9u1485c', 'host': 'g4u2680c.houston.example.com', 'Gateway': '16.208.16.1'}
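Note that the one-liner above folds every key = value pair in the text into a single dictionary, so the second host block overwrites the first. Since the question asks for both host blocks, here is a minimal sketch that builds one dictionary per block. It assumes the blocks are separated by a blank line; the helper name parse_hosts and the shortened sample string are illustrative, not part of the original answer.

```python
import re

# Shortened sample in the same shape as the script output above
# (values copied from the question; two blocks separated by a blank line).
sample = """host = g4u2680c.houston.example.com
 ipaddr = [16.208.16.72]
 VLAN = [352]

host = g4u2680c.houston.example.com
 ipaddr = [16.208.16.72]
 VLAN = [352]
"""

def parse_hosts(text):
    """Return one dict per blank-line-separated block that has key = value lines."""
    hosts = []
    for block in re.split(r'\n\s*\n', text):
        pairs = re.findall(r'(\w+)\s*=\s*(.+)', block)
        if pairs:  # skip banner or footer blocks with no key = value lines
            hosts.append({key.lower(): value.strip().strip('[]')
                          for key, value in pairs})
    return hosts

hosts = parse_hosts(sample)
for h in hosts:
    print(h)
```

Keys are lowercased here to match the output in the question; drop the .lower() call to keep the original capitalization.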
As for the square brackets, you can remove them with the str.strip method.
You can try this:
matches = [m.replace('[','').replace(']','').split('=', 1) for m in matches]
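The two bracket-removal approaches mentioned so far, str.strip and chained str.replace, give the same result on the data in the question, but they are not equivalent in general. The short sketch below illustrates the difference; the value with an inner bracket is a made-up example, not taken from the script output.

```python
# A value from the question: both approaches give the same result.
value = '[g4u2679c g4u2680c g9u1484c g9u1485c]'
stripped = value.strip('[]')                        # removes brackets at the ends only
replaced = value.replace('[', '').replace(']', '')  # removes brackets everywhere

# A value without brackets (the host line) is left untouched by both.
host = 'g4u2680c.houston.example.com'
host_stripped = host.strip('[]')

# Hypothetical value with a bracket in the middle: the results differ.
inner = '[a[0] b]'
inner_stripped = inner.strip('[]')
inner_replaced = inner.replace('[', '').replace(']', '')

print(stripped == replaced)            # True
print(host_stripped == host)           # True
print(inner_stripped, inner_replaced)  # a[0] b a0 b
```

If the values can never contain inner brackets, either approach works; str.strip is the safer choice when only the surrounding brackets should go.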
You can use:
(\w+)\s*=\s*\[?([^\n\]]+)\]?
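Your solution doesn't match the host name, because the host name has no brackets.
@SharuzzamanAhmatRaslan If you need the host name as well, you can loop over the results of re.findall() and remove the brackets with str.strip.
I wish I could accept multiple answers, because your answer is interesting. Thanks.
If this helped you, then accept this answer.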
import re
p = re.compile(r'(\w+)\s*=\s*\[?([^\n\]]+)\]?', re.MULTILINE)
test_str = u"host = g4u2680c.houston.example.com\n ipaddr = [16.208.16.72]\n VLAN = [352]\n Gateway= [16.208.16.1]\n Subnet = [255.255.248.0]\n Subnet = [255.255.248.0]\n Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]\n\nhost = g4u2680c.houston.example.com\n ipaddr = [16.208.16.72]\n VLAN = [352]\n Gateway= [16.208.16.1]\n Subnet = [255.255.248.0]\n Subnet = [255.255.248.0]\n Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]\n"
# Print the list of (key, value) tuples, brackets already removed
print(p.findall(test_str))
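The findall call above returns a flat list of (key, value) tuples covering both hosts. One way to turn that list into one dictionary per host, sketched here under the assumption that every block begins with a host line, is to start a new dictionary each time the host key appears. The function name group_by_host and the shortened sample are illustrative only.

```python
import re

# Same pattern as above: optional brackets around the value.
pattern = re.compile(r'(\w+)\s*=\s*\[?([^\n\]]+)\]?')

# Shortened sample built from values in the question.
sample = ("host = g4u2680c.houston.example.com\n"
          " ipaddr = [16.208.16.72]\n"
          " VLAN = [352]\n"
          "\n"
          "host = g4u2680c.houston.example.com\n"
          " ipaddr = [16.208.16.72]\n"
          " Cluster= [g4u2679c g4u2680c g9u1484c g9u1485c]\n")

def group_by_host(text):
    """Group (key, value) pairs into one dict per host block."""
    records = []
    for key, value in pattern.findall(text):
        # Assumption: a "host" key marks the start of a new block.
        if key == 'host' or not records:
            records.append({})
        records[-1][key] = value.strip()
    return records

records = group_by_host(sample)
for record in records:
    print(record)
```

Unlike splitting on blank lines, this grouping does not depend on how the blocks are separated, only on the host line coming first in each block.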