Python: capture groups with a regex, then split the captured groups into separate list items
I have read the following lines into a list:
l = ['W –-Transportation',
'W23.F5-International_waterways W25.2-Airlines',
'W23.F8-Rivers W25.4-Bus_lines',
'W23.H-Pipelines W25.6-Railroads',
'W23.H2-Oil_pipelines W25.8-Shipping_lines',
'W23.H4-Natural_gas_pipelines W27-Transportation_safety',
'W23.H6-Water_pipelines W27.2-Traffic_safety',
'W23.K-Transportation_system_design W29-Navigation',
'W23.M-Transportation_system_construction W32-Transportation_research',
'W23.M2-Transportation_facility_construction W32.2-Transportation_surveys',
'W23.M4-Transportation_system_maintenance W34-Transportation_education',
'W23.M4.2-Road_maintenance W36-Transportation_policy',
'W23.M6-Transportation_system_repair W38-Transportation_planning',
'W23.M6.2-Vehicle_repair W40-Transportation_aspects',
'W25-Transportation_industry']
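Now, for each line, I want to capture two groups, e.g. W23.F5-International_waterways and W25.2-Airlines, and split them into two separate list entries.
My expected result is: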
l = ['W –-Transportation','W23.F5-International_waterways','W25.2-Airlines','W23.F8-Rivers','W25.4-Bus_lines','W23.H-Pipelines','W25.6-Railroads','W23.H2-Oil_pipelines','W25.8-Shipping_lines', .....,'W25-Transportation_industry']
The regex to capture the groups would be ([a-z])\s*?([a-z]), but how do I split the captured groups into new list items?
Maybe a simple split on " " would work here:
import re
l = ['W –-Transportation',
'W23.F5-International_waterways W25.2-Airlines',
'W23.F8-Rivers W25.4-Bus_lines',
'W23.H-Pipelines W25.6-Railroads',
'W23.H2-Oil_pipelines W25.8-Shipping_lines',
'W23.H4-Natural_gas_pipelines W27-Transportation_safety',
'W23.H6-Water_pipelines W27.2-Traffic_safety',
'W23.K-Transportation_system_design W29-Navigation',
'W23.M-Transportation_system_construction W32-Transportation_research',
'W23.M2-Transportation_facility_construction W32.2-Transportation_surveys',
'W23.M4-Transportation_system_maintenance W34-Transportation_education',
'W23.M4.2-Road_maintenance W36-Transportation_policy',
'W23.M6-Transportation_system_repair W38-Transportation_planning',
'W23.M6.2-Vehicle_repair W40-Transportation_aspects',
'W25-Transportation_industry']
k = []
for i in l:
    # Split each line on single spaces
    new_string = i.split(" ")
    for j in new_string:
        # Runs of several spaces yield empty strings; skip them
        if j != '':
            k.append(j.strip())
print(k)
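For space-separated lines like these, the loop above can be written as a single list comprehension using str.split() with no argument, which splits on runs of any whitespace and never returns empty strings. The sketch below uses a shortened copy of the list (an assumption for brevity). Note that, like the loop, it splits 'W –-Transportation' at its inner space, which the expected result does not want.

```python
# Shortened copy of the question's list (first three and last entries only)
l = ['W –-Transportation',
     'W23.F5-International_waterways W25.2-Airlines',
     'W23.F8-Rivers W25.4-Bus_lines',
     'W25-Transportation_industry']

# str.split() without an argument splits on any run of whitespace and
# drops empty strings, so no '' check or strip() is needed
k = [part for line in l for part in line.split()]
print(k)
# ['W', '–-Transportation', 'W23.F5-International_waterways', 'W25.2-Airlines',
#  'W23.F8-Rivers', 'W25.4-Bus_lines', 'W25-Transportation_industry']
```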
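Post the expected result.
What about items like 'W –-Transportation' or 'W25-Transportation_industry'? Usually you just need to split each item on \s+, since each value contains only non-whitespace characters.
Added the expected result.
The regex shown won't capture those strings. Is there a reason to use a regex and groups for this? Just using .split() on each line would probably be much simpler.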
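If capture groups are still wanted, as the question asks, one option is to match the whole line with two groups and add each group as its own list item. This is a sketch under the same assumption that the second code starts with "W" and a digit; the pattern and the name two_codes are illustrative, not taken from the thread.

```python
import re

# Shortened copy of the question's list (assumption for brevity)
l = ['W –-Transportation',
     'W23.F5-International_waterways W25.2-Airlines',
     'W23.F8-Rivers W25.4-Bus_lines',
     'W25-Transportation_industry']

# Group 1: the first code (shortest text before the whitespace).
# Group 2: a trailing code that starts with "W" and a digit and runs to the end.
two_codes = re.compile(r'(.+?)\s+(W\d\S*)')

k = []
for line in l:
    m = two_codes.fullmatch(line.strip())
    if m:
        k.extend(m.groups())    # each captured group becomes a separate item
    else:
        k.append(line.strip())  # lines with a single code are kept whole
print(k)
```

m.groups() returns the captured strings as a tuple, so list.extend adds each one as a separate list item, which is the step the question asks about.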