Python: extracting strings and integers into a dictionary

python regex


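I'm trying to understand how to extract specific kinds of strings and their values into a dictionary.

Example:

item SHIRT 11-14 variance 11-12-13-14-15 color Red

where

ShirtType: 11, 14
variance: 11,12,13,14,15
color: Red

I would like to use a regexp for this and to understand how to do it in Python. All ideas are welcome.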

You can use the following expressions to extract the required values from the string. Here is a one-liner that builds the desired dict:

>>> import re
>>> my_str = "item SHIRT 11-14 variance 11-12-13-14-15 color Red"
>>> keys = ["shirt", "variance", "color"]

>>> {k: v.split('-') if '-' in v else v for k, v in zip(keys, re.findall(
...         r'(?<=SHIRT\s)[\d-]+|(?<=variance\s)[\d-]+|(?<=color\s)\w+', my_str))}
{'shirt': ['11', '14'], 'variance': ['11', '12', '13', '14', '15'], 'color': 'Red'}

Explanation of each regex pattern:

# For shirt:
#     This regex matches the digits and hyphens "-"
#     preceded by "SHIRT" and a whitespace character
>>> re.search(r'(?<=SHIRT\s)[\d-]+', my_str).group()
'11-14'

# For variance:
#     Same as the regex above: it matches the digits and hyphens "-"
#     preceded by "variance" and a whitespace character
>>> re.search(r'(?<=variance\s)[\d-]+', my_str).group()
'11-12-13-14-15'

# For color:
#     This regex matches the word characters preceded by "color"
#     and a whitespace character
>>> re.search(r'(?<=color\s)\w+', my_str).group()
'Red'
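
As a variation on the lookbehind patterns above, here is a minimal sketch that captures all three fields with named groups in one pattern and converts the numbers to ints. The names `ITEM_PATTERN` and `parse_item` are made up for illustration, and the sketch assumes the fields always appear in the order SHIRT, variance, color.

```python
import re

# Hypothetical pattern name; one named group per field of the sample line.
# Assumes the fields always appear in this order.
ITEM_PATTERN = re.compile(
    r"SHIRT\s+(?P<shirt>\d+(?:-\d+)*)\s+"
    r"variance\s+(?P<variance>\d+(?:-\d+)*)\s+"
    r"color\s+(?P<color>\w+)"
)


def parse_item(text):
    """Return a dict of the three fields, or None if the line does not match."""
    match = ITEM_PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Turn the hyphen-separated number runs into lists of ints.
    for key in ("shirt", "variance"):
        fields[key] = [int(n) for n in fields[key].split("-")]
    return fields


my_str = "item SHIRT 11-14 variance 11-12-13-14-15 color Red"
print(parse_item(my_str))
# {'shirt': [11, 14], 'variance': [11, 12, 13, 14, 15], 'color': 'Red'}
```

Unlike the `zip(keys, re.findall(...))` one-liner, this does not depend on the matches lining up with a separate list of keys, and a line that does not fit the pattern gives `None` instead of a partly filled dict.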

You can try the following:
import re
s = "item SHIRT 11-14 variance 11-12-13-14-15 color Red"
new_s = s.split()[1:]
# Pair each key with the token after it; hyphenated number runs become lists of ints.
final_data = {"ShirtType" if a == "SHIRT" else a: list(map(int, b.split('-'))) if re.findall(r'\d-', b) else b for a, b in [(new_s[i], new_s[i+1]) for i in range(0, len(new_s)-1, 2)]}
print(final_data)

Output:
{'ShirtType': [11, 14], 'variance': [11, 12, 13, 14, 15], 'color': 'Red'}
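
The same pairing idea can be written as a small function, which may be easier to read than the single expression. This is only a sketch: `parse_pairs` is a hypothetical name, it keeps the keys exactly as they appear in the line (so `SHIRT` is not renamed), and it turns any value made only of digits and hyphens into a list of ints, including a single number.

```python
def parse_pairs(line):
    """Parse 'item KEY value KEY value ...' into a dict (hypothetical helper)."""
    tokens = line.split()[1:]  # drop the leading "item"
    result = {}
    # Even-indexed tokens are keys, odd-indexed tokens are their values.
    for key, value in zip(tokens[::2], tokens[1::2]):
        parts = value.split("-")
        if all(part.isdecimal() for part in parts):
            result[key] = [int(part) for part in parts]
        else:
            result[key] = value
    return result


s = "item SHIRT 11-14 variance 11-12-13-14-15 color Red"
print(parse_pairs(s))
# {'SHIRT': [11, 14], 'variance': [11, 12, 13, 14, 15], 'color': 'Red'}
```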

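If your input always looks like this, you can use regex to extract the values and insert them into a dictionary:
import re

dic = {}
# Avoid naming this variable `input`, which would shadow the built-in.
text = 'item SHIRT 11-14 variance 11-12-13-14-15 color Red'
dic['Shirt Type'] = re.search(r'(?<=SHIRT\s)[\d-]+', text).group().split('-')
dic['Variance'] = re.search(r'(?<=variance\s)[\d-]+', text).group().split('-')
# No split for the color, so it stays a plain string.
dic['Color'] = re.search(r'(?<=color\s)\w+', text).group()
print(dic)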
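
If the input is not always like this, note that `re.search` returns `None` when a label is missing, so calling `.group()` on the result raises `AttributeError`. A minimal sketch of a guard, where `find_value` is a hypothetical helper and the sample line without a variance field is invented for illustration:

```python
import re


def find_value(label, value_pattern, text):
    """Return the text matched right after `label`, or None if it is absent."""
    # The lookbehind stays fixed-width: the escaped label plus one whitespace.
    match = re.search(rf"(?<={re.escape(label)}\s){value_pattern}", text)
    return match.group() if match else None


text = "item SHIRT 11-14 color Red"  # invented sample with no variance field
print(find_value("SHIRT", r"[\d-]+", text))     # 11-14
print(find_value("variance", r"[\d-]+", text))  # None
print(find_value("color", r"\w+", text))        # Red
```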

You can also try it without regular expressions (here the sample line is read from file.txt):
One-line solution:
print({chunk[i]: chunk[i+1] for line in open('file.txt', 'r') for chunk in [line.split()[1:]] for i in range(0, len(chunk), 2)})

Output:
{'SHIRT': '11-14', 'variance': '11-12-13-14-15', 'color': 'Red'}

Detailed version:
with open('file.txt', 'r') as f:
    for line in f:
        chunk = line.split()[1:]  # drop the leading "item"
        print({chunk[i]: chunk[i+1] for i in range(0, len(chunk), 2)})
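
For several lines at once, the pairing can also be done with `zip` and `dict`. In this sketch an `io.StringIO` object stands in for file.txt so that it runs on its own, and the second line is invented for illustration.

```python
import io

# Stand-in for file.txt; the second line is invented for illustration.
sample = io.StringIO(
    "item SHIRT 11-14 variance 11-12-13-14-15 color Red\n"
    "item SHIRT 9-10 variance 9-10-11 color Blue\n"
)

records = []
for line in sample:
    chunk = line.split()[1:]  # drop the leading "item"
    # dict() accepts an iterable of (key, value) pairs.
    records.append(dict(zip(chunk[::2], chunk[1::2])))

print(records)
```

Each line becomes one dict, so `records[0]["color"]` is `'Red'` and the values are still plain strings, as in the detailed version above.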

Use the python2/python3 tags only for version-specific questions.
Is the text of the item SHIRT, variance, and color parts always static?
Added to get your desired result.
I tried dic['Color'] = re.search('(?... with the split('-') removed, so it returns a string.
Great, got it working. I was actually using input; it should be replaced with the declared string itself.