Extracting multiple values from a string in Python with regular expressions


python regex string


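I have multiple strings that look like this:

product: green apples price: 2.0 country: france company: somecompany

Some strings may have fewer fields; for example, some are missing the company name or the country. I am trying to extract only the values, skipping the keys product, price, country and company. I tried creating multiple regexes, each starting from the left of the string: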

import re

blah = "product: green apples price: 2.0 country: france company: somecompany"

product_reg = re.compile(r'.*?\bproduct\b:(.*).*')
product_reg_strip = re.compile(r'(.*?)\s[a-z]:?')

product_full = re.findall(product_reg, blah)
prod = re.find(product_reg_strip, str(product_full))  # note: the re module has no find(); this raises AttributeError
print(prod)

price_reg = re.compile(r'.*?\bprice\b:(.*).*')
price_reg_strip = re.compile(r'(.*?)\s[a-z]:?')

price_full = re.findall(price_reg, blah)
price = re.find(price_reg_strip, str(price_full))  # same problem as above
print(price)

But this doesn't work. What can I do to make these regexes more sensible?
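For comparison, here is a minimal sketch (not from the original thread) of how the one-regex-per-field idea could work: capture lazily after the key and stop at the next `word:` key or the end of the string using a lookahead, returning None when the key is absent. It assumes keys are single words followed directly by a colon and that no value contains such a word-colon sequence; the helper name `extract_field` is hypothetical.

```python
import re

def extract_field(text, key):
    """Hypothetical helper: return the value after `key:`, or None if the key is missing."""
    # Lazily capture after "key:" and stop at the next "word:" key or at the end.
    pattern = r'\b' + re.escape(key) + r':\s*(.*?)\s*(?=\b\w+:|$)'
    match = re.search(pattern, text)
    return match.group(1) if match else None

blah = "product: green apples price: 2.0 country: france company: somecompany"
for key in ("product", "price", "country", "company"):
    print(key, "->", extract_field(blah, key))
# product -> green apples
# price -> 2.0
# country -> france
# company -> somecompany
```

Because each lookup is independent, a string without a company field simply yields None for that key instead of shifting the other values.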

You can split the string like this:

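import re

text = "product: green apples price: 2.0 country: france company: somecompany"
p = re.compile(r'(\w+:)')  # capturing group: the keys are kept in the split result
res = p.split(text)
print(res)
# keys sit at the odd indices, each followed by its value
for i in range(1, len(res), 2):
    print(res[i], ' ==> ', res[i + 1])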

Output:

['', 'product:', ' green apples ', 'price:', ' 2.0 ', 'country:', ' france ', 'company:', ' somecompany']

product:  ==>   green apples 
price:  ==>   2.0 
country:  ==>   france 
company:  ==>   somecompany
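The alternating key/value list produced by the split can also be paired into a dictionary, so strings with missing fields still map each value to the right key. A minimal sketch, assuming keys are single words followed directly by a colon and that no value contains such a word-colon sequence; the group is moved to `(\w+):` so the colon is dropped from the keys:

```python
import re

text = "product: green apples price: 2.0 country: france company: somecompany"

# The capturing group keeps the keys (without the colon) in the split result.
parts = re.split(r'(\w+):', text)
# parts[0] is whatever precedes the first key (here ''), then keys and values alternate.
record = {key: value.strip() for key, value in zip(parts[1::2], parts[2::2])}
print(record)
# {'product': 'green apples', 'price': '2.0', 'country': 'france', 'company': 'somecompany'}
```

A missing field then shows up as an absent key, so `record.get('company')` returns None rather than a misplaced value.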

I'm not entirely sure what you want, but if what you want to remove is a word followed by a colon, the regex is quite simple. Here are some examples:

>>> import re
>>> blah="product: green apples price: 2.0 country: france company: somecompany"
>>> re.sub(r'\w+: ?', '', blah)
'green apples 2.0 france somecompany'
>>> re.split(r'\w+: ?', blah)[1:]
['green apples ', '2.0 ', 'france ', 'somecompany']
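Note that `re.sub` and `re.split` discard the keys, so when a field is missing (for example `product: pears country: spain` gives just `pears spain`) you can no longer tell which value belongs to which field. A variation not in the original answer keeps the keys by using `re.findall` with two groups; it assumes the same word-colon key format, and `parse_pairs` is a hypothetical name:

```python
import re

# A word key, a colon, then the value up to the next "word:" key or the end.
PAIR_RE = re.compile(r'(\w+):\s*(.*?)\s*(?=\b\w+:|$)')

def parse_pairs(text):
    """Return {key: value} for every "key: value" pair found in text."""
    return dict(PAIR_RE.findall(text))

# With the price field missing, each value is still tied to its key.
print(parse_pairs("product: pears country: spain"))
# {'product': 'pears', 'country': 'spain'}
```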

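Is price the only numeric value in each string? What output do you expect? In your example, would it be green apples 2.0 france somecompany?

You can simply use a single regexp and read the results from named groups. Only product is required; price, country and company may each be present or absent (as long as they appear in this order), and the regexp handles all of these cases. You can try it on regex101.com with the global and multiline flags:

import re

pattern = r'^(?:product:(?P<product>.*?))(?:price:(?P<price>.*?))?(?:country:(?P<country>.*?))?(?:company:(?P<company>.*))?$'
matches = re.search(pattern, 'product: green apples price: 2.0 country: italy company: italian company')

Finally, you only need to check that the match succeeded and that the group matched, then strip the surrounding spaces, for example:

if matches is not None and matches.group('product') is not None:
    product = matches.group('product').strip()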
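The same named-group pattern can return every field at once via `Match.groupdict()`, which maps each group name to its text or to None when that optional group did not participate. A minimal sketch reusing the pattern above (so the fields must still appear in the order product, price, country, company); `parse_record` is a hypothetical helper name:

```python
import re

# The pattern from the answer, split across lines for readability.
PATTERN = re.compile(
    r'^(?:product:(?P<product>.*?))'
    r'(?:price:(?P<price>.*?))?'
    r'(?:country:(?P<country>.*?))?'
    r'(?:company:(?P<company>.*))?$'
)

def parse_record(text):
    """Hypothetical helper: return a dict of stripped fields, or None if there is no match."""
    match = PATTERN.search(text)
    if match is None:
        return None  # e.g. the string does not start with "product:"
    # Missing optional fields come back as None from groupdict().
    return {name: value.strip() if value is not None else None
            for name, value in match.groupdict().items()}

print(parse_record("product: green apples country: france"))
# {'product': 'green apples', 'price': None, 'country': 'france', 'company': None}
```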