How do I parse a string in Python and extract different data types from it?


I have a string that is part of a list of the following form:

[
'<Item Name1 (String with alphanumeric characters)> <Quantity1 (Int)> x <currency string> <Price1 (Float)>',
'<Item Name2 (String with alphanumeric characters)> <Quantity2 (Int)> x <currency string> <Price2 (Float)>',
'<Item Name3 (String with alphanumeric characters)> <Quantity3 (Int)> x <currency string> <Price3 (Float)>',
...]

Sample output list:

[
  {
    "name" : "Bananas Bunch", "quantity" : 1, "price": 3.99
  },
  {
    "name" : "Apples", "quantity" : 5, "price": 5.00
  }....
]

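A regular expression is (I think) the most efficient way to do this. However, if you want a way to extract the information without a regex, the approach below works. Important: note that this solution fails if the product name contains the substring ' x ' (an x surrounded by spaces), because that is the separator the code splits on. I don't think such a substring is common in product names, so it should work for almost all products: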

raw_products = ['Bananas Bunch 1 x EUR 3.99', 'Apples 5 x EUR 5.00']
parsed_products = []

for raw_product in raw_products:
    parsed_product = {}

    parts = raw_product.split(' x ')
    subparts = parts[0].split()

    parsed_product['name'] = ' '.join(subparts[:-1])
    parsed_product['quantity'] = int(subparts[-1])
    parsed_product['price'] = float(raw_product.split()[-1])

    parsed_products.append(parsed_product)

print(parsed_products)

Again, this is probably less efficient than a regex solution. But if efficiency is not a concern and you want shorter code, the following is equivalent:

raw_products = ['Bananas Bunch 1 x EUR 3.99', 'Apples 5 x EUR 5.00']

parsed_products = [{
    'name': ' '.join(raw_product.split(' x ')[0].split()[:-1]),
    'quantity': int(raw_product.split(' x ')[0].split()[-1]),
    'price': float(raw_product.split()[-1])
} for raw_product in raw_products]

print(parsed_products)

Both solutions (which are really the same) print the following:

[{'name': 'Bananas Bunch', 'quantity': 1, 'price': 3.99}, {'name': 'Apples', 'quantity': 5, 'price': 5.0}]
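The split-based code above breaks when a name contains ' x ', because it splits on the first occurrence. A minimal sketch of one way around that, assuming every entry ends with `<quantity> x <currency> <price>` and the currency is a single token, is to split from the right instead. The helper name `parse_product` is my own, not part of the original answer:

```python
def parse_product(raw_product):
    """Parse '<name> <quantity> x <currency> <price>' without a regex."""
    # Split on the LAST ' x ' only, so a name such as 'An x box 360' survives.
    left, right = raw_product.rsplit(' x ', 1)
    # The quantity is the last whitespace-separated token on the left side.
    name, quantity = left.rsplit(None, 1)
    # The right side is assumed to be exactly '<currency> <price>'.
    currency, price = right.split()
    return {'name': name, 'quantity': int(quantity), 'price': float(price)}


raw_products = ['Bananas Bunch 1 x EUR 3.99', 'An x box 360 1 x EUR 299.99']
parsed_products = [parse_product(p) for p in raw_products]
print(parsed_products)
```

With these two inputs the second entry comes out with the name 'An x box 360', quantity 1 and price 299.99. A malformed entry raises ValueError from the tuple unpacking or from int()/float(), which may or may not be what you want.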

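Edit: Well, I'm not really fond of regular expressions (I still haven't done much with them), so the code below may not be the shortest/cleanest approach, but it works: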

import re

raw_products = ['Bananas Bunch 1 x EUR 3.99', 'Apples 5 x EUR 5.00']
parsed_products = []

pattern = re.compile(r"""(?P<name>^.*(?=(\s[0-9]+\sx\s)))
                         \s(?P<quantity>[0-9]+(?=(\sx\s)))
                         .*\s(?P<price>[0-9]+\.[0-9]+)$""", re.VERBOSE)

for raw_product in raw_products:
    match = pattern.match(raw_product)

    name = match.group('name')
    quantity = match.group('quantity')
    price = match.group('price')

    parsed_products.append({
        'name': name,
        'quantity': int(quantity),
        'price': float(price)
    })

print(parsed_products)

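It also works when the name itself contains ' x '. For the input 'An x box 360 1 x EUR 299.99', the named groups (before the int/float conversion) capture: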

{'name': 'An x box 360', 'quantity': '1', 'price': '299.99'}

My apologies if someone with deep regex knowledge sees this and has a heart attack! I just wanted to give a working solution, but my regex knowledge really is limited.
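For what it's worth, the same idea can probably be written without lookaheads. This is only a sketch under the same assumptions about the input format. The names `PRODUCT_RE` and `parse_with_regex`, and the extra `currency` group, are mine and not part of the answer above. It assumes the currency is a single token without whitespace and that prices contain only digits and an optional decimal point:

```python
import re

# Hypothetical pattern: name, quantity, a literal 'x', currency, price.
PRODUCT_RE = re.compile(r"""
    (?P<name>.+)                  # item name; greedy, so it may contain ' x '
    \s+(?P<quantity>\d+)          # integer quantity
    \s+x\s+                       # literal separator
    (?P<currency>\S+)             # currency string, assumed to be one token
    \s+(?P<price>\d+(?:\.\d+)?)   # price with an optional fractional part
""", re.VERBOSE)


def parse_with_regex(raw_product):
    # fullmatch() requires the whole string to fit the pattern.
    match = PRODUCT_RE.fullmatch(raw_product)
    if match is None:
        raise ValueError(f'unrecognised product line: {raw_product!r}')
    return {
        'name': match.group('name'),
        'quantity': int(match.group('quantity')),
        'currency': match.group('currency'),
        'price': float(match.group('price')),
    }


print(parse_with_regex('An x box 360 1 x EUR 299.99'))
```

Because the name group is greedy and the rest of the pattern has to match up to the end of the string, the last `<quantity> x <currency> <price>` run is the one that gets picked up.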

inp = ['Bananas Bunch 1 x EUR 3.99', 'Apples 5 x EUR 5.00']
# A token counts as a quantity only if it is one of '0'..'99'.
str_nums = [str(i) for i in range(100)]
currency = ['EUR', 'USD']  # accepted currencies (not used yet)
output = []

for i in inp:
    name = ''
    t = i.split()
    # Collect tokens until the first one that looks like a quantity.
    for j in t:
        if j in str_nums:
            break
        else:
            name += (j + " ")
    name = name.rstrip()
    # print(t, name)
    output.append({'name': name})
print(output)

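This is a rough draft that only works for the name. I assume there is some pattern in your input. Use it to split the input and sort the pieces into their respective categories.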

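For the currency part, make a list of all accepted currency types; the string that follows the currency (i.e. the next token after splitting) should be your price value. Remember that it will be a string rather than a float, so you have to convert it.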
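The currency idea above could be sketched roughly as follows. The set `ACCEPTED_CURRENCIES` and the function `parse_by_tokens` are made-up names for illustration, and the sketch assumes the tokens just before the currency are always `<quantity> x`:

```python
ACCEPTED_CURRENCIES = {'EUR', 'USD'}  # assumed list; extend as needed


def parse_by_tokens(raw_product):
    tokens = raw_product.split()
    # Search from the right, so a currency code inside the name is not
    # mistaken for the real one.
    for idx in range(len(tokens) - 1, -1, -1):
        if tokens[idx] in ACCEPTED_CURRENCIES:
            break
    else:
        raise ValueError(f'no accepted currency in {raw_product!r}')
    # Need at least '<name> <quantity> x' before and '<price>' after.
    if idx < 3 or idx + 1 >= len(tokens) or tokens[idx - 1] != 'x':
        raise ValueError(f'unexpected layout in {raw_product!r}')
    return {
        'name': ' '.join(tokens[:idx - 2]),
        'quantity': int(tokens[idx - 2]),  # the token before the 'x'
        'price': float(tokens[idx + 1]),   # the token after the currency
    }


inp = ['Bananas Bunch 1 x EUR 3.99', 'Apples 5 x EUR 5.00']
output = [parse_by_tokens(i) for i in inp]
print(output)
```

Unlike the name-only draft, this does not need a list of number strings, since the position of the quantity follows from the position of the currency.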

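Can you provide a sample input and output?

I've added the input and output.

A regular expression is probably the simplest approach without reaching for third-party modules (which would be out of scope for Stack Overflow). The standard library does not include any kind of parser combinator module.

Out of curiosity, what would the regex solution look like?

I've edited my answer to give a potential solution using regular expressions :)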
