Regular expressions on a list of strings in Python (python-3.x, regex, list, dictionary, namedtuple)


I have a list named Statement, built from a PDF using pytesseract and regex:

Statement= ['07-10-2019 UPI/927912685773/UPI/surya.balaji94@/Citibank 6,677.00 2,36,804.08',
         '07-10-2019 MOBILE BANKING DUT/CITIBANK 3,403.00 2,40,207.08',
         '07-10-2019 BIL/INFT/001818195728/82D3/ AJAY KUMAR JHA 6,080.00 2,46, 287.08',
         '08.10.2019 MOBILE BANKING MMM TiMPS/928115182374/8161 Oct Mte/AMARJEET SIHDFC 4,411.00 250,698.08',
         '08-10-2019 BIL/INFT/001818636132/E3 BIk1 Pramod/ PRAMOD KUMAR P 6,599.00 2,57,297.08']

With help from Stack Overflow, I created a list of dictionaries as follows:

import re
from collections import namedtuple

cols = ["Date", "Item_Name", "Transaction_Amount", "Balance"]
date_pattern = re.compile(r"\d{2}[- /.]\d{2}[- /.]\d{4}")
# Everything after the leading date; the lookbehind accepts the same separators as date_pattern
item_and_name_pattern = re.compile(r"(?<=\d{2}[- /.]\d{2}[- /.]\d{4}\s).*")
amount_pattern = re.compile(r"\d+,\d+\.\d+$")      # e.g. 6,677.00 at the end of the string
total_pattern = re.compile(r"\d+,\d+,\d+\.\d+$")   # e.g. 2,36,804.08 at the end of the string
Transaction = namedtuple("Transaction", cols)
transactions = []
for item in Statement:
    try:
        date = date_pattern.search(item).group()
        total = total_pattern.search(item).group()
        temp_1 = item[:-len(total)].rstrip()       # drop the trailing balance
        amount = amount_pattern.search(temp_1).group()
        temp_2 = temp_1[:-len(amount)].rstrip()    # drop the trailing amount
        item_and_name = item_and_name_pattern.search(temp_2).group()
    except AttributeError:
        # search() returned None: skip lines that do not fit the patterns
        continue
    transactions.append(Transaction(date, item_and_name, amount, total))
# Build the list of dictionaries once, after the loop
out = [dict(t._asdict()) for t in transactions]

Here is a simpler approach:

import re

pattern = re.compile(r"(?P<Date>\d{2}[.-]\d{2}[.-]\d{4})\s(?P<Item_Name>.+)\s(?P<Transaction_Amount>[0-9,\.]+)\s(?P<Balance>[0-9,\.]+)")

print([pattern.match(item).groupdict() for item in Statement])
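
To make the mapping from named groups to dictionary keys concrete, here is a minimal sketch on a single line taken from the question's list. The helper `to_decimal` is hypothetical (it is not part of the original answer) and assumes the commas are only digit-group separators, so they can simply be removed before converting.

```python
import re
from decimal import Decimal

# Same pattern as above, split across lines for readability
pattern = re.compile(
    r"(?P<Date>\d{2}[.-]\d{2}[.-]\d{4})\s"
    r"(?P<Item_Name>.+)\s"
    r"(?P<Transaction_Amount>[0-9,.]+)\s"
    r"(?P<Balance>[0-9,.]+)"
)

def to_decimal(text):
    """Hypothetical helper: drop grouping commas and parse as Decimal."""
    return Decimal(text.replace(",", ""))

line = "07-10-2019 MOBILE BANKING DUT/CITIBANK 3,403.00 2,40,207.08"
row = pattern.match(line).groupdict()  # keys are the (?P<name>...) group names

print(row["Date"])                            # 07-10-2019
print(row["Item_Name"])                       # MOBILE BANKING DUT/CITIBANK
print(to_decimal(row["Transaction_Amount"]))  # 3403.00
print(to_decimal(row["Balance"]))             # 240207.08
```

Because `.+` is greedy, the item name absorbs as much as it can while still leaving two whitespace-separated numeric tokens at the end, so OCR noise such as a stray space inside a number can shift the fields. It is probably worth spot-checking the parsed rows.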

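Oh, haha! Of course, thank you. If the list is long and some entries don't match the pattern, I'd like to use try/except to skip past them. Is there a way to do that?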

import re

pattern = re.compile(r"(?P<Date>\d{2}[.-]\d{2}[.-]\d{4})\s(?P<Item_Name>.+)\s(?P<Transaction_Amount>[0-9,\.]+)\s(?P<Balance>[0-9,\.]+)")

result = []
for item in Statement:
    try:
        result.append(pattern.match(item).groupdict())
    except AttributeError:
        # match() returned None for this line, so skip it
        pass

print(result)
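
An alternative sketch, in case you would rather not rely on catching `AttributeError`: test the match object for `None` directly and keep the skipped lines so they can be reviewed later. The function name `parse_statement` and the second sample line are made up for illustration; the first sample line comes from the question.

```python
import re

pattern = re.compile(
    r"(?P<Date>\d{2}[.-]\d{2}[.-]\d{4})\s"
    r"(?P<Item_Name>.+)\s"
    r"(?P<Transaction_Amount>[0-9,.]+)\s"
    r"(?P<Balance>[0-9,.]+)"
)

def parse_statement(lines):
    """Return (rows, skipped): parsed dicts and the lines that did not match."""
    rows, skipped = [], []
    for line in lines:
        match = pattern.match(line)
        if match is None:
            skipped.append(line)   # keep it for manual inspection
        else:
            rows.append(match.groupdict())
    return rows, skipped

sample = [
    "07-10-2019 MOBILE BANKING DUT/CITIBANK 3,403.00 2,40,207.08",
    "Opening balance carried forward",  # invented line with no leading date
]
rows, skipped = parse_statement(sample)
print(rows)
print(skipped)
```

Keeping the unmatched lines makes it easier to notice when OCR output drifts away from the expected layout, instead of silently dropping transactions.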