Python 3.x: Parsing strings whose fields appear in a different order


I have some records:

records=['Event: Description of some sort of event, sometimes with a: colon 0 Date: 02/05/2008 Time: 9:30 am Location: Room A Result: Description of result 0',
    'Event: Description of event 1 ',
    'Event: Description of some sort of event 2 Date: 06/03/2010 Time: 1:30 pm Location: Room b Result: Description of result 2',
    'Date: 06/03/2010 Time: 2:30 pm  Event: Description of some sort of event 2 Result: Description of result 2 Location: Room b',
    'Date: 06/03/2010 Result: Description of result 3']
I (eventually) want to load them into a pandas DataFrame, but I don't even know how to parse them into a useful list or dict. This is what I'm doing:

import re
import pandas as pd
delimiters = ['Event:', 'Date:', 'Time:', 'Location:', 'Result:']
delimiters = '|'.join(delimiters)
print('without parentheses, I lose my delimiters:')
for record in records:
    print(re.split(delimiters, record))
I'm curious why this produces an empty item at the start of each list. More importantly, though, I want to keep the delimiters.

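I've seen examples that put parentheses around a single delimiter to keep it in the list of split strings, but with my joined list of delimiters this produces strange results. For example, I don't understand why adding the parentheses produces None, and I'd really like to understand that.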

print('With parentheses things get weird:')
delimiters = ['(Event:)', '(Date:)', '(Time:)', '(Location:)', '(Result:)']
delimiters = '|'.join(delimiters)

for record in records:
    print(re.split(delimiters, record))
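For reference, both puzzles (the leading empty item and the None entries) can be reproduced on a single record. The following is only a minimal sketch; the variable names `keys`, `plain`, `per_key` and `single` are made up for illustration. `re.split` returns the text before the first match even when it is empty, and when the pattern contains capturing groups it inserts every group for each match, including the groups that did not take part in that match (those are None).

```python
import re

record = 'Date: 06/03/2010 Result: Description of result 3'
keys = ['Event:', 'Date:', 'Time:', 'Location:', 'Result:']

# No capturing group: the delimiters are dropped. The record starts with a
# delimiter, so the text "before" the first match is the empty string.
plain = re.split('|'.join(keys), record)

# One group per alternative: each match reports all five groups, and the
# four groups that did not take part in the match come back as None.
per_key = re.split('|'.join('(' + k + ')' for k in keys), record)

# A single group around the whole alternation: one group per match, no None.
single = re.split('(' + '|'.join(keys) + ')', record)

print(plain)
print(per_key)
print(single)
```

With a single group around the whole alternation, the delimiters are kept and the result alternates between delimiter and value after the first (possibly empty) item.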
Ideally, I would extract the following as the output of parsing a record:

{'Event': ['Description of some sort of event, sometimes with a: colon'], 
 'Date': ['02/05/2008'], 
 'Time': ['1:30 pm'], 
 'Location': ['Room b'],
 'Result': ['Some description of the result, sometimes with a : colon']} # etc
That would let me pass it straight to a DataFrame:

pd.DataFrame({'Event': ['Description of some sort of event, sometimes with a: colon'], 
 'Date': ['02/05/2008'], 
 'Time': ['1:30 pm'], 
 'Location': ['Room b'],
 'Result': ['Some description of the result, sometimes with a : colon']} 
)

Thank you very much for any guidance or help with any of these steps.

Here is a solution that does not use regular expressions, although it does involve nested loops:

records = ['Event: Description of some sort of event, sometimes with a: colon 0 Date: 02/05/2008 Time: 9:30 am Location: Room A Result: Description of result 0',
    'Event: Description of event 1 ',
    'Event: Description of some sort of event 2 Date: 06/03/2010 Time: 1:30 pm Location: Room b Result: Description of result 2',
    'Date: 06/03/2010 Time: 2:30 pm  Event: Description of some sort of event 2 Result: Description of result 2 Location: Room b',
    'Date: 06/03/2010 Result: Description of result 3']

delims = ('Event:', 'Date:', 'Time:', 'Location:', 'Result:')

parsed = []

# Iterate records
for record in records:
    # An empty dictionary object
    d = {}
    # Split the record into separate words by spaces
    words = record.split(' ')
    # Iterate the words in the record
    for i in range(len(words)):
        # If this word is one of the delimiters
        if words[i] in delims:
            # Set the key to the delimiter (without a colon)
            key = words[i][:-1]
            # Increment the loop counter to skip to the next item
            i += 1
            # Start with a value of an empty list
            val = []
            # While we are inside the list bounds and the word is not a delimiter
            while i < len(words) and words[i] not in delims:
                # Add this word to the value
                val.append(words[i])
                # Increment the loop counter to skip to the next item
                i += 1
            # Add the key/value pair to the record dictionary
            d[key] = ' '.join(val)
    # Append the record dictionary to the results
    parsed.append(d)


print(repr(parsed))

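No problem! Keep in mind that this solution requires the delimiters to be separated from the surrounding text by spaces (as they appear to be throughout your data).

Output (pretty-printed):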
[{'Date': '02/05/2008',
  'Event': 'Description of some sort of event, sometimes with a: colon 0',
  'Location': 'Room A',
  'Result': 'Description of result 0',
  'Time': '9:30 am'},
 {'Event': 'Description of event 1 '},
 {'Date': '06/03/2010',
  'Event': 'Description of some sort of event 2',
  'Location': 'Room b',
  'Result': 'Description of result 2',
  'Time': '1:30 pm'},
 {'Date': '06/03/2010',
  'Event': 'Description of some sort of event 2',
  'Location': 'Room b',
  'Result': 'Description of result 2',
  'Time': '2:30 pm '},
 {'Date': '06/03/2010', 'Result': 'Description of result 3'}]
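As a possible alternative to the word-by-word loop, the same list of dicts can be built with `re.split` and a single capturing group. This is only a sketch under a few assumptions: the field names are exactly the five shown in the question, a field name followed by a colon never occurs inside a description, and the names `FIELDS`, `FIELD_RE` and `parse_record` are hypothetical helpers introduced here. Unlike the loop above, it strips surrounding whitespace from each value.

```python
import re

FIELDS = ('Event', 'Date', 'Time', 'Location', 'Result')
# A single capturing group makes re.split keep each field name.
# \b prevents a match inside a longer word that merely ends in a field name.
FIELD_RE = re.compile(r'\b(' + '|'.join(FIELDS) + r'):')


def parse_record(record):
    """Return a dict mapping field name to value for one record string."""
    parts = FIELD_RE.split(record)
    # parts[0] is the text before the first field name; after that the
    # list alternates name, value, name, value, ...
    names, values = parts[1::2], parts[2::2]
    return {name: value.strip() for name, value in zip(names, values)}


records = [
    'Event: Description of event 1 ',
    'Date: 06/03/2010 Time: 2:30 pm  Event: Description of some sort of event 2 '
    'Result: Description of result 2 Location: Room b',
]
parsed = [parse_record(r) for r in records]
print(parsed)
```

A list of dicts like `parsed` can be passed directly to `pd.DataFrame(parsed)`; fields missing from a record show up as NaN in the corresponding column.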