Parsing a list of strings in Python

python regex parsing search

I have a list of strings, and I am looking for lines like this:

Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10

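After finding such lines, I want to store each one as a dictionary {'key': af12d9, 'index': 0, 'field1': …} and then store that dictionary in a list, so that I end up with a list of dictionaries.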

I was able to get it working like this:

listconfig = []
for line in list_of_strings:
    matched = findall("(Key:[\s]*[0-9A-Fa-f]+[\s]*)|(Index:[\s]*[0-9]+[\s]*)|(Field 1:[\s]*[0-9]+[\s]*)|(Field 2:[\s]*[0-9]+[\s]*)|(Field 3:[\s]*[-+]?[0-9]+[\s]*)", line)
    if matched:
        listconfig += [dict(map(lambda pair: (pair[0].strip().lower(), pair[1].strip().lower()),
                                map(lambda line: line[0].split(':'),
                                    [filter(lambda x: x, group) for group in matched])))]

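I am just wondering whether there is a better (shorter and more efficient) way to do this, because I think findall performs five searches per string. (Is that right? It returns a list of five tuples.)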

Thanks, everyone.

Solution:

OK, with brandizzi's help I found the answer to this question:

import re

listconfig = []
for line in list_of_strings:
    # Adjacent string literals inside the parentheses are concatenated,
    # so no line-continuation backslashes are needed.
    matched = re.search(r"Key:[\s]*(?P<key>[0-9A-Fa-f]+)[\s]*"
                        r"(Index:[\s]*(?P<index>[0-9]+)[\s]*)?"
                        r"(Field 1:[\s]*(?P<field_1>[0-9]+)[\s]*)?"
                        r"(Field 2:[\s]*(?P<field_2>[0-9 A-Za-z]+)[\s]*)?"
                        r"(Field 3:[\s]*(?P<field_3>[-+]?[0-9]+)[\s]*)?", line)
    if matched:
        # groupdict() maps each named group to its text (None if absent)
        print(matched.groupdict())
        listconfig.append(matched.groupdict())
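
One caveat worth checking against your own data, sketched here under the assumption that a line may carry free text such as "Ring" after the Field 2 value. Every group after Key is optional, and the field_2 class [0-9 A-Za-z]+ accepts spaces and letters, so it can also swallow the words "Field 3"; field_3 then comes back as None. The second pattern below is not the asker's; it is a hypothetical adjustment (the names ORIGINAL, ADJUSTED and sample are illustrative) that matches field_2 lazily and requires it to be followed by "Field 3:" or the end of the line.

```python
import re

# Pattern as posted above (backslash continuations dropped).
ORIGINAL = re.compile(
    r"Key:[\s]*(?P<key>[0-9A-Fa-f]+)[\s]*"
    r"(Index:[\s]*(?P<index>[0-9]+)[\s]*)?"
    r"(Field 1:[\s]*(?P<field_1>[0-9]+)[\s]*)?"
    r"(Field 2:[\s]*(?P<field_2>[0-9 A-Za-z]+)[\s]*)?"
    r"(Field 3:[\s]*(?P<field_3>[-+]?[0-9]+)[\s]*)?"
)

# Hypothetical adjustment: lazy field_2 plus a lookahead for what must follow.
ADJUSTED = re.compile(
    r"Key:\s*(?P<key>[0-9A-Fa-f]+)\s*"
    r"(?:Index:\s*(?P<index>[0-9]+)\s*)?"
    r"(?:Field 1:\s*(?P<field_1>[0-9]+)\s*)?"
    r"(?:Field 2:\s*(?P<field_2>[0-9 A-Za-z]+?)\s*(?=Field 3:|$))?"
    r"(?:Field 3:\s*(?P<field_3>[-+]?[0-9]+)\s*)?"
)

sample = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"

before = ORIGINAL.search(sample).groupdict()
after = ADJUSTED.search(sample).groupdict()

# Greedy field_2 takes the "Field 3" label with it; the lazy one stops early.
print(before["field_2"], before["field_3"])
print(after["field_2"], after["field_3"])
```

If Field 2 never contains letters or spaces in your data, narrowing its class to [0-9]+ is probably the simpler fix.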

Because of the "Ring", the pattern in your example probably does not match the sample data. Here is some code that may help:

import re
# the keys to look for
keys = ['Key','Index','Field 1','Field 2','Field 3']
# a pattern for those keys in exact order
pattern = ''.join(["(%s):(.*)" % key for key in keys])
# sample data
data = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"
# look for the pattern
hit = re.match(pattern,data)
if hit:
    # get the matched elements
    groups = hit.groups()
    # group them in pairs and create a dict
    d = dict(zip(groups[::2], groups[1::2]))
    # print result
    print(d)
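
The values captured by (.*) in the code above keep their surrounding spaces. A possible variant, sketched here as an assumption rather than as the answerer's code, escapes each key with re.escape, matches the values lazily, and lets \s* absorb the padding:

```python
import re

keys = ['Key', 'Index', 'Field 1', 'Field 2', 'Field 3']

# re.escape guards against keys that contain regex metacharacters;
# \s* on both sides of the lazy (.*?) keeps padding out of the value.
pattern = ''.join(r"(%s):\s*(.*?)\s*" % re.escape(key) for key in keys) + "$"

data = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Ring Field 3: -10"

hit = re.match(pattern, data)
# Even-indexed groups are the keys, odd-indexed groups are the values.
d = dict(zip(hit.groups()[::2], hit.groups()[1::2])) if hit else {}
print(d)
```

Like the original, this still requires all five keys to appear in exactly this order.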

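First of all, your regex does not seem to work properly. The Key field should have values that can include f, right? So its group should not be ([0-9A-Ea-e]+) but ([0-9A-Fa-f]+). Also, when dealing with regexes it is good practice, really good practice in fact, to prefix the regex string with r, because it avoids problems with \ escaping characters. (If you do not see why, look up raw string literals.)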
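
A small illustration of the r prefix, using \b because it shows the problem most clearly (the variable names are only for this sketch). In an ordinary string literal "\b" is the backspace character, while in a raw literal it stays a backslash followed by b, which the regex engine reads as a word boundary:

```python
import re

plain = "\bKey\b"    # backspace, K, e, y, backspace: 5 characters
raw = r"\bKey\b"     # backslash-b, K, e, y, backslash-b: 7 characters

text = "Key: af12d9"
plain_match = re.search(plain, text)   # looks for literal backspaces
raw_match = re.search(raw, text)       # looks for Key at word boundaries

print(len(plain), len(raw))
print(plain_match, raw_match.group(0))
```

Sequences such as "\s" happen to survive in ordinary strings because Python leaves unknown escapes alone, but recent Python versions emit a warning for them, so the raw form is the safer habit.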

Now, my take on the problem. First, I would create a regex without the pipes:

>>> regex = r"(Key):[\s]*([0-9A-Fa-f]+)[\s]*" \
...     r"(Index):[\s]*([0-9]+)[\s]*" \
...     r"(Field 1):[\s]*([0-9]+)[\s]*" \
...     r"(Field 2):[\s]*([0-9 A-Za-z]+)[\s]*" \
...     r"(Field 3):[\s]*([-+]?[0-9]+)[\s]*"

With this change, findall() returns just one tuple of found groups for the whole line. In this tuple, each key is followed by its value:

>>> re.findall(regex, line)
[('Key', 'af12d9', 'Index', '0', 'Field 1', '1234', 'Field 2', '1234 Ring ', 'Field 3', '-10')]

So I get the tuple...

>>> found = re.findall(regex, line)[0]
>>> found
('Key', 'af12d9', 'Index', '0', 'Field 1', '1234', 'Field 2', '1234 Ring ', 'Field 3', '-10')

...and I take only the keys...

>>> found[::2]
('Key', 'Index', 'Field 1', 'Field 2', 'Field 3')

...and only the values:

>>> found[1::2]
('af12d9', '0', '1234', '1234 Ring ', '-10')

Then I create a list of tuples, each containing a key and its corresponding value:
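
This pairing step can be sketched as follows, reusing the found tuple from the session above. The sketch is written for Python 3, where zip returns an iterator and is therefore wrapped in list(); in Python 2, zip returns the list directly.

```python
# Tuple returned by re.findall(regex, line)[0] in the session above.
found = ('Key', 'af12d9', 'Index', '0', 'Field 1', '1234',
         'Field 2', '1234 Ring ', 'Field 3', '-10')

# Even positions hold the keys, odd positions the values; zip pairs them up.
pairs = list(zip(found[::2], found[1::2]))
print(pairs)
```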

The grand finale is to pass the list of tuples to the dict() constructor:

>>> dict(zip(found[::2], found[1::2]))
{'Field 3': '-10', 'Index': '0', 'Field 1': '1234', 'Key': 'af12d9', 'Field 2': '1234 Ring '}

I think this is the best solution, but in a sense it really is a subjective question. Anyway :)
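
On the question of whether findall searches each string five times: with the piped pattern it scans the line once, left to right, and returns one tuple per match. Each tuple has five slots (one per group), and only the slot of the alternative that matched is non-empty. A minimal sketch, with the sample line and the names piped, matches and hits chosen for illustration:

```python
import re

piped = (r"(Key:\s*[0-9A-Fa-f]+\s*)|(Index:\s*[0-9]+\s*)|(Field 1:\s*[0-9]+\s*)"
         r"|(Field 2:\s*[0-9]+\s*)|(Field 3:\s*[-+]?[0-9]+\s*)")
line = "Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10"

# One tuple per match; each tuple has one slot per capturing group.
matches = re.findall(piped, line)
for groups in matches:
    print(groups)

# Keep the single non-empty slot of each tuple.
hits = [next(g for g in groups if g) for groups in matches]
print(hits)
```

So the five tuples come from five successive matches in a single pass, not from five separate searches.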

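You could use a parser library. I know Lepl, so I would use that, but because it is implemented in pure Python it is not very efficient. Still, the solution is fairly short and, I hope, easy to understand: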

def parser():
  key = (Drop("Key:") & Regexp("[0-9a-fA-F]+")) > 'key'
  index = (Drop("Index:") & Integer()) > 'index'
  def Field(n):
      return (Drop("Field" + str(n)) & Integer()) > 'field'+str(n)
  with DroppedSpaces():
      line = (key & index & Field(1) & Field(2) & Field(3)) >> make_dict
      return line[:]
p = parser()
print(p.parse_file(...))

Handling a variable number of fields should also be relatively simple.

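Note that the above is untested (I need to get to work), but it should be about right. In particular, it should return a list of dictionaries, as required.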

Your solution (the re.search loop with named groups) would perform better if you did it like this [*]:

import re

try:
    from itertools import imap  # Python 2
except ImportError:
    imap = map  # Python 3: the built-in map is already lazy

regex = re.compile(flags=re.VERBOSE, pattern=r"""
    Key:\s*(?P<key>[0-9A-Fa-f]+)\s*
    Index:\s*(?P<index>[0-9]+)\s*
    Field\s+1:\s*(?P<field_1>[0-9]+)\s*
    Field\s+2:\s*(?P<field_2>[0-9A-Za-z]+)\s*
    Field\s+3:\s*(?P<field_3>[-+]?[0-9]+)\s*
""")

list_of_strings = [
    'Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10',
    'hey joe!',
    ''
]

listconfig = [
    match.groupdict() for match in imap(regex.search, list_of_strings) if match
]

[*] Actually, no, it won't. I timed both, and neither is faster than the other. I still prefer mine, though.
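
A rough way to repeat that timing, sketched under assumptions: the sample lines are the ones from the code above repeated 100 times, and the function names are illustrative. One likely reason for the tie is that the re module caches compiled patterns, so re.search with a pattern string does not recompile on every call. Absolute numbers depend on the machine, so none are shown.

```python
import re
import timeit

LINES = ["Key: af12d9 Index: 0 Field 1: 1234 Field 2: 1234 Field 3: -10",
         "hey joe!", ""] * 100

PATTERN = (r"Key:\s*(?P<key>[0-9A-Fa-f]+)\s*"
           r"Index:\s*(?P<index>[0-9]+)\s*"
           r"Field 1:\s*(?P<field_1>[0-9]+)\s*"
           r"Field 2:\s*(?P<field_2>[0-9A-Za-z]+)\s*"
           r"Field 3:\s*(?P<field_3>[-+]?[0-9]+)\s*")
COMPILED = re.compile(PATTERN)

def with_loop():
    # Explicit loop, pattern passed as a string on every call.
    out = []
    for line in LINES:
        m = re.search(PATTERN, line)
        if m:
            out.append(m.groupdict())
    return out

def with_comprehension():
    # Precompiled pattern, list comprehension over map().
    return [m.groupdict() for m in map(COMPILED.search, LINES) if m]

loop_seconds = timeit.timeit(with_loop, number=20)
comp_seconds = timeit.timeit(with_comprehension, number=20)
print("loop: %.4fs, comprehension: %.4fs" % (loop_seconds, comp_seconds))
```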

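Is "Ring" part of "Field 3" or part of "1234"? Do you really have "Field 1", "Field 2", ... with spaces? That is a weird format. Field1, Field2, ... would make things much simpler. Are you free to choose, or are the spaces required? If the key is a hexadecimal number, you probably need [0-9A-Fa-f].

If you look at his regex, you will see that ":" separates keys from values, and " " separates values from the following keys.

@sudo If "Key", "Index", "Field 1", "Field 2", "Field3" are always present in every string element of the list, it is clumsy to clog the data structure with them: ('af12d9', '0', '1234', '1234', '-10') would be enough, since you know that the second element is Index and the last one is Field3. Also: listconfig += is bad practice, because it creates a new list and assigns the name listconfig to it. Use append() instead.

Note: this does not check that you have valid data, as Mark Tozzi said. You would need some kind of bounds checking to verify it, but given the rules listed, this is a simpler and faster approach.

@brandizzi Ha, thanks a lot. That was really helpful. I found that you can just use re.search and get the dictionary with groupdict(). Here is my improved version: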

matched = re.search(r"Key:[\s]*(?P<key>[0-9A-Fa-f]+)[\s]*" \ r"(Index:[\s]*(?P<index>[0-9]+)[\s]*)?" \ r"(Dose:[\s]*(?P<...>[0-9]+)[\s]*)?" \ r"(Energy:[\s]*(?P<...>[0-9 A-Za-z]+)[\s]*)?" \ r"(Ring Strength:[\s]*(?P<...>[-+]?[0-9]+)[\s]*)?", line) if matched: print matched.groupdict()
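
A side note on the listconfig += remark a few comments up, offered as a quick check rather than a ruling: for lists, += extends the existing object in place and does not rebind the name to a new list. What append() saves is the temporary one-element list on the right-hand side. The alias name below is only for illustration.

```python
listconfig = []
alias = listconfig                      # second name for the same list object

listconfig += [{'key': 'af12d9'}]       # in-place extend via list.__iadd__
listconfig.append({'key': 'ffff'})      # append needs no temporary list

# Both names still refer to one and the same list.
same_object = alias is listconfig
print(same_object, len(alias))
```

By contrast, listconfig = listconfig + [...] does build a new list each time.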

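@sudo You just found the solution. You really should post it as an answer to your own question and then mark it as the correct one! That is all the more necessary because you can format code nicely in an answer but not in a comment. Seriously, post your answer as the correct answer :)

But is there a way to handle a string with mixed order? For example, Index before Key?

@sudo A dict is not ordered by default. Python 2.7 provides an OrderedDict class in the collections module. If you need to preserve order and can use Python 2.7, you can use OrderedDict.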
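
For the mixed-order question, one possible approach (a sketch, not something proposed in the thread) is to match one label and value at a time with re.finditer, so the order of fields on the line no longer matters. The helper name parse_any_order and the sample line are hypothetical, and the value class is simplified to an optional sign followed by hex digits.

```python
import re

# One alternation of labels; each match yields a (label, value) pair.
FIELD = re.compile(r"(Key|Index|Field [123]):\s*([-+]?[0-9A-Fa-f]+)")

def parse_any_order(line):
    """Hypothetical helper: map each lower-cased label to its raw value."""
    return {m.group(1).lower(): m.group(2) for m in FIELD.finditer(line)}

record = parse_any_order("Index: 0 Key: af12d9 Field 3: -10 Field 1: 1234")
print(record)
```

Since Python 3.7 a plain dict keeps insertion order, so the fields come back in the order they appeared on the line; on older versions, collections.OrderedDict would be needed for that guarantee.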