Python or a regular expression?


I have a text file with several thousand lines. I want to parse this file into a database and decided to write a regexp. Here is part of the file:

blablabla checked=12 unchecked=1
blablabla unchecked=13
blablabla checked=14

So I would like to get something like this:

(12,1)
(0,13)
(14,0)

Is that possible?

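The easiest approach is to extract the two numbers with two separate regular expressions:

r"\bchecked=(\d+)"

and

r"unchecked=(\d+)"

The \b word boundary keeps the first pattern from also matching the "checked=" inside "unchecked=". The captured values are strings, and a search returns None when the key is not found, so you still have to convert to int and substitute 0 yourself, but the idea should be clear.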
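
A minimal sketch of the two-pattern approach, using the three sample lines from the question. The helper names first_int and parse_line are hypothetical, not part of the original answer. Both patterns start with a word boundary (\b), since a bare checked= pattern would otherwise also match inside unchecked=.

```python
import re

# \b stops CHECKED_RE from matching the "checked=" inside "unchecked=".
CHECKED_RE = re.compile(r"\bchecked=(\d+)")
UNCHECKED_RE = re.compile(r"\bunchecked=(\d+)")


def first_int(pattern, line, default=0):
    """Return the first captured number as an int, or default if absent."""
    match = pattern.search(line)
    return int(match.group(1)) if match else default


def parse_line(line):
    """Return (checked, unchecked) for one line of the file."""
    return (first_int(CHECKED_RE, line), first_int(UNCHECKED_RE, line))


lines = [
    "blablabla checked=12 unchecked=1",
    "blablabla unchecked=13",
    "blablabla checked=14",
]
results = [parse_line(line) for line in lines]
print(results)  # [(12, 1), (0, 13), (14, 0)]
```

Compiling the patterns once at module level avoids repeating the pattern strings for each of the several thousand lines.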

Another approach:

import sys
import re

r = re.compile(r"((?:un)?checked)=(\d+)")

with open(sys.argv[1]) as f:
    for line in f:
        # findall returns (key, value) pairs, which dict() accepts directly
        d = dict(r.findall(line))
        print(d)

Output:

{'checked': '12', 'unchecked': '1'}
{'unchecked': '13'}
{'checked': '14'}
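
The dictionaries hold strings and omit missing keys, while the question asks for integer tuples with 0 as the default. One possible way to bridge that gap is sketched below; the function name to_tuple and its keys parameter are assumptions made for illustration.

```python
import re

PAIR_RE = re.compile(r"((?:un)?checked)=(\d+)")


def to_tuple(line, keys=("checked", "unchecked")):
    """Return the values for keys as ints, using 0 for any missing key."""
    found = dict(PAIR_RE.findall(line))
    return tuple(int(found.get(key, 0)) for key in keys)


lines = [
    "blablabla checked=12 unchecked=1",
    "blablabla unchecked=13",
    "blablabla checked=14",
]
tuples = [to_tuple(line) for line in lines]
print(tuples)  # [(12, 1), (0, 13), (14, 0)]
```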

I believe this is more generic and reusable:

import re

def tuple_producer(input_lines, attributes):
    """Extract specific attributes from lines 'blabla attribute=value …'"""
    for line in input_lines:
        line_attributes = {}
        # collect every key=number pair found on the line
        for match in re.finditer(r"(\w+)=(\d+)", line):
            line_attributes[match.group(1)] = int(match.group(2))  # int cast
        yield tuple(
            line_attributes.get(attribute, 0)  # int constant
            for attribute in attributes)


>>> lines= """blablabla checked=12 unchecked=1
blablabla unchecked=13
blablabla checked=14""".split("\n")
>>> list(tuple_producer(lines, ("checked", "unchecked")))
[(12, 1), (0, 13), (14, 0)]

# and an example with an attribute that never appears
>>> list(tuple_producer(lines, ("checked", "nonexistent")))
[(12, 0), (0, 0), (14, 0)]

Note the conversion to integers; if you do not need it, remove the int cast and change the int constant 0 to "0".
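
For reference, the string-valued variant described in the note might look like the sketch below. The name string_tuple_producer is hypothetical, and the dict is built with re.findall here only to keep the example short.

```python
import re


def string_tuple_producer(input_lines, attributes):
    """Like tuple_producer, but values stay strings and the default is "0"."""
    for line in input_lines:
        # no int cast: the captured digits are kept as strings
        line_attributes = dict(re.findall(r"(\w+)=(\d+)", line))
        yield tuple(line_attributes.get(name, "0") for name in attributes)


lines = [
    "blablabla checked=12 unchecked=1",
    "blablabla unchecked=13",
    "blablabla checked=14",
]
rows = list(string_tuple_producer(lines, ("checked", "unchecked")))
print(rows)  # [('12', '1'), ('0', '13'), ('14', '0')]
```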
