How do I tokenize a sample string using regular expressions in Python?

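I'm new to regular expressions. Besides working out a pattern that matches the strings below, please also point me to reference and/or example sites.

Data strings:

1.  First1 Last1 - 20 (Long Description)
2.  First2 Last2 - 40 (Another Description)

I'd like to extract the tuples {First1, Last1, 20} and {First2, Last2, 40} from the strings above.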

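There is no need for a regular expression here:

>>> foo = "1.  First1 Last1 - 20 (Long Description)"
>>> foo.split(" ")
['1.', '', 'First1', 'Last1', '-', '20', '(Long', 'Description)']

Now you can pick the elements you want (they will always be at the same indexes).

In 2.7+ you can use itertools.compress to select the elements:

>>> from itertools import compress
>>> tuple(compress(foo.split(" "), [0,0,1,1,0,1]))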
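The split-and-select idea above can be sketched as a small helper applied to both sample lines. The names `SELECTORS` and `extract_fields` are illustrative assumptions, not part of the original answer, and the approach only holds while every line has exactly the same layout (including the double space after the leading number).

```python
from itertools import compress

# 1 marks a position to keep; positions follow the split(" ") output shown above.
SELECTORS = [0, 0, 1, 1, 0, 1]

def extract_fields(line):
    """Return (first, last, number) by position; hypothetical helper."""
    return tuple(compress(line.split(" "), SELECTORS))

lines = [
    "1.  First1 Last1 - 20 (Long Description)",
    "2.  First2 Last2 - 40 (Another Description)",
]
results = [extract_fields(line) for line in lines]
print(results)
```

Note that the number comes back as a string; convert it with int() if you need arithmetic.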

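This one looks good: just browse through it and try some examples. Regexps are a bit tricky (they are basically a programming language of their own) and take some time to learn, but knowing them is very useful. Just experiment and go one step at a time.

(Yes, I could just give you the answer, but: teach a man to fish, and all that.)

As requested, a solution for when the split() approach is not an option: iterate over the lines and check each one: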

import re
p = re.compile(r'\d+\.\s+(\w+)\s+(\w+)\s+-\s+(\d+)')
m = p.match(the_line)  # None if the line does not match
# m.group(1) will be the first word
# m.group(2) the second word
# m.group(3) will be the first number after the last word
# (m.group(0) is the whole matched text)

The regexp is: <some digits><a dot>
<some whitespace><alphanumeric characters, captured as group 1>
<some whitespace><alphanumeric characters, captured as group 2>
<some whitespace><a '-'><some whitespace><digits, captured as group 3>

This is a bit strict, but that way you will spot the lines that do not conform.
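To illustrate iterating over the lines and catching the ones that do not fit, here is a minimal sketch. The names `LINE_RE` and `parse_lines` and the third sample line are made up for illustration; the pattern is the one from this answer, written with re.VERBOSE so each part can carry a comment.

```python
import re

LINE_RE = re.compile(r"""
    \d+\.        # some digits followed by a dot
    \s+(\w+)     # whitespace, then the first word (group 1)
    \s+(\w+)     # whitespace, then the second word (group 2)
    \s+-\s+      # a '-' surrounded by whitespace
    (\d+)        # the number (group 3)
""", re.VERBOSE)

def parse_lines(lines):
    """Split lines into parsed tuples and lines that did not match."""
    parsed, rejected = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            rejected.append(line)
        else:
            first, last, number = m.groups()
            parsed.append((first, last, int(number)))
    return parsed, rejected

sample = [
    "1.  First1 Last1 - 20 (Long Description)",
    "2.  First2 Last2 - 40 (Another Description)",
    "3.  OnlyOneName - 60 (made-up malformed line)",
]
parsed, rejected = parse_lines(sample)
print(parsed)
print(rejected)
```

Collecting the rejected lines rather than silently skipping them is one way to make use of the strictness: anything that ends up in `rejected` is a line whose format needs a closer look.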

Building on Harman's partial solution, I came up with the following:

(?P<first>\w+)\s+(?P<last>\w+)[-\s]*(?P<number>\d[\d,]*)

Code and output:

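>>> import re
>>> regex = re.compile(r"(?P<first>\w+)\s+(?P<last>\w+)[-\s]*(?P<number>\d[\d,]*)")
>>> r = regex.search(string)  # string holds the two sample lines; r is the first match only
>>> regex.findall(string)     # Python 2 output; Python 3 shows the same tuples without the u prefix
[(u'First1', u'Last1', u'20'), (u'First2', u'Last2', u'40')]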
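Since the pattern uses named groups, the matches can also be read back by name instead of by position. The following is a rough sketch under that assumption; `NAME_RE`, `text` and `records` are illustrative names, and converting the number to an int is an addition not present in the original answer.

```python
import re

NAME_RE = re.compile(r"(?P<first>\w+)\s+(?P<last>\w+)[-\s]*(?P<number>\d[\d,]*)")

text = (
    "1.  First1 Last1 - 20 (Long Description)\n"
    "2.  First2 Last2 - 40 (Another Description)"
)

records = []
for m in NAME_RE.finditer(text):
    record = m.groupdict()  # {'first': ..., 'last': ..., 'number': ...}
    # Drop the thousands separators the pattern allows, then convert to int.
    record["number"] = int(record["number"].replace(",", ""))
    records.append(record)

print(records)
```

Unlike the stricter pattern earlier in the thread, this one does not anchor on the leading line number, so it scans the whole text for anything shaped like two words followed by a number.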

I'd like the answer as well, so I can double-check my understanding. Thanks.

OK, I've added some sample code, but not a complete solution :)