Regular expression for a Python multiline string?


python regex


I have the following input:

str = """

    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step. Execute everything.

    2. Then, do the second step

    3. And finally, do the last one



    Q: What is another way of achieving this?

    A: I am not sure. Try the following alternatives:

    1. Take this first step from before. Execute everything.

    2. Then, don't do the second step

    3. Do the last one and then execute the above step

"""

I want to capture the Q&A pairs in the input, but I can't come up with a good regex to do it. This is what I have so far:

(?ms)^[\s#\-\*]*(?:Q)\s*:\s*(\S.*?\?)[\s#\-\*]+(?:A)\s*:\s*(\S.*)$

However, the input gets captured like this:

('Q', 'What is a good way of achieving this?')
('A', "I am not sure. Try the following:\n    1. Take this first step. Execute everything.\n    2. Then, do the second step\n    3. And finally, do the last one\n\n    Q: What is another way of achieving this?\n    A: I am not sure. Try the following alternatives:\n    1. Take this first step from before. Execute everything.\n    2. Then, don't do the second step\n    3. Do the last one and then execute the above step\n")

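Notice how the second Q&A pair is swallowed by the answer of the first pair. If I use a non-greedy `?` at the end of the answer regex, it doesn't capture the enumeration. Any suggestions on how to fix this?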
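To see both failure modes concretely, here is a small sketch run against a shortened copy of the input (the sample text is trimmed for this illustration). With DOTALL, the greedy `(\S.*)$` runs to the last line end and swallows the second pair; making it lazy does not help, because under MULTILINE `$` also matches just before every newline, so each answer stops at the end of its first line.

```python
import re

# Shortened copy of the question's input (trimmed for this sketch).
TEXT = """
    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step.

    Q: What is another way of achieving this?

    A: Try the alternatives:

    1. Take this first step from before.
"""

# Asker's pattern: with DOTALL, the greedy (\S.*) runs to the final line end,
# so the first answer swallows the second Q/A pair.
GREEDY = r"(?ms)^[\s#\-\*]*(?:Q)\s*:\s*(\S.*?\?)[\s#\-\*]+(?:A)\s*:\s*(\S.*)$"

# Same pattern with a lazy answer: under MULTILINE, $ also matches just before
# every newline, so the answer stops at the end of its first line.
LAZY = r"(?ms)^[\s#\-\*]*(?:Q)\s*:\s*(\S.*?\?)[\s#\-\*]+(?:A)\s*:\s*(\S.*?)$"

greedy = re.findall(GREEDY, TEXT)
lazy = re.findall(LAZY, TEXT)

print(len(greedy))  # one match: the second pair is inside its answer
for question, answer in lazy:
    print(repr(answer))  # only the first line of each answer
```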

The lazy (but not the best) way to solve this is to explode the string on "Q:" and then parse the pieces with a simple regexp such as /Q:(.+)A:(.+)/msU.
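A minimal Python sketch of that split-then-parse idea, assuming "Q:" and "A:" never occur inside the question or answer text. Python's `re` has no PCRE-style `U` (ungreedy) flag, so the question part is made lazy with `+?` directly, and the answer part is left greedy so it runs to the end of its chunk. Because splitting on "Q:" already removes that prefix, the per-chunk pattern (`PAIR`, a name chosen here) starts at the question text.

```python
import re

# Shortened copy of the question's input (trimmed for this sketch).
TEXT = """
    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step.

    Q: What is another way of achieving this?

    A: Try the alternatives:

    1. Take this first step from before.
"""

# Per-chunk pattern: lazy question up to the first "A:", greedy answer to the
# end of the chunk. Python's re has no PCRE "U" (ungreedy) flag, so laziness
# is written into the quantifier instead.
PAIR = re.compile(r"(.+?)A:(.+)", re.DOTALL)

def split_then_parse(text):
    pairs = []
    # split() removes the "Q:" markers; [1:] drops whatever precedes the first one.
    for chunk in text.split("Q:")[1:]:
        m = PAIR.match(chunk)
        if m:  # ignore chunks without an "A:"
            pairs.append((m.group(1).strip(), m.group(2).strip()))
    return pairs

for question, answer in split_then_parse(TEXT):
    print(question, "->", answer.splitlines()[0])
```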

This works fine for me. You just need to trim a bit of whitespace:

(?s)(Q):((?:(?!A:).)*)(A):((?:(?!Q:).)*)

Example usage:

>>> import re
>>> str = """
...
...     Q: What is a good way of achieving this?
...
...     A: I am not sure. Try the following:
...
...     1. Take this first step. Execute everything.
...
...     2. Then, do the second step
...
...     3. And finally, do the last one
...
...
...
...     Q: What is another way of achieving this?
...
...     A: I am not sure. Try the following alternatives:
...
...     1. Take this first step from before. Execute everything.
...
...     2. Then, don't do the second step
...
...     3. Do the last one and then execute the above step
...
... """
>>> regex = r"(?s)(Q):((?:(?!A:).)*)(A):((?:(?!Q:).)*)"
>>> match = re.findall(regex, str)
>>> [[part.strip().replace('\n', '') for part in x] for x in match]
[['Q', 'What is a good way of achieving this?', 'A', 'I am not sure. Try the following:    1. Take this first step. Execute everything.    2. Then, do the second step    3. And finally, do the last one'], ['Q', 'What is another way of achieving this?', 'A', "I am not sure. Try the following alternatives:    1. Take this first step from before. Execute everything.    2. Then, don't do the second step    3. Do the last one and then execute the above step"]]

I even added a little something to help you clean up the trailing whitespace.
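For Python 3, a variant of the same tempered-dot pattern (`(?:(?!A:).)*` consumes one character at a time, but only where `A:` does not begin) could look like the sketch below. It uses a hypothetical `squash` helper that collapses every whitespace run into a single space instead of only deleting newlines; that drops the line breaks between the numbered steps, which is an assumption about what "clean" should mean here. The literal `Q`/`A` capture groups are also dropped, so `findall` yields (question, answer) tuples.

```python
import re

# Shortened copy of the question's input (trimmed for this sketch).
TEXT = """
    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step.

    Q: What is another way of achieving this?

    A: Try the alternatives:

    1. Take this first step from before.
"""

# Tempered dot: (?:(?!A:).)* takes one character at a time, but only where
# "A:" does not start, so the question ends right before "A:". The answer
# part does the same with "Q:". DOTALL lets "." cross newlines.
PATTERN = re.compile(r"Q:((?:(?!A:).)*)A:((?:(?!Q:).)*)", re.DOTALL)

def squash(part):
    # Collapse every run of whitespace (newlines, indentation) into one space.
    return " ".join(part.split())

pairs = [(squash(q), squash(a)) for q, a in PATTERN.findall(TEXT)]
for question, answer in pairs:
    print(question, "->", answer)
```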

I'm not clever enough yet to write big regexes, so here is my non-regex solution:

>>> str = """

    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step. Execute everything.

    2. Then, do the second step

    3. And finally, do the last one



    Q: What is another way of achieving this?

    A: I am not sure. Try the following alternatives:

    1. Take this first step from before. Execute everything.

    2. Then, don't do the second step

    3. Do the last one and then execute the above step

"""
>>> qas = str.strip().split('Q:')
>>> clean_qas = [x.strip().split('A:') for x in qas if x]
>>> print(clean_qas)
[['What is a good way of achieving this?\n\n    ', ' I am not sure. Try the following:\n\n    1. Take this first step. Execute everything.\n\n    2. Then, do the second step\n\n    3. And finally, do the last one'], ['What is another way of achieving this?\n\n    ', " I am not sure. Try the following alternatives:\n\n    1. Take this first step from before. Execute everything.\n\n    2. Then, don't do the second step\n\n    3. Do the last one and then execute the above step"]]

You should still clean up the whitespace, though. Or you could do what 普契克 suggested.

Just for fun:

>>> clean_qas = [[s.strip() for s in x.strip().split('A:')] for x in qas if x]
>>> print(clean_qas)
[['What is a good way of achieving this?', 'I am not sure. Try the following:\n\n    1. Take this first step. Execute everything.\n\n    2. Then, do the second step\n\n    3. And finally, do the last one'], ['What is another way of achieving this?', "I am not sure. Try the following alternatives:\n\n    1. Take this first step from before. Execute everything.\n\n    2. Then, don't do the second step\n\n    3. Do the last one and then execute the above step"]]

But it looks ugly.
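A somewhat tidier non-regex sketch of the same split, assuming "Q:" never appears inside an answer and "A:" never appears inside a question. `str.partition` splits on the first "A:" only, so any later "A:" stays in the answer; `split_qas` is a hypothetical name.

```python
# Shortened copy of the question's input (trimmed for this sketch).
TEXT = """
    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step.

    Q: What is another way of achieving this?

    A: Try the alternatives:

    1. Take this first step from before.
"""

def split_qas(text):
    pairs = []
    for chunk in text.split("Q:"):
        # partition() splits on the first "A:" only; later ones stay in the answer.
        question, sep, answer = chunk.partition("A:")
        if sep:  # the chunk before the first "Q:" has no "A:" and is skipped
            pairs.append((question.strip(), answer.strip()))
    return pairs

for question, answer in split_qas(TEXT):
    print(question)
    print(answer)
```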

A slight modification of your original solution:

(?ms)^[\s#\-\*]*(?:Q)\s*:\s+(\S[^\n\r]*\?)[\s#\-\*]+(?:A)\s*:\s+(\S.*?)(?=\s*\Z|\s*Q\s*:\s+)

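- `Q` and `A` must be followed by at least one space after the `:`.
- Instead of matching the question non-greedily, disallow newlines in it (non-greedy matching would not allow more than one `?` in a question).
- Instead of matching up to the end of the string, match the answer non-greedily until it is followed by either the end of the string or another question.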

Use `re.findall` to get all the question/answer matches.
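A sketch of running the pattern with `re.findall` on a shortened sample. Two details matter in this version: it uses `\Z` rather than `$`, because under `(?m)` a `$` also matches just before every newline, which would cut a lazily matched answer off at the end of its first line; and the optional whitespace before the next `Q:` sits inside the lookahead, so it is not consumed and the next match can still start at a line beginning (`^`).

```python
import re

# Shortened copy of the question's input (trimmed for this sketch).
TEXT = """
    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step.

    Q: What is another way of achieving this?

    A: Try the alternatives:

    1. Take this first step from before.
"""

# Question: one line, ending at its last "?". Answer: lazy, stopping where only
# whitespace remains before the end of the string (\Z) or before the next "Q:".
PATTERN = re.compile(
    r"(?ms)^[\s#\-\*]*(?:Q)\s*:\s+(\S[^\n\r]*\?)[\s#\-\*]+(?:A)\s*:\s+"
    r"(\S.*?)(?=\s*\Z|\s*Q\s*:\s+)"
)

matches = PATTERN.findall(TEXT)
for question, answer in matches:
    print(repr(question), repr(answer))
```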
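Is there a reason you're not just reading it line by line?

@WesAlvaro: So are you suggesting that I read it line by line and handle it with a state-machine-based approach instead of a regex?

What about the dumb approach of splitting on `Q:` first and then on `A:` (you get four pieces: `[[Q, A], [Q, A]]`)? What are you doing with the `[\s#\-\*]*` pattern? At first I thought you were trying to match comments, but that doesn't really make sense with `-` and `*`.

@Legend, yes, I'm suggesting you not use a regex for this. You should try writing a coroutine to do it =D If you want to peek at mine, here it is: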
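The line-by-line, state-machine approach from these comments could look roughly like the sketch below. It is not the commenter's code: it is a plain generator (not a coroutine), the name `parse_qa` is hypothetical, and it assumes every question line starts with `Q:` and every answer line with `A:` once indentation is stripped.

```python
# Shortened copy of the question's input (trimmed for this sketch).
TEXT = """
    Q: What is a good way of achieving this?

    A: I am not sure. Try the following:

    1. Take this first step.

    Q: What is another way of achieving this?

    A: Try the alternatives:

    1. Take this first step from before.
"""

def parse_qa(lines):
    """Yield (question, answer) pairs from an iterable of lines.

    Two states: a "Q:" line opens a new pair, an "A:" line switches to
    collecting the answer, and other non-blank lines extend the open part.
    """
    question, answer, state = [], [], None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Q:"):
            if question:  # emit the previous pair before starting a new one
                yield " ".join(question), "\n".join(answer)
            question, answer, state = [line[2:].strip()], [], "Q"
        elif line.startswith("A:"):
            answer, state = [line[2:].strip()], "A"
        elif line and state == "Q":
            question.append(line)
        elif line and state == "A":
            answer.append(line)
    if question:  # emit the last pair
        yield " ".join(question), "\n".join(answer)

pairs = list(parse_qa(TEXT.splitlines()))
for question, answer in pairs:
    print(question)
    print(answer)
```

Because it only needs an iterable of lines, an open file object can be passed in directly (for example `parse_qa(f)` inside a `with open(path) as f:` block, where `path` is whatever file holds the text), so the whole input never has to be loaded as one string.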