Python regex: get the text between two tags, including newlines

python regex

I'm new to regular expressions. This is my data:

<p>[tag]y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38[/tag]</p>

This is my regular expression:

(<p>\[tag(.*)\])(.+)(\[\/tag\]<\/p>)

It doesn't work because of the newlines (\n). With re.DOTALL it does work, but not when my data contains multiple records, for example:

<p>[tag]y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38[/tag]</p>

<p>[tag]y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38[/tag]</p>

In that case re.findall returns only one match. I simply want a list like [data1, data2, data3, ...]. What can I do?

You can use this regular expression:

\[tag\]([\s\S]*?)\[\/tag\]

Match information:

MATCH 1
1.  [8-44]  `y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38`
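As a minimal sketch of how that pattern behaves with re.findall on the two-record sample from the question (the variable names text, records and rows are only illustrative, and splitting the records into rows is an extra step the question did not ask for):

```python
import re

# Sample data from the question: two records, each spanning several lines.
text = """<p>[tag]y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38[/tag]</p>

<p>[tag]y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38[/tag]</p>"""

# [\s\S] matches any character, newlines included; *? keeps each match as short as possible.
pattern = re.compile(r"\[tag\]([\s\S]*?)\[/tag\]")

# One string per [tag]...[/tag] block.
records = pattern.findall(text)

# Optional: split every record into rows of comma-separated fields.
rows = [[line.split(",") for line in record.splitlines()] for record in records]

print(records)
```

Because the quantifier is lazy, each match stops at the first closing [/tag], so every record ends up as its own list element.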

Update: what the pattern does:

\[tag\]
([\s\S]*?) --> [\s\S]*? matches everything, since \S matches every non-whitespace
               character and \s matches every whitespace character (newlines included).
               This is just a trick; you can also use [\D\d] or [\W\w]. The *? is a
               non-greedy (lazy) quantifier.
\[\/tag\]
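To make the difference between greedy and lazy matching concrete, here is a small sketch on a shortened, made-up input (the string in text is not the asker's data). It also shows that [\s\S] and "." with re.DOTALL behave the same way here:

```python
import re

# Shortened, hypothetical input with two tagged blocks.
text = "[tag]a\nb[/tag] junk [tag]c\nd[/tag]"

# Greedy .+ with DOTALL runs to the LAST [/tag], so both blocks merge into one match.
greedy = re.findall(r"\[tag\](.+)\[/tag\]", text, re.DOTALL)

# Lazy .+? stops at the FIRST [/tag] after each opening tag.
lazy = re.findall(r"\[tag\](.+?)\[/tag\]", text, re.DOTALL)

# [\s\S] needs no flag, because the character class already includes newlines.
char_class = re.findall(r"\[tag\]([\s\S]+?)\[/tag\]", text)

# Without DOTALL, "." cannot cross the newline inside each block, so nothing matches.
no_flag = re.findall(r"\[tag\](.+?)\[/tag\]", text)

print(greedy, lazy, char_class, no_flag, sep="\n")
```

This is why the original pattern with re.DOTALL returned a single match: its (.+) group is greedy.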

On the other hand, if you want to allow attributes in the tag, you can use:

\[tag.*?\]([\s\S]*?)\[\/tag\]
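A short sketch of the attribute-tolerant pattern, using a made-up input where one tag carries attributes and the other does not. Note that this pattern skips over the attributes and only captures the body:

```python
import re

# Hypothetical input: the first tag has attributes, the second has none.
text = (
    "<p>[tag id=5 a=5]y,m,m,l\n1997,f,e,2.34g[/tag]</p>\n"
    "<p>[tag]2000,m,c,2.38[/tag]</p>"
)

# .*? inside the opening tag lazily consumes any attributes up to the first "]".
pattern = re.compile(r"\[tag.*?\]([\s\S]*?)\[/tag\]")

bodies = pattern.findall(text)
print(bodies)
```

One caveat: since .*? accepts any characters, an opening tag such as [tagged] would also match. If that matters for your data, requiring whitespace after "tag" would tighten the pattern.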

It's as simple as this:

\](.*?)\[

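import re

# re.DOTALL lets "." match newlines; IGNORECASE and MULTILINE have no effect on this pattern.
reobj = re.compile(r"\](.*?)\[", re.IGNORECASE | re.DOTALL | re.MULTILINE)
result = reobj.findall(YOURSTRING)  # YOURSTRING is a placeholder for the input text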

Output:

y,m,m,l
1997,f,e,2.34g
2000,m,c,2.38

Regex explanation:

\] matches the character ] literally
1st Capturing group (.*?)
    .*? matches any character
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\[ matches the character [ literally
s modifier: single line. Dot matches newline characters
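One thing worth checking before relying on this shorter pattern: it matches anything between any "]" and the next "[", not just tag bodies. The sketch below, on a made-up two-record input, compares it with a pattern anchored on the tag names:

```python
import re

# Hypothetical two-record input in the same shape as the question's data.
text = "<p>[tag]a\nb[/tag]</p>\n<p>[tag]c\nd[/tag]</p>"

# Loose pattern: anything between "]" and the next "[".
loose = re.findall(r"\](.*?)\[", text, re.DOTALL)

# Anchored pattern: only text between [tag] and [/tag].
strict = re.findall(r"\[tag\](.*?)\[/tag\]", text, re.DOTALL)

print(loose)
print(strict)
```

With more than one record, the loose pattern also returns the markup that sits between a closing tag and the next opening tag, so the anchored form is probably the safer choice for the multi-record case.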

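Where does the data come from? You can use the dotall modifier and a non-greedy match: re.findall(r'(?s)\[tag\](.*?)\[/tag\]', text). But I would probably use Beautiful Soup to extract the text from the paragraph tags, and then grab what is between those tags.

Yes, thanks. Can you explain [\s\S]? Also, what if I want to get tag parameters such as [tag id=5 a=3]? For example, from [tag id=5 a=5]y,m,m,l 1997,f,e,2.34g 2000,m,c,2.38[/tag] I want to get y,m,m,l 1997,f,e,2.34g 2000,m,c,2.38 as well as id=5 and a=5.

Sorry for asking so many questions, but about \[tag.*?\]([\s\S]*?)\[\/tag\]: for example, with [tag id=5 name=name] I have the strings id=5 and name=name. What is the best way to separate these parameters?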
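One possible way to get both the parameters and the body is to capture the attribute string in its own group and split it with a second regex. This is only a sketch: it assumes attributes are written as unquoted name=value pairs separated by whitespace, and the names tag_re, attr_re and results are illustrative.

```python
import re

text = "[tag id=5 a=5]y,m,m,l\n1997,f,e,2.34g\n2000,m,c,2.38[/tag]"

# Group 1: raw attribute string (anything up to the closing "]").
# Group 2: the body between the opening and closing tags.
tag_re = re.compile(r"\[tag([^\]]*)\]([\s\S]*?)\[/tag\]")

# Assumption: attributes look like name=value, with no spaces or quotes in the value.
attr_re = re.compile(r"(\w+)=(\S+)")

results = []
for raw_attrs, body in tag_re.findall(text):
    attrs = dict(attr_re.findall(raw_attrs))  # e.g. {"id": "5", "a": "5"}
    results.append((attrs, body))

print(results)
```

All values come back as strings, so convert them with int() or similar where needed. Quoted values that contain spaces would need a more careful attribute pattern.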
