Finding the right regex for bold/underlined strings (Python)

python regex

I have two sets of criteria that I want to find in a single string. For example:

import re
bold_pattern = re.compile() #pattern for finding all words in between ** **
underline_pattern = re.compile() # pattern for finding all words in between __ __
a = "__Hello__ **This** __is__ **Lego**"

How do I do this with regular expressions?

Using re.findall, we can try:

import re

a = "__Hello__ **This** __is__ **Lego**"
terms = re.findall(r'\*\*(.*?)\*\*', a)
print(terms)

This prints:

['This', 'Lego']

Use a capture group to grab the words between each pair of delimiters:

bold_pattern = re.compile(r'\*\*(.*?)\*\*')   # pattern for finding all words in between ** **
underline_pattern = re.compile(r'__(.*?)__')  # pattern for finding all words in between __ __

Then use them with re.findall:

bolds = re.findall(bold_pattern, a)
# or: bold_pattern.findall(a)
underlines = re.findall(underline_pattern, a)
# or: underline_pattern.findall(a)

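Hope this helps :) You first need to define the pattern with compile, then use the findall function to extract the strings. You can also define the pattern directly in the findall call, as @timbiegeleisen suggested, and do the whole thing in one line.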

import re
bold_pattern = re.compile(r'\*\*(.*?)\*\*')
underline_pattern = re.compile(r'__(.*?)__')  # underscores need no escaping
a = "__Hello__ **This** __is__ **Lego**"
print(bold_pattern.findall(a))
print(underline_pattern.findall(a))
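
As a minimal sketch of how the two compiled patterns above might be packaged for reuse, the snippet below wraps them in a small helper. The names BOLD, UNDERLINE and extract_marked are hypothetical and do not appear in the original answers; the patterns themselves are the ones shown above.

```python
import re

# Hypothetical module-level names; the patterns are taken from the answers above.
BOLD = re.compile(r'\*\*(.*?)\*\*')
UNDERLINE = re.compile(r'__(.*?)__')


def extract_marked(text):
    """Return the bold and underlined fragments of text in a dict of lists."""
    return {
        "bold": BOLD.findall(text),
        "underline": UNDERLINE.findall(text),
    }


a = "__Hello__ **This** __is__ **Lego**"
result = extract_marked(a)
print(result["bold"])       # ['This', 'Lego']
print(result["underline"])  # ['Hello', 'is']
```

Compiling once at module level keeps the call site short when the same text is scanned for both kinds of markup.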

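Suggestion: if you are handling multiline text (i.e. text containing \n), you need to pass the argument flags=re.DOTALL to re.findall(). Without it, the dot does not match a newline, so a marked span that crosses a line break is not found.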

Case: multiline text (the example code and what it returns are shown at the bottom of this page).

From the Python 3.8.2 documentation:
"The expression's behaviour can be modified by specifying a flags value."
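
The flags value mentioned in the quote can also be supplied once, when the pattern is compiled, instead of on every re.findall() call. The following is a small sketch under that assumption; the sample string text is made up for illustration and is not from the original question.

```python
import re

# Assumed sample text: both marked spans cross a line break.
text = "**This\nis bold** and __this\nis underlined__"

# re.DOTALL lets '.' match newlines; here it is fixed at compile time.
bold_pattern = re.compile(r'\*\*(.*?)\*\*', flags=re.DOTALL)
underline_pattern = re.compile(r'__(.*?)__', flags=re.DOTALL)

print(bold_pattern.findall(text))       # ['This\nis bold']
print(underline_pattern.findall(text))  # ['this\nis underlined']
```

Note that the findall method of a compiled pattern does not accept a flags argument, so the flag has to be given to re.compile() in this form.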

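Handling \n: depending on your needs, there are a few different ways to deal with \n. If it has to go, I would run re.sub() over the whole body of text before doing anything else.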
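
One way the re.sub() idea might look in practice is sketched below: line breaks are collapsed into single spaces first, so the plain pattern works without re.DOTALL. The replacement rule (a newline plus any surrounding whitespace becomes one space) and the final strip() are assumptions about what "handling" should mean, not something the answer specifies.

```python
import re

text = "__Hello__ **This \nis a multiline test** __it is__ **Lego\n**"

# Assumed normalization: each newline, with surrounding whitespace, becomes one space.
flat = re.sub(r'\s*\n\s*', ' ', text)

# The pattern no longer needs re.DOTALL; strip() trims leftover edge spaces.
bolds = [m.strip() for m in re.findall(r'\*\*(.*?)\*\*', flat)]
print(bolds)  # ['This is a multiline test', 'Lego']
```

This changes the captured text (the newlines are gone), which may or may not be what you want; the re.DOTALL approach keeps them.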

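To compile or not to compile? From the Python 3.8.2 documentation:
"Some of the functions are simplified versions of the full featured methods for compiled regular expressions. Most non-trivial applications always use the compiled form. …
… but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program."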

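and

"The compiled versions of the most recent patterns passed to re.compile() and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn't worry about compiling regular expressions."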

So unless you are using a large number of patterns, you should not expect a significant improvement from compiling.

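You can also use the %%time magic command (in IPython or Jupyter) to test both options and see whether you notice an advantage on your own machine.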
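
Outside a notebook, a comparable measurement can be sketched with the standard-library timeit module, which is swapped in here for the %%time magic. The call count n is an arbitrary assumption, and the timings depend entirely on the machine, so no expected numbers are shown.

```python
import re
import timeit

a = "__Hello__ **This** __is__ **Lego**"
pattern = r'\*\*(.*?)\*\*'
compiled = re.compile(pattern)

n = 50_000  # arbitrary number of repetitions, chosen for illustration

# Module-level function: looks the pattern up in re's internal cache on each call.
t_module = timeit.timeit(lambda: re.findall(pattern, a), number=n)

# Pre-compiled pattern object: skips the cache lookup.
t_compiled = timeit.timeit(lambda: compiled.findall(a), number=n)

print(f"re.findall(pattern, a): {t_module:.3f} s for {n} calls")
print(f"compiled.findall(a):    {t_compiled:.3f} s for {n} calls")
```

Both calls return the same matches; only the per-call overhead differs.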

Good luck!

Thanks! Side note: since the pattern is already compiled, wouldn't I just use bold_pattern.findall(a)?

import re

# string to be searched
a = """
__Hello__ **This 
is a multiline test** __it is__ **Lego
**
"""

# pattern variations
bold_pattern = r'\*\*(.*?)\*\*'

# call re functions
match = re.findall(pattern=bold_pattern, string=a)
flag_match = re.findall(pattern=bold_pattern, string=a, flags=re.DOTALL)

# print results for observation
print(match)
print(flag_match) # using the flag

Returns:

[' __it is__ ']
['This \nis a multiline test', 'Lego\n']