Python, search (Python, Regex) - Fatal编程技术网


python regex


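Consider a list of strings. I want to find all substrings that start with < and end with >.

How can I do this?

I have tried adapting the regex from this question:

but since I'm not familiar with regular expressions, none of my attempts worked.

Note 1: I'm not set on regex; any working solution is welcome.

Note 2: I'm not parsing HTML or any other markup language.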
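Note 1 allows non-regex solutions, so here is a minimal sketch that uses only `str.find`. `find_bracketed` is a hypothetical helper name. The sketch assumes that brackets do not nest and that an unmatched `<` at the end of the string is ignored. Under those assumptions it returns the same spans as the pattern `<[^>]*>`.

```python
def find_bracketed(text):
    """Hypothetical helper: return each substring from a '<' up to the next '>'."""
    results = []
    start = text.find("<")
    while start != -1:
        end = text.find(">", start + 1)
        if end == -1:  # unmatched '<': nothing more to collect
            break
        results.append(text[start:end + 1])  # include both brackets
        start = text.find("<", end + 1)      # resume after the closing '>'
    return results


strings = ["<hi how are you> I think <not anymore> whatever <amazing hi>", "second <first> <third>"]
for s in strings:
    print(find_bracketed(s))
```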

You can do this with a regex:

import re

regex = r"<([^>]*)>"  # '<', then any run of characters other than '>', then '>'

test_list = ["<hi how are you> I think <not anymore> whatever <amazing hi>", "second <first> <third>"]

for test_str in test_list:
    matches = re.finditer(regex, test_str, re.MULTILINE)

    for matchNum, match in enumerate(matches, start=1):

        print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(),
                                                                        end=match.end(), match=match.group()))

Output:

Match 1 was found at 0-16: <hi how are you>
Match 2 was found at 25-38: <not anymore>
Match 3 was found at 48-60: <amazing hi>
Match 1 was found at 7-14: <first>
Match 2 was found at 15-22: <third>

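If you want to remove the angle brackets, you can do a string replacement.

However, if you have structured text such as HTML or XML, use a proper parser.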
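A string replacement is not strictly needed to drop the brackets. The pattern `<([^>]*)>` above already has a capturing group, so `match.group(1)` returns only the text between `<` and `>`. The sketch below also shows slicing the full match as an equivalent post-processing step. Slicing is used rather than `str.replace` because `[^>]*` can contain a `<`, and a replace would also remove that character.

```python
import re

regex = r"<([^>]*)>"
test_str = "second <first> <third>"

# group(1) is the capturing group ([^>]*): the text between the brackets
inner = [m.group(1) for m in re.finditer(regex, test_str)]
print(inner)  # ['first', 'third']

# Equivalent: slice off the first and last character of each full match
stripped = [m.group()[1:-1] for m in re.finditer(regex, test_str)]
print(stripped)  # ['first', 'third']
```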

Use re.findall:

I find this a good site for experimenting with regular expressions.

This should do what you need:
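The code for this answer is not shown, so the following is only a sketch of how `re.findall` could be applied to each string. It produces one list of matches per input string. The sketch also illustrates a documented behavior of `re.findall`: when the pattern has no group, it returns the full matches. When the pattern has exactly one capturing group, it returns only that group's text.

```python
import re

strings = ["<hi how are you> I think <not anymore> whatever <amazing hi>", "second <first> <third>"]

# No group: findall returns each full match, brackets included
full = [re.findall(r"<[^>]*>", s) for s in strings]
print(full)

# One capturing group: findall returns only the group's text
inner = [re.findall(r"<([^>]*)>", s) for s in strings]
print(inner)
```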

import re

strings = ["x<first>x<second>x", "x<third>x"]
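# In a nested comprehension the outer loop (over strings) must come first
result = [substring for string in strings for substring in re.findall(r"<.*?>", string)]
print(result)  # ['<first>', '<second>', '<third>']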

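Here, `re.findall` finds all matches of the regular expression in a string, and a nested list comprehension loops over every string in the list and every match within each string.

By the way, why do you need to match angle brackets like this? If you are parsing HTML or XML, you are better off using a dedicated parser: writing your own regexes is error-prone, and regexes alone cannot handle arbitrarily nested elements.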
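The `?` in `<.*?>` matters. Without it, `.*` is greedy and a single match runs from the first `<` to the last `>`. The sketch below compares the two forms. It also shows that `.` does not match a newline unless `re.DOTALL` is passed, whereas the `[^>]*` class from the first answer does match newlines.

```python
import re

s = "x<first>x<second>x"

# Greedy: .* extends as far as possible, swallowing everything up to the last '>'
print(re.findall(r"<.*>", s))    # ['<first>x<second>']

# Non-greedy: .*? stops at the earliest '>', giving one match per pair
print(re.findall(r"<.*?>", s))   # ['<first>', '<second>']

t = "<multi\nline>"
print(re.findall(r"<.*?>", t))             # [] because '.' skips '\n' by default
print(re.findall(r"<.*?>", t, re.DOTALL))  # ['<multi\nline>']
print(re.findall(r"<[^>]*>", t))           # ['<multi\nline>']
```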
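The following input illustrates the nesting problem: the non-greedy regex stops at the first `>`, so the outer pair is cut short. The hypothetical helper `top_level_brackets` keeps a depth counter and returns only the outermost `<...>` spans. It is a minimal sketch for bracketed text, not an HTML/XML parser. It ignores quoting, comments, and every other markup rule, and it silently skips unmatched brackets.

```python
import re

nested = "<outer <inner> tail>"

# The regex stops at the first '>', so the outer span is truncated
print(re.findall(r"<.*?>", nested))  # ['<outer <inner>']


def top_level_brackets(text):
    """Hypothetical helper: collect outermost <...> spans by tracking nesting depth."""
    results, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "<":
            if depth == 0:
                start = i  # opening bracket of a new top-level span
            depth += 1
        elif ch == ">" and depth > 0:  # a '>' with no open '<' is ignored
            depth -= 1
            if depth == 0:
                results.append(text[start:i + 1])
    return results


print(top_level_brackets(nested))  # ['<outer <inner> tail>']
```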

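Take a look at the `re` module. We are not here to solve your problem for you: try it yourself, and if you get stuck, you can ask for help. Or use the search function; I'm sure there are similar questions. Keyword: regex.

You can do it with a regex, but if you are talking about structured text like HTML or XML, you need a proper parser.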
