Python regex findall does not return the expected multiple results
I have two Python strings, short_sentence and long_sentence (short_sentence is part of long_sentence). long_sentence contains the paragraph from short_sentence plus a second one:
To critics of Dodd-Frank, this is thrilling stuff. They see the law as a piece of statist overreach that throttles the American economy. Plenty in the Trump administration would love to gut it. The president himself has called it a \xe2\x80\x9cdisaster\xe2\x80\x9d. Gary Cohn, until recently one of the leaders of Goldman Sachs, a big bank, and now Mr Trump\xe2\x80\x99s chief economic adviser, promises to \xe2\x80\x9cattack all aspects of Dodd-Frank\xe2\x80\x9d.
I know that Python's re.findall() returns all non-overlapping matches of the pattern. When I run the following:
re.findall("<p.*>(.*?)</p>", short_sentence)
I get the expected result:
['THE prospect of deregulation helps explain why, since Donald Trump\xe2\x80\x99s election, no bit of the American stockmarket has done better than financial firms. On February 3rd their shares climbed again as Mr Trump signed an executive order asking the Treasury to conduct a 120-day review of America\xe2\x80\x99s financial regulations, including the Dodd-Frank act put in place after the financial crisis of 2007-08, to assess whether these rules meet a set of \xe2\x80\x9ccore principles\xe2\x80\x9d.']
However, when I try to extract both paragraphs from long_sentence with:
re.findall("<p.*>(.*?)</p>", long_sentence)
I still get only one match (the second paragraph).
My question is: what goes wrong in the second case? Why doesn't it return both occurrences?
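The .* in <p.*> is greedy, so it matches as much as it can. In long_sentence it runs past the first paragraph and only backtracks as far as the last > that still lets the rest of the pattern match, which is the end of the second opening <p ...> tag. The whole first paragraph is swallowed by .* and only the second one is captured. If you use <p.*?> instead, you will get the expected result.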
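The difference can be checked with a minimal sketch. The strings below are short hypothetical stand-ins for the question's data (the real paragraphs are much longer), and the negated character class [^>]* is an extra variant that the answer does not mention:

```python
import re

# Hypothetical stand-ins for the question's strings; both paragraphs sit on one line.
short_sentence = '<p class="x">first</p>'
long_sentence = short_sentence + '<p class="x">second</p>'

# Greedy: .* runs to the end of the line, then backtracks to the last usable '>'.
greedy = re.findall(r"<p.*>(.*?)</p>", long_sentence)

# Lazy: .*? stops at the first '>' that lets the rest of the pattern match.
lazy = re.findall(r"<p.*?>(.*?)</p>", long_sentence)

# Negated class: the tag's attributes can never run past the tag's own '>'.
negated = re.findall(r"<p[^>]*>(.*?)</p>", long_sentence)

print(greedy)   # ['second']
print(lazy)     # ['first', 'second']
print(negated)  # ['first', 'second']
```

With short_sentence alone, the greedy pattern still returns the single paragraph, which is why the problem only shows up once a second <p> appears on the same line.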
If you need it, more information on this topic is available here:
Excerpt:
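Suppose you want to use a regex to match an HTML tag. You know that the input will be a valid HTML file, so the regular expression does not need to exclude any invalid use of angle brackets. If it sits between angle brackets, it is an HTML tag.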
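Most people new to regular expressions will attempt to use <.+>. They will be surprised when they test it on a string like This is a <EM>first</EM> test. You might expect the regex to match <EM> and, when continuing after that match, </EM>.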
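The behaviour described in the excerpt can be reproduced in a few lines. This is only a sketch: the sample string and the patterns <.+> and <.+?> are assumed here for illustration.

```python
import re

# Assumed sample input: one pair of tags around a single word.
text = "This is a <EM>first</EM> test"

# Greedy .+ runs from the first '<' to the last '>' on the line.
greedy_tags = re.findall(r"<.+>", text)

# Lazy .+? stops at the first '>' it can, so each tag is matched separately.
lazy_tags = re.findall(r"<.+?>", text)

print(greedy_tags)  # ['<EM>first</EM>']
print(lazy_tags)    # ['<EM>', '</EM>']
```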
Use re.findall("<p.*?>(.*?)</p>", long_sentence). If you are trying to parse HTML or XML, consider using an HTML or XML parsing library instead of regular expressions.
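As one possible illustration of the parser route, here is a minimal sketch built on the standard library's html.parser module. The ParagraphCollector class and the sample string are hypothetical and are not part of the original answer; a third-party parser would work along similar lines.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph texts
        self._in_p = False     # True while inside a <p> element
        self._chunks = []      # text pieces of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._chunks))
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


# Hypothetical stand-in for the question's long_sentence.
long_sentence = '<p class="x">first</p><p class="x">second</p>'

parser = ParagraphCollector()
parser.feed(long_sentence)
parser.close()
print(parser.paragraphs)  # ['first', 'second']
```

Because the parser tracks tag boundaries itself, greedy versus lazy matching never comes into play. This sketch does not handle nested <p> elements or markup inside a paragraph beyond concatenating its text.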
For reference, the output of the long_sentence call in the question:
['To critics of Dodd-Frank, this is thrilling stuff. They see the law as a piece of statist overreach that throttles the American economy. Plenty in the Trump administration would love to gut it. The president himself has called it a \xe2\x80\x9cdisaster\xe2\x80\x9d. Gary Cohn, until recently one of the leaders of Goldman Sachs, a big bank, and now Mr Trump\xe2\x80\x99s chief economic adviser, promises to \xe2\x80\x9cattack all aspects of Dodd-Frank\xe2\x80\x9d.']