Python regex findall does not return multiple results as expected

I have the following two Python strings (short_sentence is part of long_sentence).

long_sentence contains the one above plus this one:

To critics of Dodd-Frank, this is thrilling stuff. They see the law as a piece of statist overreach that throttles the American economy. Plenty in the Trump administration would love to gut it. The president himself has called it a \xe2\x80\x9cdisaster\xe2\x80\x9d. Gary Cohn, until recently one of the leaders of Goldman Sachs, a big bank, and now Mr Trump\xe2\x80\x99s chief economic adviser, promises to \xe2\x80\x9cattack all aspects of Dodd-Frank\xe2\x80\x9d.
I know that Python's re.findall() returns every matching substring of the text. When I run the following:

re.findall("<p.*>(.*?)</p>", short_sentence)
I get the expected result:

['THE prospect of deregulation helps explain why, since Donald Trump\xe2\x80\x99s election, no bit of the American stockmarket has done better than financial firms. On February 3rd their shares climbed again as Mr Trump signed an executive order asking the Treasury to conduct a 120-day review of America\xe2\x80\x99s financial regulations, including the Dodd-Frank act put in place after the financial crisis of 2007-08, to assess whether these rules meet a set of \xe2\x80\x9ccore principles\xe2\x80\x9d.']
However, when I try to extract both paragraphs from long_sentence with:

re.findall("<p.*>(.*?)</p>", long_sentence)
I still get only one match (the second paragraph), not both.

My question is: what goes wrong in the second case? Why doesn't it return both occurrences?

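<p.*> is greedy, so .* consumes as much as it can: the match starts at the first <p and runs through the opening tag of the last paragraph, which leaves only that paragraph's text for the capture group. If you use <p.*?> instead, you get the expected result.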
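A minimal sketch of the difference, assuming a short stand-in string (the variable html and its contents are hypothetical, not the strings from the question):

```python
import re

# Hypothetical stand-in for long_sentence: two <p> elements on one line.
html = '<p class="a">first</p><p class="b">second</p>'

# Greedy: "<p.*>" extends to the last ">" that still allows a match,
# so one match spans both paragraphs and only "second" is captured.
greedy = re.findall(r"<p.*>(.*?)</p>", html)

# Lazy: "<p.*?>" stops at the first ">", so each paragraph matches on its own.
lazy = re.findall(r"<p.*?>(.*?)</p>", html)

print(greedy)  # ['second']
print(lazy)    # ['first', 'second']
```

Note that . does not match newlines by default, so paragraphs spread over several lines would additionally need re.DOTALL.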

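If you need more information on the topic, here is an excerpt from a reference on it:

Suppose you want to use a regex to match an HTML tag. You know that the input will be a valid HTML file, so the regular expression does not need to exclude any invalid use of angle brackets. If it sits between angle brackets, it is an HTML tag.

Most people new to regular expressions will attempt to use <.+>. They will be surprised when they test it on a string like This is a <EM>first</EM> test. You might expect the regex to match <EM> and, when continuing after that match, </EM>.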
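The pitfall the excerpt describes can be reproduced in a few lines. This is only a sketch; the sample string and the variable names are assumptions chosen for illustration:

```python
import re

# Assumed sample string containing one opening and one closing tag.
text = "This is a <EM>first</EM> test"

# Greedy ".+" runs to the last ">" in the string: one long match.
greedy_tags = re.findall(r"<.+>", text)

# Lazy ".+?" stops at the first ">" after each "<": two separate tags.
lazy_tags = re.findall(r"<.+?>", text)

# A negated character class avoids the backtracking altogether.
negated_tags = re.findall(r"<[^>]+>", text)

print(greedy_tags)   # ['<EM>first</EM>']
print(lazy_tags)     # ['<EM>', '</EM>']
print(negated_tags)  # ['<EM>', '</EM>']
```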


Use re.findall("<p.*?>(.*?)</p>", long_sentence). If you are trying to parse HTML or XML, consider using an HTML or XML parsing library instead of regular expressions.

For reference, this is the single match the question reports for long_sentence with the original pattern:
['To critics of Dodd-Frank, this is thrilling stuff. They see the law as a piece of statist overreach that throttles the American economy. Plenty in the Trump administration would love to gut it. The president himself has called it a \xe2\x80\x9cdisaster\xe2\x80\x9d. Gary Cohn, until recently one of the leaders of Goldman Sachs, a big bank, and now Mr Trump\xe2\x80\x99s chief economic adviser, promises to \xe2\x80\x9cattack all aspects of Dodd-Frank\xe2\x80\x9d.']
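As a sketch of the parser-based alternative suggested above, the standard library's html.parser can collect paragraph text without any regex. The ParagraphCollector class and the sample string are hypothetical, and the sketch assumes <p> elements are not nested:

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None means we are currently outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


# Assumed stand-in for long_sentence.
html = '<p class="a">first</p><p class="b">second</p>'

collector = ParagraphCollector()
collector.feed(html)
collector.close()
print(collector.paragraphs)  # ['first', 'second']
```

Third-party parsers offer a shorter interface, but this version needs nothing beyond the standard library.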