Python: How to match an image tag's link with regex


python regex


I am writing a regex matching function in Python. I have the following code:

import re

def src_match(line, img):
    imgmatch = re.search(r'<img src="(?P<img>.*?)"', line)

    if imgmatch and imgmatch.groupdict()['img'] == img:
        print 'the match was:', imgmatch.groupdict()['img']


The above doesn't seem to work properly for me at all. On the other hand, I've had luck with this:

def href_match(line, url):
    hrefmatch = re.search(r'<a href="(?P<url>.*?)"', line)

    if hrefmatch and hrefmatch.groupdict()['url'] == url:
        print 'the match was:', hrefmatch.groupdict()['url']
    else:
        return None


Rule #37: don't try to parse HTML with regular expressions.

Use the right tool for the job, in this case BeautifulSoup.
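As a rough illustration of the parser-based approach, here is a minimal sketch. It uses the standard library's `html.parser` instead of BeautifulSoup, purely so that it runs without installing anything. The names `ImgSrcCollector` and `img_sources` are made up for this example, and it is written for Python 3.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attribute order is irrelevant.
        if tag == 'img':
            for name, value in attrs:
                if name == 'src':
                    self.sources.append(value)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> arrive here.
        self.handle_starttag(tag, attrs)


def img_sources(html):
    """Return a list of all <img> src values found in html."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

Because the parser splits out the attributes itself, it does not matter where `src` sits inside the tag.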

Edit:

>>> src_match('<p class="p1"><img src="myfile.png" alt="beat-divisions.tiff"></p>','myfile.png')
the match was: myfile.png
>>> src_match('<p class="p1"><img src="myfile.anotherword.png" alt="beat-divisions.tiff"</p>\n','myfile.anotherword.png')
the match was: myfile.anotherword.png

Cut and paste of your function and a test, as requested:

>>> src_match('this is <img src="my example" />','my example')
the match was: my example


So it appears to be working; however, it will fail on (perfectly valid) HTML such as

<img width="200px" src="Y U NO C ME!!" />
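If you do stay with a regex, one way to handle that case is to let the pattern skip over any attributes that come before `src`. This is only a sketch: `IMG_SRC` and `find_img_src` are names invented here, and it still assumes double-quoted attribute values and no `>` inside them.

```python
import re

# "<img", then any run of non-">" characters (other attributes),
# then whitespace and a double-quoted src value.
IMG_SRC = re.compile(r'<img\b[^>]*?\ssrc="(?P<img>[^"]*)"')


def find_img_src(line):
    """Return the src value of the first <img> tag in line, or None."""
    match = IMG_SRC.search(line)
    return match.group('img') if match else None
```

Requiring whitespace right before `src` keeps attributes like `data-src` from being picked up by mistake.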

Edit4:

>>> src_match('<p class="p1"><img src="myfile.png" alt="beat-divisions.tiff"></p>','myfile.png')
the match was: myfile.png
>>> src_match('<p class="p1"><img src="myfile.anotherword.png" alt="beat-divisions.tiff"</p>\n','myfile.anotherword.png')
the match was: myfile.anotherword.png


Still works; are you sure the url value you are trying to match is correct?
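To make this kind of check easy to repeat, here is a small sketch of the same function returning a boolean instead of printing. The name `src_matches` is invented for this example, and it is written for Python 3. In the pattern, `.` matches any character, literal periods included, so a filename with two periods is no problem.

```python
import re


def src_matches(line, img):
    """Return True if the first <img src="..."> value in line equals img."""
    match = re.search(r'<img src="(?P<img>.*?)"', line)
    # The lazy .*? stops at the first closing quote after src=".
    return bool(match) and match.group('img') == img
```

If this returns False on input you expected to match, compare the captured `match.group('img')` with the value you passed in; a mismatch there is more likely than a problem with the periods.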

It feels ridiculous that I have to explain this every time I post about it, but I'll say it again: I'm not trying to build an all-encompassing parser with this function. It's a small job, and the same approach already works in another case. You'll notice I only want to parse two different tags, and I'd rather learn more about regular expressions in Python along the way.

Thanks for the update. Isn't it? FWIW, I know exactly what all the tags look like, because I have a specific program generating the html. Although your example is valid, that program never generates html like it. I've also updated my question to make it clearer. Thanks for the suggestions so far tho!

Hi again hugh: I didn't include the relevant difference: there are two periods in the string. What should I do in that case? This now looks more like a basic regex q… you'll see my updated edit above.

See my answer below, which also works for your typical (recent) link.

This doesn't help at all and doesn't answer the question. There is surely an answer to my question that would help me learn.

Per my answer below, both functions work for me (Python 2.7.1 on Windows 7, in the interactive shell). Can you give a counter-example of an input that should work but fails?

I put a failing example in one of my edits above. Thanks for looking.