Python: How do I extract the src from an img tag using regex?

I am trying to extract the image source URL from an HTML img tag.

If the HTML data looks like this:

<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>

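or the same markup with double-quoted attributes:

<div> My profile <img width="300" height="300" src="http://domain.com/profile.jpg"> </div>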

what would the regular expression look like in Python?

I tried the following:

i = re.compile('(?P<src>src=[["[^"]+"][\'[^\']+\']])')
i.search(htmldata)

But I get an error:

Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

An HTML parser is the way to go:

>>> from bs4 import BeautifulSoup
>>> s = '''<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>'''
>>> soup = BeautifulSoup(s, 'html.parser')
>>> img = soup.select('img')
>>> [i['src'] for i in img if i.get('src')]
['http://domain.com/profile.jpg']
>>>
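
If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is only a minimal illustration: the names ImgSrcCollector and extract_img_srcs are made up for this example and are not part of any library.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names
        # arrive lower-cased, and the quotes are already stripped.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_img_srcs(html):
    """Return the src values of all <img> tags found in html."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


s = "<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>"
print(extract_img_srcs(s))
```

Because the parser handles quoting itself, the same call works for single-quoted and double-quoted attributes, and an img tag without src is simply skipped.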

I made a few changes to your code. Take a look:

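import re

# The same markup with double-quoted and with single-quoted attributes.
url = """<div> My profile <img width="300" height="300" src="http://domain.com/profile.jpg"> </div>"""
ur11 = """<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>"""

# Accept either quote character around the src value.
link = re.compile("""src=[\"\'](.+)[\"\']""")

links = link.finditer(url)
for l in links:
    print(l.group())   # the whole match, including src= and the quotes
    print(l.groups())  # a tuple holding only the captured URL

links1 = link.finditer(ur11)
for l in links1:
    print(l.groups())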

finditer() returns an iterator of match objects, so it can be used directly in a for loop.
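
One caveat with the pattern above: (.+) is greedy, so it runs on to the last quote in the text whenever other attributes or a second tag follow src. A possible refinement, sketched here under the assumption that attribute values never contain their own quote character, remembers the opening quote and stops at the first matching closing quote. The sample input and the names greedy and lazy are invented for the illustration.

```python
import re

# Hypothetical input: two tags, with attributes before and after src.
html = ('<img src="http://domain.com/a.jpg" alt="first"> '
        "<img alt='second' src='http://domain.com/b.png' width='300'>")

# Greedy pattern from the answer: (.+) runs on to the last quote in the text.
greedy = re.compile(r"""src=["'](.+)["']""")

# (["']) remembers the opening quote; (.*?) is lazy and stops at the first
# closing quote of the same kind (\1).
lazy = re.compile(r"""\bsrc\s*=\s*(["'])(.*?)\1""")

greedy_capture = greedy.search(html).group(1)
srcs = [m.group(2) for m in lazy.finditer(html)]

print(greedy_capture)  # far more than the first URL
print(srcs)            # one URL per img tag
```

This is still a regex over HTML, so it will also match src attributes on tags other than img and can be fooled by unusual markup; a parser remains the more robust choice.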



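Have you already tried to create the regex yourself?

The two lines of code above would not raise that error.

The regex will not work if there are other attributes after src. Also, your group cannot capture /, :, -, . and so on, which can be part of a URL.

Here is my pattern: src=[\"\']([a-zA-Z0-9\.\/\-:]+)[\"\'] There is definitely room for improvement. Thanks for your input.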

Output of the finditer() code:

src="http://domain.com/profile.jpg"
('http://domain.com/profile.jpg',)
('http://domain.com/profile.jpg',)
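
On the AttributeError in the question: re.search returns None when the pattern does not match, and calling .group on None raises exactly that error. In the original pattern the nested square brackets are read as character classes rather than as two alternatives, so it never matches the sample markup. A minimal sketch of one way to write the alternation and guard against a failed match follows; the names SRC_RE and first_src are hypothetical.

```python
import re

html = "<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>"

# Alternation (|) chooses between a double-quoted and a single-quoted value;
# each branch has its own named group.
SRC_RE = re.compile(r"""src=(?:"(?P<dq>[^"]+)"|'(?P<sq>[^']+)')""")


def first_src(text):
    """Return the first src value in text, or None if there is no match."""
    match = SRC_RE.search(text)
    if match is None:
        # Nothing matched: return None instead of calling .group on it.
        return None
    # Only one of the two groups participates in a given match.
    return match.group("dq") or match.group("sq")


print(first_src(html))
print(first_src("<p>no image here</p>"))
```

Checking the result of search before using it keeps a non-matching input from turning into a traceback.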