Python: extract the img inside the description tag of an XML file using BeautifulSoup
I am parsing an XML feed and want to get the images inside the description tag. I am using urllib and BeautifulSoup. I can get images that sit in their own tags, but I cannot get the image inside the description tag, where the markup is in encoded (escaped) form. XML code:
<item>
<title>Kidnapped NDC member and political activist tells his story</title>
<link>http://www.yementimes.com/en/1724/news/3065</link>
<description>&lt;img src="http://www.yementimes.com/images/thumbnails/cms-thumb-000003081.jpg" border="0" align="left" hspace="5" /&gt;
‘I kept telling them that they would never break me and that the change we demanded in 2011 would come whether they wanted it or not’
&lt;br clear="all"&gt;</description>
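Can anyone give me an idea of how to do this?
This is what I did to parse images that are in separate tags...
I tried to do the same when the image is inside description, but I could not get it.
You can try extracting all the content from <description>, creating a new BeautifulSoup object from it, and searching for the src attribute of the first <img> element: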
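import html
import sys

from bs4 import BeautifulSoup

# Parse the feed; the file path is the first command-line argument.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for item in soup.find_all('item'):
    # The description holds escaped HTML: unescape it, then parse it again.
    # html.unescape replaces HTMLParser.unescape, which was removed in Python 3.9.
    d = BeautifulSoup(html.unescape(item.description.string), 'html.parser')
    print(d.img['src'])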
Run it like this:
python3 script.py xmlfile
This yields:
http://www.yementimes.com/images/thumbnails/cms-thumb-000003081.jpg
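If BeautifulSoup is not available, the same two-step idea (read the description text, then parse that text as HTML) can be sketched with the standard library alone. This is a swapped-in technique, not the approach from the answer: it assumes the feed is well-formed XML, and the names ImgSrcCollector, description_images and SAMPLE are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def description_images(feed_xml):
    """Return one list of image URLs per <item>, taken from its <description>."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        # The XML parser has already decoded &lt;img ...&gt; into "<img ...>".
        text = item.findtext("description", default="")
        collector = ImgSrcCollector()
        collector.feed(text)
        collector.close()
        results.append(collector.sources)
    return results


# Hypothetical sample feed, modeled on the item in the question.
SAMPLE = """<rss><channel>
<item>
<title>Example</title>
<description>&lt;img src="http://www.yementimes.com/images/thumbnails/cms-thumb-000003081.jpg" border="0" /&gt;
Some text &lt;br clear="all"&gt;</description>
</item>
<item><title>No image</title><description>Plain text only</description></item>
</channel></rss>"""

if __name__ == "__main__":
    print(description_images(SAMPLE))
```

Unlike d.img['src'], which raises an error when a description has no image, this version returns an empty list for such items.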