Python: finding HTML link address strings in a list


I have a list named "aList":

[
"<a href='a.html?dataset=1'><tt>outputs</tt></a></td>\n", 
"<a href='a.html?dataset=1'><tt>outputs</tt></a></td>\n", 
"<a href='a.html?dataset=1'><tt>outputs</tt></a></td>\n", 
"<img src='folder.gif' alt='folder'> &nbsp;<a href='catalog.html'><tt>test all files in a directory/</tt></a></td>\n", 
"<img src='/thredds/folder.gif' alt='folder'> &nbsp;<a href='enhancedcatalog.html'><tt>test enhanced catalog/</tt></a></td>\n",
"<hr size='1' noshade='noshade'><h3><a href='/abc/catalog.html'>abc</a> at <a href='http://www.abcd.com/'>csiro</a> see <a href='/abcd/serverinfo.html'> info </a><br>\n", 
"data server [version 4.6.10 - 2017-04-19t16:32:55-0600] <a href='http://www.unidata.ucar.edu/software/thredds/current/tds/reference/index.html'> documentation</a></h3>\n"
]
I tried the following, but it did not give the result I expected. Any suggestions would be appreciated:

matching = [s for s in aList if ".html" in s]
print(matching)

You can use a regular expression or BeautifulSoup to extract the href values from the HTML. Here is the regular-expression version; I hope it helps.

import re

urls = set()
for link in aList:
    # Capture the value of every href attribute in this fragment.
    urls.update(re.findall(r'href=[\'"]?([^\'" >]+)', link))

for url in urls:
    print(url)
Output:

/abcd/serverinfo.html
enhancedcatalog.html
a.html?dataset=1
catalog.html
/abc/catalog.html
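
If you prefer a parser over a regular expression, here is a minimal sketch of the BeautifulSoup alternative mentioned above, assuming the bs4 package is installed. It joins the list items into one HTML string and collects every href attribute:

from bs4 import BeautifulSoup

# Parse the concatenated fragments and gather the href value of every <a> tag.
soup = BeautifulSoup("".join(aList), "html.parser")
urls = {a["href"] for a in soup.find_all("a", href=True)}

for url in urls:
    print(url)

A parser is generally more robust than a regex when the markup contains unquoted or oddly spaced attributes, at the cost of an extra dependency.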

