Python BeautifulSoup: finding the link in this HTML


This is the code I use to fetch the HTML:

from bs4 import BeautifulSoup
import urllib.request
from fake_useragent import UserAgent

url = "https://blahblah.com"
ua = UserAgent()
ran_header = ua.random
req = urllib.request.Request(url,data=None,headers={'User-Agent': ran_header})
uClient = urllib.request.urlopen(req)
page_html = uClient.read()
uClient.close()

html_source = BeautifulSoup(page_html, "html.parser")
results = html_source.findAll("a",{"onclick":"googleTag('click-listings-item-image');"})

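At this point, results holds a list of listings, each containing different information. If I then print(results[0]), how do I get the first href out of results[0]?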

You can pass href=True to find_all (findAll is the older alias for the same method), so that only tags that actually carry an href attribute are returned.
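As a rough illustration of the href=True filter mentioned above, here is a minimal sketch. The markup in sample_html and its /listing/... paths are hypothetical stand-ins, since the real page is not shown in the question; only the onclick value is taken from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (not shown in the question).
sample_html = """
<div>
  <a onclick="googleTag('click-listings-item-image');" href="/listing/1">First</a>
  <a onclick="googleTag('click-listings-item-image');">No link here</a>
  <a onclick="googleTag('click-listings-item-image');" href="/listing/2">Second</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# href=True keeps only the matching tags that have an href attribute.
results = soup.find_all(
    "a",
    {"onclick": "googleTag('click-listings-item-image');"},
    href=True,
)

# Guard against an empty result list before indexing into it.
first_href = results[0]["href"] if results else None
print(first_href)
```

The conditional on the last assignment avoids an IndexError when nothing matches, which can happen if the fetched page differs from what the browser shows.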

Based on the chat discussion, getting the href link looks simple:

results[0]['href']

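Your selector is returning an a tag element, as the printout shows. So yes, you can simply use results[0]['href'] to access the href directly. You can also tell this because the whole panel on the page (the card that displays a listing) is a single clickable element. If you want to make that more explicit, you can change the results selector to #js_thumb_view ~ a. This is also a faster selector:
results = html_source.select('#js_thumb_view ~ a')

Then, to collect all the links, for example:
links = [result['href'] for result in results]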
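To make the sibling selector concrete, here is a small sketch run against made-up markup. The id js_thumb_view comes from the answer; the surrounding structure and the /listing/... paths are assumptions for illustration only. The ~ combinator matches a elements that come after #js_thumb_view and share its parent.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; only the id js_thumb_view comes from the answer.
sample_html = """
<div id="listings">
  <a href="/before">Before the marker</a>
  <div id="js_thumb_view"></div>
  <a href="/listing/1">Card 1</a>
  <a href="/listing/2">Card 2</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# '~' is the CSS general sibling combinator: it selects <a> elements that
# follow #js_thumb_view under the same parent, so "/before" is skipped.
results = soup.select("#js_thumb_view ~ a")

# Collect the href of every matched link.
links = [result["href"] for result in results]
print(links)
```

If some matched tags might lack an href, result.get("href") returns None instead of raising a KeyError.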


Hmm, I get IndexError: list index out of range.
results = html_source.findAll("a", {"onclick": "googleTag('click-listings-item-image');"}, href=True)[0] might work.
Unfortunately, it doesn't. Thanks for your help so far. I'll keep trying.
What about html_source.a['href']?
That gives me an href, but one that appears earlier on the page. The point of results is to isolate the listings I care about.
I get TypeError: 'NoneType' object is not callable.
I get this error when executing the first line, results_list = BeautifulSoup(results[0], "html.parser"), so results_list…
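The two errors in these comments can probably be avoided without re-parsing anything. The sketch below makes two assumptions: the IndexError means the filtered list was empty for the page that was actually fetched, and the TypeError comes from handing a Tag (rather than markup text) to the BeautifulSoup constructor. The markup is hypothetical.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup; the real listing card is not shown in the question.
sample_html = """
<a onclick="googleTag('click-listings-item-image');" href="/listing/1">
  <img src="thumb.png">
</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")
results = soup.find_all("a", {"onclick": "googleTag('click-listings-item-image');"})

href = None
img = None
if results:
    first = results[0]
    # Each result is already a parsed Tag, so it can be queried directly
    # instead of being passed to BeautifulSoup(...) a second time.
    print(isinstance(first, Tag))
    href = first.get("href")   # None instead of KeyError if href is missing
    img = first.find("img")    # nested lookups work on the Tag itself

# An empty match list is handled without indexing into it.
no_match = soup.find_all("a", {"onclick": "no-such-handler"})
first_or_none = no_match[0] if no_match else None
print(href, first_or_none)
```

If a separate soup really is needed, BeautifulSoup(str(results[0]), "html.parser") passes markup text rather than a Tag.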