Python + BeautifulSoup: How to get the 'href' attribute of an 'a' element?


I have the following:

  html =
  '''<div class=“file-one”>
    <a href=“/file-one/additional” class=“file-link">
      <h3 class=“file-name”>File One</h3>
    </a>
    <div class=“location”>
      Down
    </div>
  </div>'''

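But it just prints a blank, nothing. Just "Link: ". So I tested it on another site with different HTML, and it worked.

What could I be doing wrong? Or is it possible that the site is deliberately programmed not to return the href?

Thanks in advance; I will be sure to upvote/accept the answer.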

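First, use a different text editor, one that does not insert curly quotes.

Second, remove the text=True flag from soup.find_all.

The 'a' tag in your HTML has no text of its own, but it contains an 'h3' tag that has text. This means its text (the tag's .string) is None, so .find_all fails to select the tag. As a rule, do not use the text parameter if a tag contains any HTML elements other than plain text content.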
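To see why the filter comes back empty, here is a minimal sketch. It assumes the markup from the question with the curly quotes swapped for straight ones (the `raw` and `normalized` names and the `.replace()` calls are illustrative, not part of the original post). It uses `string=True`, which is the spelling recent BeautifulSoup releases use for the older `text=True` argument.

```python
from bs4 import BeautifulSoup

raw = '''<div class=“file-one”>
  <a href=“/file-one/additional” class=“file-link">
    <h3 class=“file-name”>File One</h3>
  </a>
  <div class=“location”>
    Down
  </div>
</div>'''

# Assumption: replace typographic quotes with straight ones before parsing.
normalized = raw.replace('“', '"').replace('”', '"')
soup = BeautifulSoup(normalized, 'html.parser')

a = soup.find('a')
# .string is only set when a tag has exactly one child; here the <a> holds
# whitespace plus an <h3>, so it is None.
print(a.string)
# The nested text is still reachable through get_text().
print(a.get_text(strip=True))

# Filtering on the string skips the tag; filtering on href alone finds it.
with_string_filter = soup.find_all('a', href=True, string=True)
without_filter = soup.find_all('a', href=True)
print(len(with_string_filter), len(without_filter))
print(without_filter[0]['href'])
```

The last line prints the href value only because the string filter was dropped; the tag itself was in the document all along.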

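You can resolve this by selecting elements using only the tag name and the href keyword argument. Then add a condition in the loop to check whether they contain text.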

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
links_with_text = []
for a in soup.find_all('a', href=True):
    # keep only anchors that contain some text, including nested text
    if a.text:
        links_with_text.append(a['href'])

Or you could use a list comprehension, if you prefer one-liners.

links_with_text = [a['href'] for a in soup.find_all('a', href=True) if a.text]

Or you could pass a function to .find_all.
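As a rough sketch of the function-based approach: `find_all` accepts a callable that receives each tag and returns True for the ones to keep. The sample markup and the `is_link_with_text` name below are made up for illustration; a lambda with the same body would behave the same way.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one link with text, one empty link, one anchor without href.
html = '<a href="/one">One</a><a href="/two"></a><a name="top">Top</a>'
soup = BeautifulSoup(html, 'html.parser')

def is_link_with_text(tag):
    # True only for <a> tags that have an href and some non-whitespace text.
    return (
        tag.name == 'a'
        and tag.has_attr('href')
        and bool(tag.get_text(strip=True))
    )

links_with_text = [a['href'] for a in soup.find_all(is_link_with_text)]
print(links_with_text)
```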

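If you want to collect all links whether they have text or not, just select all 'a' tags that have an 'href' attribute. Anchor tags usually carry links, but that is not a requirement, so I think it is best to use the href argument.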

Using .find_all:

links = [a['href'] for a in soup.find_all('a', href=True)]

Using .select with CSS selectors:

links = [a['href'] for a in soup.select('a[href]')]

You can also use attrs to get the href attribute after a regex search:

import re
soup.find('a', href=re.compile(r'[/]([a-z]|[A-Z])\w+')).attrs['href']

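This can be solved in just a few lines of code with gazpacho:

from gazpacho import Soup

# html holds the markup from the question
soup = Soup(html)
soup.find('a', {'class': 'file-link'}).attrs['href']

Which would output:

'/file-one/additional'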

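Do you really have curly quotes in your HTML? And if so, why are there curly quotes in your code?

What encoding are you using? You need to use a text editor.

Your code works for me if I remove the text=True parameter.

If you need more information about the quotes, see this post:

@downshift What does text=True do? I thought it returned the result as text.

I want to tell you about a problem that I had a hard time understanding myself. I would be glad if you gave it a try. Thanks.

Any idea why calling .href directly doesn't work, but .attrs['href'] works fine? I just spent 15 minutes debugging this.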
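On the last comment's question about .href: in BeautifulSoup, attribute-style access on a tag (`a.href`) is shorthand for searching for a child tag with that name, so it returns None when there is no `<href>` element. HTML attributes are read with dictionary-style access instead. A short sketch, using a one-line sample document made up for illustration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="/file-one/additional">File One</a>', 'html.parser')
a = soup.find('a')

# Dot access looks for a child tag named "href", not the attribute.
print(a.href)

# Dictionary-style access reads the HTML attribute.
print(a['href'])
print(a.attrs['href'])

# .get() returns None instead of raising KeyError when the attribute is absent.
print(a.get('href'))
print(a.get('missing'))
```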
