Empty variable when extracting 'a' tags using BeautifulSoup in Python

I need to extract all the links from the dumps index:

import requests
from bs4 import BeautifulSoup

index = requests.get('https://dumps.wikimedia.org/backup-index.html').text
soup_index = BeautifulSoup(index, 'html.parser')
dumps = [a['href'] for a in soup_index.find_all('a')
        if a.has_attr('href') and a.text[:-1].isdigit()]

But I get an empty dumps variable.

What am I doing wrong?

You may be looking for something like this:

targets = soup_index.find_all('a',href=True)
for target in targets:
    print(target,target.text)

Output:

<a href="angwikibooks/20200701">angwikibooks</a> angwikibooks
<a href="huwikisource/20200701">huwikisource</a> huwikisource
<a href="cswikiquote/20200701">cswikiquote</a> cswikiquote
<a href="cawikibooks/20200701">cawikibooks</a> cawikibooks

and so on.
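If the goal is a list of link targets rather than printed tags, the same href=True filter can feed a list comprehension. The following is only a minimal sketch: sample_html is a hypothetical stand-in for the downloaded index page (it is not the real page content), so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded index page (assumption, not real data).
sample_html = """
<ul>
  <li><a href="angwikibooks/20200701">angwikibooks</a></li>
  <li><a href="huwikisource/20200701">huwikisource</a></li>
  <li><a name="no-href">anchor without href</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute,
# so a["href"] cannot raise KeyError inside the comprehension.
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)
```

With the real page, soup would be the soup_index object from the question.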

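Probably because none of the a.text values ends in a digit.

@JackFleeting How can I handle it? Here is an example

Yes, many <a> nodes have text, but the text does not end in digits the way .isdigit() requires. So there is no way around that; if you want, you can drop the condition and print the text. Also, what is your expected output?

@JackFleeting I am trying to extract the links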
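To make the point from the comments concrete, here is a small sketch of why the original condition matches nothing, followed by one possible alternative that tests the href instead of the link text. The (href, text) pairs are copied from the output in the answer. The DUMP_HREF pattern (a name followed by an 8-digit date) and the use of urljoin to build absolute URLs are assumptions about what the asker wants, not something confirmed in the thread.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://dumps.wikimedia.org/backup-index.html"

# (href, text) pairs copied from the output shown in the answer.
anchors = [
    ("angwikibooks/20200701", "angwikibooks"),
    ("huwikisource/20200701", "huwikisource"),
]

# Original condition: text[:-1] drops the last character, then isdigit()
# requires everything that remains to be digits. Wiki names never qualify.
original_matches = [href for href, text in anchors if text[:-1].isdigit()]
print(original_matches)

# Assumed alternative: accept hrefs shaped like "<name>/<8-digit date>".
DUMP_HREF = re.compile(r"\w+/\d{8}")
dumps = [
    urljoin(BASE_URL, href)  # resolve the relative href against the index URL
    for href, _text in anchors
    if DUMP_HREF.fullmatch(href)
]
print(dumps)
```

If every link on the page is wanted, the pattern check can simply be dropped, as suggested in the comments.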