Printing only one link of a web page with Python 3 web scraping

python python-3.x web-scraping


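I am trying to print the news links from the following website:

http://www.infobolsa.es/news

But when I run the code I always get the same output: the headline text is correct, but every link is the same. This is the relevant part of the code, thanks: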

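import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

html_page = urlopen("http://www.infobolsa.es/news")
soup = BeautifulSoup(html_page, 'lxml')
links = list()
# bodyDictWeb2 is a dict keyed by headline text, built earlier (not shown)
for titleM in bodyDictWeb2:
    for link in soup.findAll('a', attrs={'href': re.compile("^/news/detail")}):
        print(link)
        bodyDictWeb2[titleM] = link.get('href')
        break  # stops at the first match, so every title gets the same href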


for k, v in bodyDictWeb2.items():
    print(k, ":", v)

I have solved it; here is the code:

import re
from urllib.request import urlopen

from bs4 import BeautifulSoup

html_page = urlopen("http://www.infobolsa.es/news")
soup = BeautifulSoup(html_page, 'lxml')
links = list()
for titleM in bodyDictWeb2:
    # print the text and href of every matching anchor
    for link in soup.findAll('a', attrs={'href': re.compile("^/news/detail")}):
        print(link.text, link.get('href'))
    break  # one pass over the anchors is enough
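
The nested loop in the question searches all anchors again for every title and stops at the first match, which is why every title ends up with the same href. One way to pair each headline with its own link is to read the text and the href from the same anchor tag. The following is only a minimal sketch: the SAMPLE_HTML snippet and the function name collect_news_links are hypothetical, the real page markup may differ, and html.parser is used here only so the example does not depend on lxml.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for http://www.infobolsa.es/news;
# the real page structure is an assumption.
SAMPLE_HTML = """
<article><a class="title" href="/news/detail/1">First headline</a></article>
<article><a class="title" href="/news/detail/2">Second headline</a></article>
<a href="/about">About</a>
"""


def collect_news_links(html):
    """Map each headline text to the href of the same <a> tag."""
    soup = BeautifulSoup(html, "html.parser")
    # Keep only anchors whose href starts with /news/detail
    anchors = soup.find_all("a", href=re.compile(r"^/news/detail"))
    return {a.get_text(strip=True): a["href"] for a in anchors}


news_links = collect_news_links(SAMPLE_HTML)
for title, href in news_links.items():
    print(title, ":", href)
```

Because the title and the link come from the same tag, no break is needed and no separate dictionary of titles has to be filled in afterwards.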

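Why is there a break in the second for loop? It will basically get the first link and then break out of the loop. What is bodyDictWeb2?

You can use this XPath to get all the titles:

//article/a[@class="title"]

What is bodyDictWeb2? Do you only need the text of the titles, e.g. "EL IBEX 35 FIRMA SU PEOR ENERO DESDE 2016 POR EL CORONAVIRUS DE CHINA"?

@NavidZarepak I added the break so that each title gets only one link. bodyDictWeb is a dictionary containing all the titles.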
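
The XPath suggested in the comments can be tried with lxml, which supports XPath queries (BeautifulSoup itself does not). This is a rough sketch under assumptions: SAMPLE_HTML is made-up markup, and whether the real page nests a.title directly under article has not been verified here.

```python
from lxml import html

# Hypothetical markup; the structure of the real page is an assumption.
SAMPLE_HTML = """
<html><body>
<article><a class="title" href="/news/detail/1">First headline</a></article>
<article><a class="title" href="/news/detail/2">Second headline</a></article>
<article><a class="more" href="/news/detail/2">Read more</a></article>
</body></html>
"""

tree = html.fromstring(SAMPLE_HTML)

# Select only the title anchors and keep text and href together
title_links = [
    (a.text_content().strip(), a.get("href"))
    for a in tree.xpath('//article/a[@class="title"]')
]

for title, href in title_links:
    print(title, ":", href)
```

For the live site, the downloaded bytes could be passed to html.fromstring in place of SAMPLE_HTML.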