
Looping through HREFs in Python with BeautifulSoup


I'm trying to extract all the article URLs from this link.

However, I only get:
['https://mn.usembassy.gov/mn/2020-naadam-mn/', 'https://mn.usembassy.gov/mn/06272020-presidential-proclamation-mn/', 'https://mn.usembassy.gov/mn/pr-060320-mn/', 'https://mn.usembassy.gov/mn/dv-2021-status-check-mn/', 'https://mn.usembassy.gov/mn/pr-050120-mn/']

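The site has 52 pages and I'm trying to get the URLs from all of them. Why does my code return only a few URLs instead of all of them? Here is my current code: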

import requests
from bs4 import BeautifulSoup

url = 'https://mn.usembassy.gov/mn/news-events-mn/'
reqs = requests.get(url)  # downloads only the first listing page
soup = BeautifulSoup(reqs.text, 'lxml')

urls = []
for h in soup.find_all('h2'):  # each article title is an <h2> that wraps a link
    a = h.find('a')
    urls.append(a.attrs['href'])
print(urls)
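The loop above can only see the `<h2>` headings inside the one HTML document that `requests.get(url)` returned, so it finds at most the links on that single listing page. The sketch below runs the same `h2`/`a` extraction offline on a small hypothetical snippet (the markup and the `example.org` URLs are assumptions, not the site's real HTML). It also skips headings that contain no link, which would otherwise make `a.attrs` fail on `None`; `extract_article_urls` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for ONE listing page; the real site's HTML may differ.
SAMPLE_PAGE = """
<html><body>
  <h2><a href="https://example.org/post-1/">Post 1</a></h2>
  <h2><a href="https://example.org/post-2/">Post 2</a></h2>
  <h2>Section heading without a link</h2>
  <h2><a href="https://example.org/post-3/">Post 3</a></h2>
</body></html>
"""


def extract_article_urls(html: str) -> list[str]:
    """Return the href of the first <a> inside each <h2>, skipping headings without one."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for h in soup.find_all("h2"):
        a = h.find("a", href=True)  # None when the heading has no link with an href
        if a is not None:
            urls.append(a["href"])
    return urls


if __name__ == "__main__":
    # Only the links present in this one document can be found.
    print(extract_article_urls(SAMPLE_PAGE))
```

However many pages the site has, a parser fed one page returns only that page's links. Getting the rest means requesting the other pages.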

The page contains only 5 article links; you need to go to the next page to load the next 5. This script collects the links from all the pages:

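from pprint import pprint

import requests
from bs4 import BeautifulSoup

# Listing pages are paginated as .../news-events-mn/page/1/, .../page/2/, ...
url = 'https://mn.usembassy.gov/mn/news-events-mn/page/{page}/'

urls = []
for page in range(1, 53):  # pages 1 through 52
    resp = requests.get(url.format(page=page), timeout=10)
    if resp.status_code != 200:
        # the server returns errors for some pages; skip them instead of crashing
        print('Error', resp.status_code, 'on page', page)
        continue
    soup = BeautifulSoup(resp.content, 'html.parser')
    for h in soup.find_all('h2'):
        a = h.find('a')
        if a is None:  # skip headings that contain no link
            continue
        print(a['href'])
        urls.append(a['href'])

pprint(urls)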
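The loop above hardcodes `range(1, 53)`. A more defensive variant, sketched here under assumptions, keeps requesting pages until one returns 404 or contains no article links, records pages that fail with other status codes instead of stopping, and drops duplicate URLs while keeping their order. That the site answers 404 past the last page is an assumption to verify. `crawl_listing` and `requests_fetch` are hypothetical names. `fetch` and `extract` are passed in so the loop can be tested offline; `extract` can be the `h2`/`a` loop from the script.

```python
from typing import Callable

# Same URL pattern as the script above.
PAGE_URL = "https://mn.usembassy.gov/mn/news-events-mn/page/{page}/"


def crawl_listing(
    fetch: Callable[[str], tuple[int, str]],
    extract: Callable[[str], list[str]],
    max_pages: int = 100,
) -> tuple[list[str], list[str]]:
    """Request page 1, 2, 3, ... until a 404 or a page without article links.

    `fetch(url)` returns (status_code, html); `extract(html)` returns the
    article URLs on one page. Returns (unique article URLs, URLs of failed pages).
    """
    seen: dict[str, None] = {}  # dict keys keep first-seen order and drop duplicates
    failed: list[str] = []
    for page in range(1, max_pages + 1):
        url = PAGE_URL.format(page=page)
        status, html = fetch(url)
        if status == 404:
            break  # assumed to mean we are past the last page
        if status != 200:
            failed.append(url)  # remember it so it can be retried later
            print(f"HTTP {status} for {url}, skipping")
            continue
        links = extract(html)
        if not links:
            break  # an empty listing page also marks the end
        for link in links:
            seen.setdefault(link, None)
    return list(seen), failed


def requests_fetch(url: str) -> tuple[int, str]:
    """Fetcher backed by requests; not called by the offline test."""
    import requests  # imported lazily so the crawl logic itself has no hard dependency

    resp = requests.get(url, timeout=10)
    return resp.status_code, resp.text
```

For a real run, pass `requests_fetch` together with an `extract` function that applies the script's `h2`/`a` loop to the HTML; the returned `failed` list shows which pages still need another attempt.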
Prints:

https://mn.usembassy.gov/mn/2020-naadam-mn/
https://mn.usembassy.gov/mn/06272020-presidential-proclamation-mn/
https://mn.usembassy.gov/mn/pr-060320-mn/
https://mn.usembassy.gov/mn/dv-2021-status-check-mn/
https://mn.usembassy.gov/mn/pr-050120-mn/
https://mn.usembassy.gov/mn/pr-042320-mca-website-mn/
https://mn.usembassy.gov/mn/2020-pr-us-mongolia-cpc-mn/
https://mn.usembassy.gov/mn/lead-2020-in-country-mn/
https://mn.usembassy.gov/mn/press-release-usaid-mar-24-2020-mn/
https://mn.usembassy.gov/mn/event-suspension-of-nonimmigrant-and-immigrant-visa-services-due-to-local-covid-19-related-preventative-measures-and-limited-staffing-mn/
https://mn.usembassy.gov/mn/2020-best-program-pr-mn/
https://mn.usembassy.gov/mn/2020-ncov-info-for-visa-mn/

...and so on.

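Thanks for the answer! But I'm not sure how this crawls the pages of the site: the last article I get is not the one that should be last.

@YuriBurkov I updated my code. The server returns errors on some pages.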
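If the errors mentioned in the comment above are transient, the requests can be retried at the transport level instead of skipping the page right away. A minimal sketch, assuming the failures are temporary 500/502/503/504 responses (the status list, retry count and backoff factor are guesses to tune): a `requests.Session` whose `HTTPAdapter` is given a urllib3 `Retry` object as `max_retries`. `make_session` is a hypothetical helper name.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total: int = 3, backoff: float = 1.0) -> requests.Session:
    """Session that retries requests on transient 5xx responses with exponential backoff."""
    retry = Retry(
        total=total,
        backoff_factor=backoff,  # sleep between attempts grows exponentially
        status_forcelist=(500, 502, 503, 504),
        raise_on_status=False,  # return the last response instead of raising RetryError
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

In the paging loop, `requests.get(url.format(page=page), timeout=10)` would become `session.get(url.format(page=page), timeout=10)` on a session created once with `make_session()`; if `resp.ok` is still false after the retries, the page can be skipped as before.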