Python: why can I only scrape the first 4 pages of search results on eBay?


python html web-scraping


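I have a simple script that analyzes sales data on eBay (baseball trading cards). It seems to work fine for the first 4 pages, but from page 5 on it no longer loads the HTML content I need, and I can't work out why: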

#Import statements
import requests
import time
from bs4 import BeautifulSoup as soup
from tqdm import tqdm

However, when I try to scrape page 5 or later, this happens:

Page_5="https://www.ebay.com/sch/213/i.html?_from=R40&LH_Sold=1&_sop=16&_pgn=5"

source=requests.get(Page_5)
time.sleep(5)
eBay_full = soup(source.text, "lxml")
Complete_container=eBay_full.find("ul",{"class":"b-list__items_nofooter"})
Single_item=Complete_container.find_all("div",{"class":"s-item__wrapper clearfix"})
items=[]
#For all items on page perform desired operation
for i in tqdm(Single_item):
    items.append(i.find("a", {"class": "s-item__link"})["href"].split('?')[0].split('/')[-1])

----> 5 Single_item=Complete_container.find_all("div",{"class":"s-item__wrapper clearfix"})
      6 items=[]
      7 #For all items on page perform desired operation

AttributeError: 'NoneType' object has no attribute 'find_all'

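This looks like the logical result of the ul with class b-list__items_nofooter being absent from eBay_full (the parsed soup) on the later pages. The real question is why that content is missing. Scrolling through the soup, none of the items of interest are there, yet they do appear on the web page itself, as expected. Can anyone point me in the right direction?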
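Until the cause is known, a cheap safeguard is to check each downloaded page for the results container before parsing it, so a blocked or changed page is reported by page number instead of surfacing as the AttributeError above. This is a minimal sketch, assuming the class name b-list__items_nofooter still appears verbatim in the HTML of a genuine results page. It uses a plain substring test rather than a BeautifulSoup lookup, and the names fetch_result_pages and RESULTS_MARKER are hypothetical.

```python
import time
from typing import Callable, Iterable, List

# Class name copied from the question's BeautifulSoup lookup. Assumption: when
# eBay serves the real results page, this string appears verbatim in the HTML.
RESULTS_MARKER = "b-list__items_nofooter"


def fetch_result_pages(
    fetch: Callable[[int], str],
    pages: Iterable[int],
    delay: float = 5.0,
) -> List[str]:
    """Return the HTML of each page, failing loudly if the results list is absent.

    `fetch` is any callable mapping a page number to HTML text (for example a
    wrapper around requests.get). Checking for the marker up front replaces the
    later "'NoneType' object has no attribute 'find_all'" with an error that
    names the page that came back without results.
    """
    html_pages: List[str] = []
    for i, page in enumerate(pages):
        if i and delay:
            time.sleep(delay)  # pause between consecutive requests
        html = fetch(page)
        if RESULTS_MARKER not in html:
            raise RuntimeError(
                f"page {page}: results list not found "
                "(possibly a bot-check page or a changed layout)"
            )
        html_pages.append(html)
    return html_pages
```

Passing the fetch function in, rather than calling requests.get inside the loop, keeps the check testable without network access; a real fetch would wrap requests.get for the page's URL and return .text.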

Solved, thanks to @Sebastien D's comment:

In the headers variable, specify only one browser, together with a current stable version number (e.g. Chrome/53.0.2785.143, the latest version at the time).

When using requests, consider passing the headers parameter. Websites try to protect themselves from bots...

This seems to have solved the problem. Thanks!


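# Send a browser-like User-Agent. By default, requests identifies itself as
# python-requests, which eBay appears to treat as a bot on the later pages.
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'}

source = requests.get(Page_5, headers=headers, timeout=2)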
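To send the same headers for every page rather than only page 5, the page URLs can be generated from one set of query parameters, varying only _pgn. This is a minimal sketch using the standard library's urllib.parse.urlencode. The parameter values are copied from Page_5 above, and page_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Base URL and query parameters copied from Page_5 in the question; only _pgn
# (the results page number) changes from one page to the next.
BASE_URL = "https://www.ebay.com/sch/213/i.html"
BASE_PARAMS = {"_from": "R40", "LH_Sold": "1", "_sop": "16"}


def page_url(page: int) -> str:
    """Build the sold-listings search URL for one results page (hypothetical helper)."""
    params = {**BASE_PARAMS, "_pgn": page}  # dict order is preserved by urlencode
    return f"{BASE_URL}?{urlencode(params)}"


print(page_url(5))
```

Each page can then be requested with the call from the answer, e.g. requests.get(page_url(n), headers=headers, timeout=2) for n in range(1, 6), so pages 4 and 5 differ only in the page number.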