Problem scraping data from the McMaster-Carr website with Python
I am writing a crawler for McMaster-Carr. Take the page https://www.mcmaster.com/98173A200 as an example: if I open it directly in a browser, I can see all the product data. Because the data is loaded dynamically, I am using Selenium + bs4:
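from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

if __name__ == "__main__":
    url = "https://www.mcmaster.com/98173A200"
    options = webdriver.ChromeOptions()
    options.add_argument("--enable-javascript")
    # Selenium 4 takes the driver path through a Service object.
    driver = webdriver.Chrome(service=Service("C:/chromedriver/chromedriver.exe"), options=options)
    driver.set_page_load_timeout(20)
    driver.get(url)
    delay = 20  # seconds to wait for the dynamic content
    try:
        WebDriverWait(driver, delay).until(
            EC.presence_of_element_located((By.ID, "MainContent")))
    except TimeoutException:
        print("Timeout loading DOM!")
    # Parse only after the wait, so the dynamically loaded DOM is captured.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    print(soup)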
However, when I run the code I get a login page, which I do not get when I open the page directly in a browser as described above.
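To tell quickly which kind of page Selenium actually received, the page source can be checked for the element ids used in this post. This is only a minimal sketch: `classify_page` and `IdCollector` are hypothetical helper names, and it assumes that the product page contains an element with id `MainContent` while the login form contains `Email` and `Password`, which is what the snippets here wait for. It uses only the standard library so it runs without extra packages.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def classify_page(html):
    """Return 'login', 'product', or 'unknown' based on the ids present."""
    collector = IdCollector()
    collector.feed(html)
    # Check for the login form first, in case it also carries a MainContent id
    # (an assumption; the real page layout is not known here).
    if "Email" in collector.ids and "Password" in collector.ids:
        return "login"
    if "MainContent" in collector.ids:
        return "product"
    return "unknown"
```

With the driver from the snippet above, `print(classify_page(driver.page_source))` would report which of the two pages came back.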
I also tried to log in with the code below:
try:
    email_input = WebDriverWait(driver, delay).until(
        EC.presence_of_element_located((By.ID, "Email")))
except TimeoutException:
    print("Loading took too much time!")
else:
    print("Page is ready!!")
    input("Press Enter to continue...")
    # email and password hold my account credentials, defined elsewhere.
    email_input.send_keys(email)
    password_input = driver.find_element(By.ID, "Password")
    password_input.send_keys(password)
    login_button = driver.find_element(By.CLASS_NAME, "FormButton_primaryButton__1kNXY")
    login_button.click()
Then a different page showed up, still without the product data.
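I compared the request headers of the page opened by Selenium with those of the page opened in my browser and could not find anything wrong. I also tried other web drivers such as PhantomJS and Firefox and got the same result. I also tried a random user agent with the code below: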
from random_user_agent.params import OperatingSystem, SoftwareName
from random_user_agent.user_agent import UserAgent
from selenium.webdriver.chrome.options import Options

software_names = [SoftwareName.CHROME.value]
operating_systems = [OperatingSystem.WINDOWS.value, OperatingSystem.LINUX.value]
user_agent_rotator = UserAgent(software_names=software_names,
                               operating_systems=operating_systems,
                               limit=100)
user_agent = user_agent_rotator.get_random_user_agent()
chrome_options = Options()
# Override the User-Agent header that Chrome sends.
chrome_options.add_argument("user-agent=" + user_agent)
Still the same result.
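For reference, the header comparison I did by hand can be sketched roughly as follows. `diff_headers` is a hypothetical helper, and the two dictionaries are assumed to be copied manually from the Network tab of the developer tools in each browser window; nothing here is specific to McMaster-Carr.

```python
def diff_headers(browser_headers, selenium_headers):
    """Return {name: (browser_value, selenium_value)} for headers that differ."""
    # Header names are case-insensitive, so normalise them before comparing.
    a = {k.lower(): v for k, v in browser_headers.items()}
    b = {k.lower(): v for k, v in selenium_headers.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

An empty result means the two requests carry identical headers, which matches what I saw; a header present on only one side shows up with `None` for the other side.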
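The developer tools in the page opened by Selenium show a bunch of errors. I guess token authorization is the key to this problem, but I do not know how to handle it.
Any help would be greatly appreciated.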