Problem scraping data from the McMaster-Carr website with Python


I'm writing a scraper for McMaster-Carr. Take this page, for example: if I open it directly in a browser, I can see all of the product data.

Because the data is loaded dynamically, I'm using Selenium + bs4:

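from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

if __name__ == "__main__":
    url = "https://www.mcmaster.com/98173A200"
    options = webdriver.ChromeOptions()
    # JavaScript is enabled by default in Chrome, so this flag is not required
    options.add_argument("--enable-javascript")
    # Selenium 4 takes the chromedriver path through a Service object
    driver = webdriver.Chrome(service=Service("C:/chromedriver/chromedriver.exe"), options=options)
    driver.set_page_load_timeout(20)
    driver.get(url)
    delay = 20
    try:
        # Wait until the dynamically rendered container is in the DOM
        WebDriverWait(driver, delay).until(
            EC.presence_of_element_located((By.ID, 'MainContent')))
    except TimeoutException:
        print("Timeout loading DOM!")
    # Parse only after the wait; otherwise page_source is the pre-render HTML
    soup = BeautifulSoup(driver.page_source, "html.parser")
    print(soup)
    driver.quit()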
However, when I run the code, I get a login window, which I don't get when I open the page directly in the browser as described above.
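Since the symptom is that the Selenium session lands on a login window instead of the product data, it can help to detect that case explicitly rather than printing the whole page. A minimal standard-library sketch, assuming the login form is identified by the `Email` input id (taken from the login code in this post) and the product page by the `MainContent` id; `classify_page` is a hypothetical helper name, not part of Selenium or bs4:

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Collects every id attribute value seen in the document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <input ... /> are routed here as well
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def classify_page(html):
    """Return 'login', 'product', or 'unknown' for a page_source string.

    Assumption: the login form has id="Email" and the product content
    lives under id="MainContent", as in the question's code.
    """
    collector = _IdCollector()
    collector.feed(html)
    collector.close()
    if "Email" in collector.ids:
        return "login"
    if "MainContent" in collector.ids:
        return "product"
    return "unknown"
```

Typical use would be `classify_page(driver.page_source)` right after the `WebDriverWait` call, so the script can stop or branch when it has been redirected to the login form.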


I also tried logging in with the code below:

    try:
        email_input = WebDriverWait(driver, delay).until(
            EC.presence_of_element_located((By.ID, 'Email')))
        print("Page is ready!!")
        input("Press Enter to continue...")
    except TimeoutException:
        print("Loading took too much time!")
        raise  # without the Email field there is nothing to type into

    # email and password hold the account credentials (defined elsewhere)
    email_input.send_keys(email)
    # find_element_by_* helpers were deprecated in Selenium 4 and later removed
    password_input = driver.find_element(By.ID, 'Password')
    password_input.send_keys(password)
    login_button = driver.find_element(By.CLASS_NAME, "FormButton_primaryButton__1kNXY")
    login_button.click()
Then it showed up.


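I compared the request headers of the page opened by Selenium with those of the page opened in my browser and found nothing wrong. I also tried other web drivers, such as PhantomJS and Firefox, and got the same result.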
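The header comparison described above can be done mechanically rather than by eye. A small sketch, assuming the two header sets have been copied (for example from the Network tab of each browser's developer tools) into plain dicts; `diff_headers` is a hypothetical helper. Names are compared case-insensitively because HTTP header field names are case-insensitive:

```python
def diff_headers(a, b):
    """Return {header: (value_in_a, value_in_b)} for every header that differs.

    Header names are lower-cased before comparing; a header missing from
    one side is reported as None on that side.
    """
    left = {k.lower(): v for k, v in a.items()}
    right = {k.lower(): v for k, v in b.items()}
    diff = {}
    for name in sorted(left.keys() | right.keys()):
        if left.get(name) != right.get(name):
            diff[name] = (left.get(name), right.get(name))
    return diff
```

An empty result, as reported here, suggests the site is not telling the two sessions apart by request headers alone.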

I also tried using a random user agent with the code below:

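from random_user_agent.user_agent import UserAgent
from random_user_agent.params import SoftwareName, OperatingSystem
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Only generate Chrome user-agent strings for Windows and Linux
software_names = [SoftwareName.CHROME.value]
operating_systems = [OperatingSystem.WINDOWS.value, OperatingSystem.LINUX.value]

user_agent_rotator = UserAgent(software_names=software_names,
                               operating_systems=operating_systems,
                               limit=100)

user_agent = user_agent_rotator.get_random_user_agent()

chrome_options = Options()
# Override the User-Agent the browser sends; the options must be passed to the driver
chrome_options.add_argument(f"--user-agent={user_agent}")
driver = webdriver.Chrome(options=chrome_options)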
Still the same result.


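The developer tools in the page opened by Selenium show a bunch of …. I suspect token authorization is the key to this problem, but I don't know how to deal with it.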


Any help would be appreciated.

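  • The reason you see the login window is that you are accessing McMaster-Carr through chromedriver. When the server recognizes this behavior, it asks you to log in.
  • Unless you have been approved by McMaster-Carr (which requires signing an NDA), a regular login will not work.
  • You should look into the McMaster-Carr API, which gives you direct access to their database. However, you need to sign an NDA with McMaster-Carr before you are granted API access.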
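The first answer's point, that the server can tell a visit comes through chromedriver even when the request headers match, can be checked from the page side. The W3C WebDriver specification defines a `navigator.webdriver` property that is `true` while the browser is under automation, and any script on the page can read it. A minimal diagnostic sketch; `reports_automation` is a hypothetical helper, and whether McMaster-Carr actually relies on this particular signal is an assumption the thread does not confirm:

```python
def reports_automation(driver):
    """Return True if the current page sees navigator.webdriver == true.

    `driver` is a Selenium WebDriver; execute_script runs the JavaScript in
    the page and returns its value (None if the property is undefined).
    """
    return bool(driver.execute_script("return navigator.webdriver"))


# Usage (requires a running browser and chromedriver):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://www.mcmaster.com/98173A200")
# print(reports_automation(driver))
```

This only diagnoses what the page can observe; it does not change the answers' conclusion that approved API access is the supported route to the data.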