Getting data after a click with Selenium (Python 3)


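I'm trying to scrape a page that contains some simple information.

I'm using BeautifulSoup to get the data, but the page has a button that hides the email address. So I tried to use Selenium to click the button and then BeautifulSoup to scrape the data, but I don't really know how to do that.

This is what I have so far: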

import requests, time, re
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://acukwik.com/Basic-Info/UUBP/RUSAERO"

driver = webdriver.Chrome()
driver.get(url)

while True:
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    divEmail = soup.find('div', text=re.compile('Email'))
    try:
        driver.find_elements_by_class_name('ghEmail').click()
        time.sleep(3)
        email = divEmail.findNext('a')['href']
        print(email)
    except:
        break

driver.quit()

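What happens is that a Chrome window opens at the given URL, but nothing else happens. It just opens and closes, and I never see the button change.

What am I doing wrong? How can I get this data? Can I still use BeautifulSoup?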

Try the following code to get the email:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

...
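# Click the button that reveals the email. Selenium 4 deprecated and later removed
# the find_element_by_* helpers; there, use driver.find_element(By.CLASS_NAME, 'ghEmail').
driver.find_element_by_class_name('ghEmail').click()
# Wait up to 10 seconds for the mailto link to appear, then read its text
email = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//a[starts-with(@href, "mailto:")]'))).text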
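
The explicit wait above replaces a fixed time.sleep with polling: the condition is checked repeatedly until it returns something truthy or the timeout elapses. A rough, browser-free sketch of that idea follows. Note that wait_until, email_link_present and the example address are hypothetical names made up for illustration; this is not Selenium's actual implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper that mimics the general idea behind an explicit wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Fake "page" whose email link only appears on the third check.
attempts = {"count": 0}


def email_link_present():
    attempts["count"] += 1
    if attempts["count"] >= 3:
        return "mailto:someone@example.com"  # made-up address
    return None


found = wait_until(email_link_present, timeout=2.0, poll_interval=0.05)
print(found, "after", attempts["count"], "checks")
```

Compared with a fixed sleep, polling returns as soon as the condition holds and fails loudly when it never does.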

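You are using find_elements_by_class_name, which returns a list, and then trying to click that list, so the call always fails. Change find_elements_by_class_name to find_element_by_class_name.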
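
This also explains why the original loop exits silently: calling click() on a Python list raises AttributeError, and the bare except: in the question catches it and breaks out of the loop. The sketch below reproduces that without a browser. FakeElement and the two finder functions are hypothetical stand-ins for Selenium's element objects and finders, not real Selenium calls.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


button = FakeElement()


def find_elements():
    # Plural finder: always returns a list, even for a single match.
    return [button]


def find_element():
    # Singular finder: returns the first matching element itself.
    return button


error_name = None
try:
    find_elements().click()  # a list has no click() method
except Exception as exc:  # naming the exception keeps the cause visible
    error_name = type(exc).__name__
    print("plural finder failed:", error_name)

find_element().click()  # works: the element itself is clicked
print("button clicked:", button.clicked)
```

Replacing the bare except: with except Exception as exc: and printing exc would likely have surfaced the problem on the first run.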

You can also use this locator: //div[contains(text(),'Email')]/../div[2]/a

Steps to get the email after clicking:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

# Run Chrome without opening a visible window
chrome_options = Options()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--window-size=1920,1080")
#chrome_options.add_argument("--no-sandbox") # linux only
chrome_options.add_argument("--headless")

url = "https://acukwik.com/Basic-Info/UUBP/RUSAERO"

driver = webdriver.Chrome(options=chrome_options)
driver.get(url)
# click() returns None, so there is nothing useful to assign here
driver.find_element_by_class_name('ghEmail').click()

# Give the page a moment to reveal the address
time.sleep(1)

email = driver.find_element_by_xpath(
    "//div[contains(text(),'Email')]/../div[2]/a")
print(email.text)
driver.quit()

You are making the soup before clicking the "show email" button. You need to click the button first and then get the page source.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time, re
from bs4 import BeautifulSoup

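driver = webdriver.Chrome()
driver.get("https://acukwik.com/Basic-Info/UUBP/RUSAERO")
# Click the button first, so the email is present in the DOM
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.ghEmail"))).click()
time.sleep(1)
# Only now take the page source and parse it
soup = BeautifulSoup(driver.page_source, 'html5lib')
driver.quit()  # the browser is no longer needed once the source is captured
divEmail = soup.find('div', text=re.compile('Email'))
email = divEmail.findNext('a')['href']
print(email)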
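
If BeautifulSoup or html5lib is not available, the same extraction step can be approximated with the standard library once the post-click page source is in hand. This is only a minimal sketch: the HTML string is a made-up fragment, not the real markup of the page, and MailtoCollector is a hypothetical helper name.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collect the address part of every <a href="mailto:..."> link."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any "?subject=..." query part.
            self.emails.append(href[len("mailto:"):].split("?")[0])


# Made-up stand-in for driver.page_source taken after the click.
page_source = """
<div>Email</div>
<div><a href="mailto:ops@example.com?subject=Hi">ops@example.com</a></div>
<div><a href="https://example.com">Website</a></div>
"""

collector = MailtoCollector()
collector.feed(page_source)
print(collector.emails)
```

In the real flow, the string passed to feed() would be driver.page_source, read after the click has revealed the address.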

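Thank you very much. That's exactly what I needed! One small question: how can I change it so that the Chrome window doesn't pop up?

Use options.add_argument with headless.

Added headless to the code as well.

Have you tried this get call?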