
Getting data after a click with Selenium (Python 3)


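I'm trying to scrape a page with some simple information.

I'm using BeautifulSoup to get the data, but the page has a button that hides the email information. So I tried to click it with Selenium and then scrape the data with BeautifulSoup, but I really don't know how to do that.

This is what I did: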

import requests, time, re
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://acukwik.com/Basic-Info/UUBP/RUSAERO"

driver = webdriver.Chrome()
driver.get(url)

while True:
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    divEmail = soup.find('div', text=re.compile('Email'))
    try:
        driver.find_elements_by_class_name('ghEmail').click()
        time.sleep(3)
        email = divEmail.findNext('a')['href']
        print(email)
    except:
        break

driver.quit()
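What happens is that a Chrome window opens at the given URL, but nothing else happens: it just opens and closes, and I don't see any change on the button.

What am I doing wrong? How can I get this data? Can I do it with BeautifulSoup?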

Try the following code to get the email:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

...
driver.find_element(By.CLASS_NAME, 'ghEmail').click()
email = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//a[starts-with(@href, "mailto:")]'))).text
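WebDriverWait(...).until(...) works by repeatedly evaluating a condition until it succeeds or a timeout expires. The sketch below illustrates that polling idea with the standard library only, so it runs without a browser. wait_until and email_link_present are hypothetical names, and this is a simplification rather than Selenium's implementation (the real WebDriverWait polls every 0.5 s by default, ignores NoSuchElementException while polling, and raises TimeoutException).

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` until it returns something truthy or `timeout` seconds pass.

    Hypothetical helper illustrating the polling idea behind WebDriverWait.until;
    unlike Selenium, it does not swallow exceptions raised by `condition`.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Usage: a fake condition that only "finds" the email link on the third check,
# standing in for an element that appears some time after the click (hypothetical).
calls = {"n": 0}


def email_link_present():
    calls["n"] += 1
    return "mailto:info@example.com" if calls["n"] >= 3 else None


result = wait_until(email_link_present, timeout=5, poll=0.01)
print(result)
```

Compared with a fixed time.sleep(), polling returns as soon as the condition holds and fails loudly instead of reading a page that has not finished updating.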

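You are using find_elements, which returns a list, and then trying to click that list, so it always fails. Because the bare except: catches the resulting AttributeError and breaks out of the loop, the browser simply opens and closes without showing any error. Change find_elements_by_class_name to find_element_by_class_name (in Selenium 4, where the find_element_by_* helpers were removed, use find_element(By.CLASS_NAME, 'ghEmail')).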
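The failure can be reproduced without a browser. In this sketch FakeElement and fake_find_elements are hypothetical stand-ins, not Selenium classes; the only behaviour carried over from Selenium is that the plural find_elements_* methods return a list.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; not Selenium's own class."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


def fake_find_elements(class_name):
    """Hypothetical stand-in that, like the plural find_elements_* methods, returns a list."""
    return [FakeElement()]


elements = fake_find_elements("ghEmail")

# What the question's loop does: .click() on the list raises AttributeError,
# which a bare `except: break` would swallow silently.
try:
    elements.click()
    error = None
except AttributeError as exc:
    error = str(exc)

# The fix: click a single element (what the singular find_element* returns).
elements[0].click()
print(error)
```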

You can also use this locator: //div[contains(text(),'Email')]/../div[2]/a
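To see which element that locator selects, here is a sketch that walks a small piece of markup with the standard library's xml.etree.ElementTree. The markup is hypothetical, shaped the way the locator implies the page is laid out, not the page's actual HTML. ElementTree does not support XPath functions such as contains() or text(), so the hypothetical find_email_link reproduces the locator's steps by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, shaped the way the locator assumes the real page is laid out.
HTML = """
<div class="row">
  <div>Email</div>
  <div><a href="mailto:info@example.com">info@example.com</a></div>
</div>
"""


def find_email_link(root):
    """Mimic //div[contains(text(),'Email')]/../div[2]/a by walking the tree."""
    for parent in root.iter():  # every element, including the root itself
        divs = parent.findall("div")  # direct <div> children, in document order
        has_label = any(d.text and "Email" in d.text for d in divs)
        if has_label and len(divs) >= 2:
            return divs[1].find("a")  # XPath div[2] is 1-based, so index 1 here
    return None


link = find_email_link(ET.fromstring(HTML))
print(link.get("href"), link.text)
```

Real pages are rarely well-formed XML, so in practice the locator is evaluated by the browser through Selenium; the sketch only shows the path it follows: label div, up to its parent, second div child, then the link.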

To get the email after clicking:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

chrome_options = Options()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--window-size=1920,1080")
#chrome_options.add_argument("--no-sandbox") # linux only
chrome_options.add_argument("--headless")  # run Chrome without opening a window

url = "https://acukwik.com/Basic-Info/UUBP/RUSAERO"

driver = webdriver.Chrome(options=chrome_options)
driver.get(url)

# Click the button that reveals the email address
driver.find_element(By.CLASS_NAME, 'ghEmail').click()

# Give the page a moment to render the revealed email
time.sleep(1)

# The link sits in the second <div> next to the "Email" label
email = driver.find_element(
    By.XPATH, "//div[contains(text(),'Email')]/../div[2]/a")
print(email.text)
driver.quit()

You are making the soup before clicking the "Show email" button. You need to click the button first and then get the page source.
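The ordering problem can be shown without a browser: page_source is a snapshot, so parsing it before the click cannot contain the revealed email. FakeDriver below is a hypothetical stand-in whose page_source only contains the email link once the reveal button has been clicked (an assumption about how the page behaves), and MailtoParser uses the standard library's html.parser in place of BeautifulSoup so the sketch runs on its own.

```python
from html.parser import HTMLParser


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver (not Selenium's API).

    Assumption: the email link only appears in page_source after the
    "Show email" button has been clicked.
    """

    def __init__(self):
        self.revealed = False

    def click_show_email(self):
        self.revealed = True

    @property
    def page_source(self):
        if self.revealed:
            body = '<a href="mailto:info@example.com">info@example.com</a>'
        else:
            body = '<button class="ghEmail">Show email</button>'
        return f"<div><div>Email</div><div>{body}</div></div>"


class MailtoParser(HTMLParser):
    """Collect the href of every <a> tag whose href starts with 'mailto:'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("mailto:"):
                self.links.append(href)


def mailto_links(html):
    parser = MailtoParser()
    parser.feed(html)
    parser.close()
    return parser.links


driver = FakeDriver()
before = mailto_links(driver.page_source)  # source captured before the click
driver.click_show_email()
after = mailto_links(driver.page_source)   # source captured after the click
print(before, after)
```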

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time, re
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("https://acukwik.com/Basic-Info/UUBP/RUSAERO")
# Wait until the "Show email" button is clickable, then click it
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.ghEmail"))).click()
time.sleep(1)
# Only now take the page source, so it includes the revealed email
soup = BeautifulSoup(driver.page_source, 'html5lib')
divEmail = soup.find('div', string=re.compile('Email'))
email = divEmail.find_next('a')['href']
print(email)
driver.quit()
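If the link is a mailto: link, as the XPath in the first answer suggests, the printed href includes the mailto: prefix and possibly a ?subject=... query. A minimal sketch for reducing it to the bare address with urllib.parse; address_from_mailto is a hypothetical helper name.

```python
from urllib.parse import unquote, urlsplit


def address_from_mailto(href):
    """Return the bare address from a mailto: href, or None for other links."""
    parts = urlsplit(href)
    if parts.scheme.lower() != "mailto":
        return None
    # The address is in the path; the query (subject, body, ...) is dropped,
    # and percent-escapes such as %40 are decoded.
    return unquote(parts.path)


print(address_from_mailto("mailto:info@example.com?subject=Hi"))
```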

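Thank you very much, this is exactly what I needed! One small question: how can I stop it from popping up a Chrome window?
Use options.add_argument('headless').
Add headless to the code as well.
Have you tried this get call?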