Python 3.x: clicking links on a website and getting the content of a pop-up bubble with Selenium


python-3.x selenium


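I'm trying to get the course information.

In my code, I first click each course, then get the description from the bubble, and then close the bubble, because it may cover other course links.

My problem is that I can't get the description in the bubble, and some course links are still skipped even though I try to avoid that by closing the bubble.

Any idea how to do this? Thanks in advance.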

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

info = []
driver = webdriver.Chrome()
driver.get('http://bulletin.iit.edu/graduate/colleges/science/applied-mathematics/master-data-science/#programrequirementstext')
for i in range(1, 3):
    for j in range(2, 46):
        try:
            # Click the course link in row j of table i
            driver.find_element(By.XPATH, '//*[@id="programrequirementstextcontainer"]/table[' + str(i) + ']/tbody/tr[' + str(j) + ']/td[1]/a').click()
            # Read the text shown in the bubble
            info.append(driver.find_elements(By.XPATH, '/html/body/div[8]/div[3]/div/div')[0].text)
            # Close the bubble so it does not cover the next link
            driver.find_element(By.XPATH, '//*[@id="lfjsbubbleclose"]').click()
            time.sleep(3)
        except:
            pass



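To load the bubble, the site makes an AJAX call, so you can request the course data directly instead of clicking each link.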
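A minimal, standard-library sketch of the request URL behind that AJAX call, assuming the endpoint and query parameters used in the code below (`page=getcourse.rjs` and `code=<course code>`). The helper name `build_course_url` is hypothetical. Passing the same dict as `params=` to `requests.get` should yield the same query string, since both encode spaces as `+`.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the answer's code;
# the helper itself is a hypothetical illustration.
BASE_URL = "http://bulletin.iit.edu/ribbit/index.cgi"


def build_course_url(course_code):
    """Return the GET URL that should fetch the bubble content for one course."""
    query = urlencode({"page": "getcourse.rjs", "code": course_code})
    return f"{BASE_URL}?{query}"


print(build_course_url("CS 522"))
# http://bulletin.iit.edu/ribbit/index.cgi?page=getcourse.rjs&code=CS+522
```

Opening that URL in a browser is a quick way to check what the endpoint actually returns before parsing it.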

import requests
from bs4 import BeautifulSoup

def course(course_code):
    # Same query the page sends via AJAX when a course link is clicked
    params = {"page": "getcourse.rjs", "code": course_code}

    # GET parameters belong in the query string, so pass them as params=
    res = requests.get("http://bulletin.iit.edu/ribbit/index.cgi", params=params)

    soup = BeautifulSoup(res.text, "lxml")

    result = {}
    result["description"] = soup.find("div", class_="courseblockdesc").text.strip()
    result["title"] = soup.find("div", class_="coursetitle").text.strip()
    return result

Output of `course("CS 522")`
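To run `course()` for every course on the program page rather than a single code, the codes first have to be collected from the page. The sketch below uses the standard library's `html.parser` and assumes the course links are `<a>` elements with the classes `bubblelink` and `code` (the selector used in the Selenium answer) whose text is the course code. The class `CourseCodeParser`, the function `extract_course_codes`, and the sample markup are all made up for illustration; the non-breaking-space normalization is an assumption about how the page renders codes.

```python
from html.parser import HTMLParser


class CourseCodeParser(HTMLParser):
    """Collect the text of <a class="bubblelink code"> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.codes = []
        self._in_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "bubblelink" in classes and "code" in classes:
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        # Link text may arrive in several chunks, so buffer it
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Assumption: codes may use non-breaking spaces; normalize them
            text = "".join(self._buffer).replace("\xa0", " ").strip()
            self.codes.append(text)
            self._in_link = False


def extract_course_codes(html):
    parser = CourseCodeParser()
    parser.feed(html)
    parser.close()
    return parser.codes


# Made-up markup standing in for the program page
sample = """
<table class="sc_courselist">
  <tr><td><a class="bubblelink code" href="#">CS&#160;522</a></td></tr>
  <tr><td><a class="bubblelink code" href="#">MATH 564</a></td></tr>
  <tr><td><a href="#">Not a course link</a></td></tr>
</table>
"""
print(extract_course_codes(sample))  # ['CS 522', 'MATH 564']
```

Each code could then be passed to `course()`, for example `[course(c) for c in extract_course_codes(page_html)]`, where `page_html` is a hypothetical variable holding the program page's HTML.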


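I'm not sure why you put static ranges in the for loops, since not every combination of the i and j indices in the XPath matches an element on the page.

A better approach is to use a single locator that finds all the course links on the page, and loop through them to get the description from each bubble.

Use the following code: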

info = []
# One CSS selector matches every course link in the requirement tables
course_list = driver.find_elements(By.CSS_SELECTOR, "table.sc_courselist a.bubblelink.code")
wait = WebDriverWait(driver, 20)
for course in course_list:
    try:
        print("grabbing info of course : ", course.text)
        course.click()
        # Wait until the bubble's description is visible before reading it
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.courseblockdesc")))
        info.append(driver.find_element(By.CSS_SELECTOR, 'div.courseblockdesc>p').text)
        # Close the bubble so it does not cover the next link
        wait.until(EC.visibility_of_element_located((By.ID, "lfjsbubbleclose")))
        driver.find_element(By.ID, 'lfjsbubbleclose').click()
    except Exception as e:
        print("error while grabbing info:", e)

print(info)

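Because loading the content into the bubble takes some time, you should add an explicit wait to the script so that it waits until the bubble content is fully visible and only then grabs it.

Import the following to use the waits in the code above: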

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Note that this code gets all the course descriptions from the bubbles. If you are looking for a specific one rather than all of them, let me know.
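The explicit wait described above differs from a fixed `time.sleep(3)`: instead of always pausing for the same time, it re-checks a condition until it holds or a timeout expires (Selenium's `WebDriverWait` polls every 0.5 seconds by default). The sketch below is a simplified, standard-library analogue of `WebDriverWait(driver, timeout).until(condition)`, not Selenium's actual implementation; `wait_until` and the simulated `bubble_text` condition are hypothetical.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    Illustrative only: a simplified analogue of WebDriverWait(...).until(...).
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated bubble whose text becomes available on the third check
checks = {"count": 0}


def bubble_text():
    checks["count"] += 1
    return "Course description" if checks["count"] >= 3 else None


result = wait_until(bubble_text, timeout=5, poll_interval=0.01)
print(result)  # Course description
```

The wait returns as soon as the content is ready and fails loudly if it never appears, whereas a fixed sleep wastes time on fast loads and can still be too short on slow ones.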


Why did you use 46 in `for j in range(2, 46):` even though there aren't 46 elements on the page? It is better to build the XPath once, fetch all the matching elements, and loop over them to extract the descriptions.