Selected LinkedIn profile not loading completely in Selenium with Python

python python-3.x selenium

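I have written some code to fetch the details of LinkedIn profiles, but for some user profiles the full HTML is sometimes not loaded.

I have already tried the classic wait mechanisms, namely: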

driver.implicitly_wait(10)

time.sleep(10)

element_present = EC.presence_of_element_located((By.CLASS_NAME, '.pv-profile-section__card-item-v2.pv-profile-section.pv-position-entity.ember-view'))
WebDriverWait(driver, 300).until(element_present)

But none of them seems to work.
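
One detail worth checking in the explicit wait above: `By.CLASS_NAME` expects a single class name, so a dotted compound value like the one passed there is normally used with `By.CSS_SELECTOR` instead. Below is a minimal sketch of building such a selector from a space-separated class attribute. The helper name `classes_to_css` is hypothetical and not part of Selenium.

```python
def classes_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    # Each class name gets a leading dot; an optional tag name goes in front.
    return tag + "".join("." + name for name in names)


# Class list copied from the question's soup.findAll call.
selector = classes_to_css(
    "pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view",
    tag="section",
)
print(selector)
```

The resulting string would then be passed as `(By.CSS_SELECTOR, selector)` to `EC.presence_of_element_located`, assuming the class names still match the live page.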

My code snippet:

firstName = urllib.parse.quote(userFirstName)
lastName = urllib.parse.quote(userLastName)
company = urllib.parse.quote(userCompany)

driver.get('https://www.linkedin.com/search/results/people/?company='+company+'&firstName='+firstName+'&lastName='+lastName+'&origin=FACETED_SEARCH')

results = len(driver.find_elements_by_css_selector('.name.actor-name'))
for i in range(1):
    print(i)
    driver.find_elements_by_css_selector('.name.actor-name')[i].click()
    time.sleep(10)
    print(driver.current_url)

    content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
    driver.implicitly_wait(2)
    soup = BeautifulSoup(content, "html.parser")
    #print(soup)

    companyList = soup.findAll('section',{'class':'pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view'})
    print("Company list length: "+str(len(companyList)))

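This code does return the company list for many users, but in some cases it simply fails. I checked those profiles in the browser, and the elements used in the code are definitely present.

Any help or prior experience with this would be greatly appreciated. I know solving this takes effort too, so thanks in advance.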

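P.S. Part of the HTML (the Experience section that I care about):

Title: Product Designer
Dates Employed: Jun 2018 to Present
Employment Duration: 1 yr 5 mos
Location: Mumbai, Maharashtra, India

Title: UI/UX Designer
Dates Employed: May 2017 to Present
Employment Duration: 2 yrs 6 mos
Location: Mumbai, Maharashtra, India

I basically need the company name, role, and dates.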

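Based on the updated HTML you posted, it is possible that the section elements have loaded but their contents have not, which would leave companyList empty, as you describe.

I would rather wait on something more specific than section: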
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait for ALL section elements to be present in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//section[contains(@class, 'pv-profile-section')]")))

# Wait for the "Company Name" labels to be present
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(text(), 'Company Name')]")))

# Get the company list (find_elements(By.XPATH, ...) replaces the deprecated find_elements_by_xpath)
companyList = driver.find_elements(By.XPATH, "//section[contains(@class, 'pv-profile-section')]")

print(len(companyList))

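This code waits for all section elements to load and also waits for the Company Name labels to load, which may avoid the case where a section has loaded but its contents have not.

Comments:

Sorry, I forgot to include it in the question. I will edit the question to include part of the HTML.

I do see that you specified a wait in your example, but 2 seconds might not be long enough. I am also not sure which line of code throws the error, or what the error message says. Including that information would help too.

There is no error in the code. The problem is that "companyList" has length 0, which means the work experience section of the profile was not loaded. I have updated the question with part of the HTML.

@makeshift programmer I have updated my answer with a possible solution, including a test to see how many companies are found. Let me know if this works for you.

The code that waits for all sections works (it raises no error), but the second line, which waits for the company name to load, times out, even with the limit raised to 200 seconds.

Does the page keep loading endlessly?

@pcalkins, yes, it seems so.

Sometimes setting this flag helps (ChromeOptions, Java code here): options.addArguments("--dns-prefetch-disable"). There are also some newer "PageLoadStrategy" settings. It is not a great solution, but it at least lets your script continue.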
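
Since the comments report that the wait on the "Company Name" text times out, one hedged alternative is to poll the element count until it is non-zero and stops changing, and to fall back to whatever was found once a deadline passes. The sketch below is not from the original answer. `wait_for_stable_count` and its parameters are hypothetical names, and the demo uses a fake counter so that it runs without a browser.

```python
import time
from typing import Callable


def wait_for_stable_count(
    get_count: Callable[[], int],
    timeout: float = 30.0,
    poll: float = 0.5,
    stable_polls: int = 3,
) -> int:
    """Poll get_count until it returns the same non-zero value
    stable_polls times in a row, or until timeout seconds pass.
    Returns the last observed count (possibly 0) instead of raising."""
    deadline = time.monotonic() + timeout
    last, streak = 0, 0
    while True:
        count = get_count()
        if count > 0 and count == last:
            streak += 1  # same non-zero value as the previous poll
        else:
            streak = 1 if count > 0 else 0  # value changed or is still zero
        last = count
        if streak >= stable_polls or time.monotonic() >= deadline:
            return count
        time.sleep(poll)


# Demo with a fake counter that simulates sections appearing over time.
fake_counts = iter([0, 0, 2, 3, 3, 3])
result = wait_for_stable_count(lambda: next(fake_counts), timeout=5, poll=0.01)
print(result)

# With Selenium this might be called as (assumes an existing `driver` and By import):
#   wait_for_stable_count(lambda: len(driver.find_elements(
#       By.XPATH, "//section[contains(@class, 'pv-position-entity')]")))
```

Returning the count rather than raising lets the loop over profiles continue and log the profiles whose experience section never appeared.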