Python: Why does Selenium only get the text of the first tooltip on the page?

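As part of a larger web scraper built with Python, Selenium and BeautifulSoup, I'm trying to get the text of all the tooltips on this page:

My current code successfully finds all the links and moves the mouse over each one; when I run it, I can see each tooltip pop up in turn. However, it only outputs the text of the first tooltip. I have no idea why! I thought I might need a longer wait after hovering, but increasing it to 20 seconds didn't solve the problem.

Here's the code: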

bill_links = soup.find_all('a', {'id': re.compile('Bill')})
summaries = []
bill_numbers = [link.text.strip() for link in bill_links]

for link in bill_links:
    billid = link.get('id')
    action = ActionChains(driver)
    action.move_to_element(driver.find_element_by_id(billid)).perform()
    time.sleep(5)
    summary = driver.find_element_by_class_name("ToolTip-BillSummary-ShortTitle").text
    print(summary)
    summaries = summaries + [summary]
    action.reset_actions()
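Again, the first print(summary) call successfully returns the text of the first tooltip ("An Act amending the act of January 17, 1968..."), but every subsequent print(summary) returns just a blank.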


I'm very new to programming, so apologies if the answer is obvious.

The problem is probably caused by this line of your code:

summary = driver.find_element_by_class_name("ToolTip-BillSummary-ShortTitle").text
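The condition used to locate the element is restricted only by the element's class name. That condition can match a list of elements, but you never specify which one you want the text of. find_element_by_class_name simply returns the first match in the page, i.e. the first tooltip, and once the mouse has moved on that tooltip is hidden, so its .text comes back empty.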

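To fix this, use an XPath expression instead. You need an index variable to target the right element; here it is called i for illustration, e.g. taken from for i, link in enumerate(bill_links):

summary = driver.find_element_by_xpath('//*[@id="qtip-' + str(i) + '-content"]/div/div[3]').text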
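A different way to avoid reading a stale tooltip, sketched here as an alternative to the index-based XPath rather than as the answerer's method: find_element_by_class_name only ever returns the first match, so instead collect every element with that class and keep the one that is currently displayed. This assumes the earlier tooltips stay in the DOM but hidden. visible_tooltip_text and FakeElement are hypothetical names; the helper only relies on the is_displayed() method and .text property that Selenium's WebElement provides, so it can be tried offline with stand-in objects:

```python
from typing import Iterable, Optional, Protocol


class ElementLike(Protocol):
    """The two WebElement members this helper relies on."""
    text: str

    def is_displayed(self) -> bool: ...


def visible_tooltip_text(elements: Iterable[ElementLike]) -> Optional[str]:
    """Return the text of the first displayed element, or None if none is shown."""
    for element in elements:
        if element.is_displayed():
            return element.text
    return None


# Hypothetical use inside the question's loop (Selenium 4 style locator):
#   tooltips = driver.find_elements(By.CLASS_NAME, "ToolTip-BillSummary-ShortTitle")
#   summary = visible_tooltip_text(tooltips)


class FakeElement:
    """Stand-in for a WebElement, used only to exercise the helper without a browser."""

    def __init__(self, text: str, displayed: bool):
        self.text = text
        self._displayed = displayed

    def is_displayed(self) -> bool:
        return self._displayed


# The first tooltip is hidden (Selenium reports hidden text as ""); the second is showing.
demo = [FakeElement("", False), FakeElement("An Act providing for ...", True)]
print(visible_tooltip_text(demo))  # prints: An Act providing for ...
```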

tl;dr:

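Selenium isn't needed. If it's the tooltip as shown (not the full text), you can use requests with bs4 and replicate the JavaScript function the page uses. The arguments for that function call are in the script tag next to each bill listing's a tag. I pull them out of the relevant string with a regex and pass them to a user-defined function that replicates the jQuery call.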

You can see the relevant call in the page source, for example:
addBillSummaryTooltip('#Bill_1',2019,0,'S','B','0012')
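As an alternative to the regular expression in the answer's code, the argument list of a call like the one above can be parsed with ast.literal_eval, because quoted strings and plain integers are also valid Python literals. This is a sketch: parse_tooltip_call is a hypothetical helper, and it assumes every argument is such a simple literal (no JavaScript variables or expressions).

```python
import ast
import re


def parse_tooltip_call(script_text: str) -> tuple:
    """Extract the arguments of addBillSummaryTooltip(...) from a script tag's text.

    Assumes the arguments are simple literals (quoted strings, integers), which
    ast.literal_eval can read because they are also valid Python literals.
    """
    match = re.search(r"addBillSummaryTooltip\((.*)\)", script_text)
    if match is None:
        raise ValueError("no addBillSummaryTooltip call found")
    # Wrap in parentheses with a trailing comma so even a single argument yields a tuple
    return ast.literal_eval("(" + match.group(1) + ",)")


element, session_year, session_ind, bill_body, bill_type, bill_no = parse_tooltip_call(
    "addBillSummaryTooltip('#Bill_1',2019,0,'S','B','0012');"
)
print(session_year, session_ind, bill_body, bill_type, bill_no)  # prints: 2019 0 S B 0012
```

Unlike the regex, this returns the numeric arguments as ints and strips the quotes from the string ones.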


Tooltips:

import re

import requests
from bs4 import BeautifulSoup as bs

g_server_url = "https://www.legis.state.pa.us"

def add_bill_summary_tooltip(s, session_year, session_ind, bill_body, bill_type, bill_no):
    # Replicates the page's addBillSummaryTooltip() AJAX request to GenAsm.cfc
    url = g_server_url + '/cfdocs/cfc/GenAsm.cfc?returnformat=plain'
    data = {'method': 'GetBillSummaryTooltip',
            'SessionYear': session_year,
            'SessionInd': session_ind,
            'BillBody': bill_body,
            'BillType': bill_type,
            'BillNo': bill_no,
            'IsAjaxRequest': '1'
            }

    r = s.get(url, params=data)
    soup = bs(r.content, 'lxml')
    tooltip = soup.select_one('.ToolTip-BillSummary-ShortTitle')
    if tooltip is not None:
        tooltip = tooltip.text.strip()
    return tooltip

with requests.Session() as s:
    r = s.get('https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth')
    soup = bs(r.content, 'lxml')
    # Map each bill's link text to the text of the script tag next to it, which holds a call
    # such as addBillSummaryTooltip('#Bill_1',2019,0,'S','B','0012') ([:-1] drops the last character)
    tooltips = {item.select_one('a').text: item.select_one('script').text[:-1]
                for item in soup.select('.DataTable td:has(a)')}
    # Capture the six arguments of the JavaScript call
    p = re.compile(r"'(.*?)',(.*),(.*),'(.*)','(.*)','(.*)'")
    for bill in tooltips:
        # The first argument is the jQuery selector, which the server request does not need
        _selector, arg2, arg3, arg4, arg5, arg6 = p.findall(tooltips[bill])[0]
        tooltips[bill] = add_bill_summary_tooltip(s, arg2, arg3, arg4, arg5, arg6)

print(tooltips)

Full text:

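If you want the full text, you can take the links to the full-text pages from the first page, then visit each page in a loop and extract the full text: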

import requests
from bs4 import BeautifulSoup as bs

def add_bill_summary_full(s, url): 
    r = s.get(url)
    soup = bs(r.content, 'lxml')
    summary = soup.select_one('.BillInfo-Section-Data div')
    if summary is not None:
        summary = summary.text
    return summary

g_server_url = "https://www.legis.state.pa.us"

with requests.Session() as s:
    r = s.get('https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth')
    soup = bs(r.content, 'lxml')
    full_text = {item.text:g_server_url + item['href'] for item in soup.select('.DataTable a')}
    for k,v in full_text.items():
        full_text[k] = add_bill_summary_full(s, v)

print(full_text)
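The loop above requests the bill pages back to back. If the site's robots.txt asks for a crawl delay (a commenter below mentions a 5-second wait between page visits), one way to honour it is to read the delay with the standard library's urllib.robotparser and sleep between requests. This is a sketch, not part of the answer: crawl_delay_from_robots and polite_map are hypothetical helpers, the 5-second fallback is an assumption, and the fetch and sleep functions are passed in so it can be tried without network access.

```python
import time
from urllib.robotparser import RobotFileParser


def crawl_delay_from_robots(robots_lines, user_agent="*", default=5.0):
    """Return the Crawl-delay for user_agent, or `default` (an assumed fallback) if unset."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def polite_map(urls, fetch, delay, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between consecutive requests."""
    results = {}
    for i, url in enumerate(urls):
        if i:  # no wait is needed before the first request
            sleep(delay)
        results[url] = fetch(url)
    return results


# Real use (needs network): download the site's robots.txt, pass its lines to
# crawl_delay_from_robots, then call polite_map with fetch=lambda u: add_bill_summary_full(s, u).
# Offline demo with an in-memory robots.txt, a dummy fetch, and a sleep that only records calls:
delay = crawl_delay_from_robots(["User-agent: *", "Crawl-delay: 5"])
waits = []
pages = polite_map(["/a", "/b", "/c"], fetch=str.upper, delay=delay, sleep=waits.append)
print(delay, waits, pages)
```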

This is the page's source JavaScript function, which uses jQuery:

function addBillSummaryTooltip(element, SessionYear, SessionInd, BillBody, BillType, BillNo) {
    jQuery(element).qtip({
        content: {
            text: function(event, api) {
                jQuery.ajax({
                    url: g_ServerURL + '/cfdocs/cfc/GenAsm.cfc?returnformat=plain',
                    data: {
                        method: "GetBillSummaryTooltip",
                        SessionYear: SessionYear,
                        SessionInd: SessionInd,
                        BillBody: BillBody,
                        BillType: BillType,
                        BillNo: BillNo,
                        IsAjaxRequest: 1
                    }
                })
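To see the exact request that the JavaScript above sends (and that add_bill_summary_tooltip replicates with s.get), the query string can be built with the standard library. This sketch assumes a GET request, as the Python version uses; tooltip_request_url is a hypothetical helper, and the resulting URL can be pasted into a browser to check what the server returns.

```python
from urllib.parse import urlencode

SERVER_URL = "https://www.legis.state.pa.us"  # g_server_url in the answer's code


def tooltip_request_url(session_year, session_ind, bill_body, bill_type, bill_no):
    """Build the GetBillSummaryTooltip request URL as a single string."""
    base = SERVER_URL + "/cfdocs/cfc/GenAsm.cfc?returnformat=plain"
    params = {
        "method": "GetBillSummaryTooltip",
        "SessionYear": session_year,
        "SessionInd": session_ind,
        "BillBody": bill_body,
        "BillType": bill_type,
        "BillNo": bill_no,
        "IsAjaxRequest": 1,
    }
    # The base URL already has a query string, so the extra parameters are joined with "&"
    return base + "&" + urlencode(params)


print(tooltip_request_url(2019, 0, "S", "B", "0012"))
```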
If you're using Selenium, you don't need BeautifulSoup to extract the text of all the tooltips on the page
https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth
You can use the following solution:

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.action_chains import ActionChains

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("start-maximized")
    chrome_options.add_argument('disable-infobars')
    # Selenium 3 style; in Selenium 4 pass the driver path via selenium.webdriver.chrome.service.Service instead
    driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.legis.state.pa.us/CFDocs/Legis/BS/bs_action.cfm?SessId=20190&Sponsors=S|44|0|Katie%20J.%20Muth")
    # Wait until all bill links in the results table are visible, then hover over each in turn
    for elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='DataTable']/tbody//tr/td/a"))):
        # The second word of the link text is the bill number
        senate_bill_number = elem.get_attribute("innerHTML").split()[1]
        ActionChains(driver).move_to_element(elem).perform()
        # Wait for the tooltip whose title contains this bill number (not just the first tooltip
        # in the DOM), then print the summary div that follows the title
        print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='ToolTip-BillSummary']/div[@class='ToolTip-BillSummary-Title' and contains(., '" + senate_bill_number + "')]//following::div[2]"))).get_attribute("innerHTML"))
    
  • Console output:

                        An Act amending the act of January 17, 1968 (P.L.11, No.5), known as The Minimum Wage Act of 1968,  further providing for definitions and for minimum wages; providing for gratuities; further providing for enforcement and rules and regulations, for pe ...
    
    
    
                        An Act providing for mandatory Statewide employer-paid sick leave for employees and for civil penalties and remedies.
    
    
    
                        An Act amending Title 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in judicial boards and commissions, providing for adoption of guidelines for administrative probation violations; and, in sentencing, further provi ...
    
    
    
                        An Act amending the act of May 22, 1951 (P.L.317, No.69), known as The Professional Nursing Law,  further providing for title, for definitions, for State Board of Nursing, for dietitian-nutritionist license required, for unauthorized practices and ac ...
    
    
    
                        An Act amending the act of March 4, 1971 (P.L.6, No.2), known as the Tax Reform Code of 1971, providing for Pennsylvania Housing Tax Credit.
    
    
    
                        An Act amending the act of December 3, 1959 (P.L.1688, No.621), known as the Housing Finance Agency Law, in Pennsylvania Housing Affordability and Rehabilitation Enhancement Program, further providing for fund.
    
    
    
                        An Act amending the act of March 10, 1949 (P.L.30, No.14), known as the Public School Code of 1949, in charter schools, further providing for funding for charter schools.
    
    
    
                        An Act amending the act of June 13, 1967 (P.L.31, No.21), known as the Human Services Code,  in departmental powers and duties as to supervision, providing for lead testing in children's institutions; and, in departmental powers and duties as to lice ...
    
    
    
                        An Act providing for the protection of water supplies.
    
    
    
                        An Act amending Title 35 (Health and Safety) of the Pennsylvania Consolidated Statutes, providing for emergency addiction treatment; and imposing powers and duties on the Department of Drug and Alcohol Programs.
    
    
    
                        An Act amending Title 18 (Crimes and Offenses) of the Pennsylvania Consolidated Statutes, providing for transfer and sale of animals.
    
    
    
                        An Act amending Title 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in particular rights and immunities, providing for civil immunity of person rescuing minor from motor vehicle.
    
    
    
                        An Act providing for health care insurance coverage protections, for duties of the Insurance Department and the Insurance Commissioner, for regulations, for enforcement and for penalties.
    
    
    
                        An Act amending the act of May 17, 1921 (P.L.682, No.284), known as The Insurance Company Law of 1921, in casualty insurance, providing coverage for essential health benefits.
    
    
    
                        An Act amending the act of October 27, 1955 (P.L.744, No.222), known as the Pennsylvania Human Relations Act, further providing for definitions and for unlawful discriminatory practices.
    
    
    
                        An Act amending Titles 18 (Crimes and Offenses) and 42 (Judiciary and Judicial Procedure) of the Pennsylvania Consolidated Statutes, in human trafficking, further providing for the offense of trafficking in individuals and for the offense of patroniz ...
    
    
    
                        An Act amending Title 75 (Vehicles) of the Pennsylvania Consolidated Statutes, in registration of vehicles, further providing for veteran plates and placard.
    
    
    
                        An Act providing for health insurance coverage requirements for stage four, advanced metastatic cancer.
    
    
    
                        An Act authorizing the Commonwealth of Pennsylvania to join the Psychology Interjurisdictional Compact; providing for the form of the compact; imposing additional powers and duties on the Governor, the Secretary of the Commonwealth and the Compact.
    
    
    
                        An Act amending Titles 42 (Judiciary and Judicial Procedure) and 75 (Vehicles) of the Pennsylvania Consolidated Statutes, in sentencing, further providing for payment of court costs, restitution and fines, for fine and for failure to pay fine; in lic ...
    
    
    
                        An Act amending the act of January 17, 1968 (P.L.11, No.5), known as The Minimum Wage Act of 1968,  further providing for definitions and for rate of minimum wages; and providing for reporting by the Department of Labor and Industry.
    
    
    
                        An Act amending Title 23 (Domestic Relations) of the Pennsylvania Consolidated Statutes, in marriage license, further providing for restrictions on issuance of license.
    
    
    
                        An Act amending the act of March 4, 1971 (P.L.6, No.2), known as the Tax Reform Code of 1971, in sales and use tax, further providing for exclusions from tax.
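The innerHTML values printed by this answer include the whitespace around the text, and the longer summaries are cut off with a trailing " ...". A small sketch for tidying them and flagging the truncated ones (which would need the full-text pages from the other answer): clean_tooltip_text is a hypothetical helper, and treating " ..." as the truncation marker is an assumption based on the output above.

```python
from typing import Tuple

TRUNCATION_MARKER = " ..."  # assumed marker, based on the console output above


def clean_tooltip_text(raw: str) -> Tuple[str, bool]:
    """Collapse runs of whitespace and report whether the summary was truncated."""
    text = " ".join(raw.split())
    truncated = text.endswith(TRUNCATION_MARKER)
    if truncated:
        text = text[: -len(TRUNCATION_MARKER)].rstrip()
    return text, truncated


raw = "\n                    An Act providing for the protection of water supplies.\n    \n"
print(clean_tooltip_text(raw))
```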
    

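Thank you, DebanjanB! This solution is clear, simple and works perfectly. I've accepted your answer.

Thank you, QHarr! Your solution is very thorough and works well. I ended up using DebanjanB's solution (below) because I'm already using Selenium in my overall code for other reasons. Thanks for explaining how to get the full text; I'd used similar code before, but because the site's robots.txt requires you to wait 5 seconds between page visits and there are more than 100 pages, it took too long.

You're very welcome, and thank you for taking the time to give feedback. Much appreciated. I ran the above against all the pages (i.e. the links for the full summaries) and it seemed fine. So perhaps you had already made a lot of improvements on that site?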