Python: How to click a link, open it in a new tab, perform some actions, and return to the parent window to continue in Selenium

python selenium

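I am trying to scrape "https://beta.sam.gov/search?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1". First, I find all the divs that hold the opportunity details (this part is done). From this page I get some basic information, such as the date and the opportunity title. But I also need the full description, and for that I have to click the title, which takes me to a new page. I have to grab the detailed description from that new page and then return to the parent page so I can scrape the next opportunity.

How can I do this? Below is the code I tried, but it does not give the expected result.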

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")
driver = webdriver.Chrome(ChromeDriverManager().install())
driver = webdriver.Chrome( options=options ,executable_path="D:/chromedriver.exe")

baseLink = "https://beta.sam.gov"
lastPageNumber = 2
currentpage = 1

# getting source code of each parent page , like page = 1,2,3 until the last page.
def startscrapping(currentpage):
    driver.get(f'https://beta.sam.gov/search?keywords=&sort=-modifiedDate&index=opp&is_active=true&page={currentpage}')
    source = driver.page_source
    soup = BeautifulSoup(source, 'lxml')
    return soup

# getting the last index of the parent pages to be scraped
def findLastPageIndex():
    global lastPageNumber  # without this, the assignment below only creates a local variable
    soup = startscrapping(1)
    pagination = soup.find("ul", {"aria-label" : "pagination"})
    if pagination is not None:
        results = pagination.findAll('li', "ng-star-inserted")
        lastPageNumber = int(results[-2].text.strip())

findLastPageIndex()


while  currentpage <= lastPageNumber :
    soup = startscrapping(currentpage)
    outerDivs = soup.find_all(attrs={"tabindex": "-1" , "class": "ng-tns-c1-1 ng-star-inserted"})
    print(f"page number = {currentpage}")
    for index,item in enumerate(outerDivs):
        title = item.find("h3" , class_= "opportunity-title").text.strip()

        #....some other codes for finding title and dates 

        # trying to click on the link
        driver.find_element_by_xpath(f"//*[@id='search-results']/div[{index+1}]/opportunities-result/div/div/div[1]/h3/a").click()      
    
        handles = driver.window_handles
        size = len(handles)
        print(f"length of handles  = {size}")
        parent_handle = driver.current_window_handle

        for x in range(size):
            if handles[x] != parent_handle:
                # trying to switch to the new window
                driver.switch_to.window(handles[x])
                print(driver.title)
                driver.close()
                break
        
        driver.switch_to.window(parent_handle)

       
        break
        
    break

Instead of clicking the link, try scraping the links and opening each one in a new tab. Here is how you can do that:
source = driver.page_source
soup = BeautifulSoup(source, 'lxml')
a_tags = soup.find_all('a',class_ = "wordbreak")
url_lst = []
for a in a_tags:
   url_lst.append("https://beta.sam.gov/"+ a['href'])
print(url_lst)

Output:
['https://beta.sam.gov//opp/26d35e6fa6e64a8099cf37e592ea54d0/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/5a07648db5034fd590ce2d3526eea366/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/7fd9b43ec7734708b23f6cacb189bf7f/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/1a9d2d5d61ac4deba9d6cc973b684176/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/b7eb9430de724dd58bf9c4c76c0d8652/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/f70317667944475981fbcbfd52a6f86e/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/b39489e47c8a41c6801499de4d908dad/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/dcaa31871a174dd3a136b276dbf0040e/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/401b8fd7f2234ffcbf4476523413ba40/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1', 'https://beta.sam.gov//opp/7c61c39c6cd24d52bbb34feea9fcf69e/view?keywords=&sort=-modifiedDate&index=opp&is_active=true&page=1']
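
The URLs in the output above contain a double slash ("//opp/...") because the base string ends with "/" and each href starts with "/". Browsers usually tolerate this, but a minimal sketch of one way to avoid it is to resolve each href with urljoin from the standard library. The helper name absolute_links and the constant BASE_URL are hypothetical and not part of the original answer.

```python
from urllib.parse import urljoin

# Assumed base URL, taken from the site being scraped in the question.
BASE_URL = "https://beta.sam.gov/"


def absolute_links(hrefs, base=BASE_URL):
    """Resolve each href against base so the result has a single slash."""
    return [urljoin(base, href) for href in hrefs]


# Example with one href of the same shape as those printed above.
links = absolute_links(
    ["/opp/26d35e6fa6e64a8099cf37e592ea54d0/view?keywords=&sort=-modifiedDate"
     "&index=opp&is_active=true&page=1"]
)
print(links[0])
```

With BeautifulSoup, the hrefs would come from something like [a['href'] for a in a_tags], as in the answer's code.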

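Then you can iterate over this list, open each link in a new tab one at a time, and do whatever you need there. Hope this helps.

You can do it like this, switching to the new tab and then back to the parent tab: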
parent = driver.current_window_handle
# Loop over the URLs collected earlier (url_lst)
for url in url_lst:
    driver.execute_script("window.open('{0}', '_blank');".format(url))
    # switch_to.window() expects a window handle, not an index
    driver.switch_to.window(driver.window_handles[-1])
    # Do whatever you want
    driver.close()
    driver.switch_to.window(parent)
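
The open/switch/close/return sequence can also be wrapped in a small helper, so that the driver always ends up back on the parent tab even if the work on the detail page raises an exception. This is only a sketch under assumptions: visit_in_new_tab and its action callback are hypothetical names, and the code assumes the new tab's handle is already listed in window_handles right after window.open() returns, which may need an explicit wait on slower pages.

```python
def visit_in_new_tab(driver, url, action):
    """Open url in a new tab, run action(driver) there, then return to the parent tab."""
    parent = driver.current_window_handle
    before = set(driver.window_handles)

    # Pass the URL as a script argument instead of formatting it into the JS string.
    driver.execute_script("window.open(arguments[0], '_blank');", url)

    # The new tab is whichever handle was not present before window.open().
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab was opened")

    driver.switch_to.window(new_handles[0])
    try:
        return action(driver)
    finally:
        # Close the detail tab and go back, whether or not action() succeeded.
        driver.close()
        driver.switch_to.window(parent)
```

A call might look like visit_in_new_tab(driver, url, lambda d: d.title), where the lambda stands in for whatever extracts the description from the detail page.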

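Does this answer your question? Did it help?
Why do I need a for loop? What should I iterate over? I have 1000 pages, each page has 10 divs, and each div has a link that I need to visit to do my work; then I need to come back for the next div and open its link, and so on. That means 10 switches for the first page, 10 for the second, ...
If you scrape all the href links, you can loop over them like this, visiting each page and coming back. Here url is the href value of the a tag.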
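
To make the answer to the "what should be iterated" question concrete, here is a rough sketch of the two nested loops: the outer loop walks the result pages and the inner loop walks the links found on each page. The names scrape_pages, get_links and handle_detail are hypothetical; get_links stands for the BeautifulSoup code that returns the detail URLs from a page source, and handle_detail stands for whatever reads the description from the open detail tab.

```python
# Search URL from the question, with the page number left as a placeholder.
SEARCH_URL = ("https://beta.sam.gov/search?keywords=&sort=-modifiedDate"
              "&index=opp&is_active=true&page={page}")


def scrape_pages(driver, last_page, get_links, handle_detail):
    """Visit pages 1..last_page and every detail link on each of them."""
    results = []
    for page in range(1, last_page + 1):          # outer loop: result pages
        driver.get(SEARCH_URL.format(page=page))
        parent = driver.current_window_handle
        for url in get_links(driver.page_source):  # inner loop: links on this page
            driver.execute_script("window.open(arguments[0], '_blank');", url)
            driver.switch_to.window(driver.window_handles[-1])
            try:
                results.append(handle_detail(driver))
            finally:
                # Always close the detail tab and return to the listing tab.
                driver.close()
                driver.switch_to.window(parent)
    return results
```

With 1000 pages of 10 links each, this performs the 10 switches per page described in the comment, while the listing page stays loaded in the parent tab throughout.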