
Python Selenium unable to switch tabs and extract the URL


In this scraper, I want to click "Go to Store" so that the store URL opens in a new tab, capture that URL, then close the tab and go back to the original one. But the script throws an error:

import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from selenium import webdriver
from urlparse import urljoin
import time
from selenium.webdriver.common.keys import Keys

class CompItem(scrapy.Item):
    model_name = scrapy.Field()
    model_link = scrapy.Field()
    url = scrapy.Field()

class criticspider(CrawlSpider):
    name = "extract"
    allowed_domains = ["mysmartprice.com"]
    start_urls = ["http://www.mysmartprice.com/computer/lenovo-g50-70-laptop-msf201821"]


    def __init__(self, *args, **kwargs):
        super(criticspider, self).__init__(*args, **kwargs)
        self.download_delay = 0.25
        self.browser = webdriver.Firefox()

        self.browser.implicitly_wait(20)

    def parse_start_url(self, response):
        self.browser.get(response.url)
        item = CompItem()
        time.sleep(10)

        # Save the window opener (the current window handle; a window handle is not the same as a tab)
        button = self.browser.find_element_by_xpath("/html/body/div[3]/div/div[3]/div/div[2]/div[4]/div[4]/div[5]/div[1]")
        main_window = self.browser.current_window_handle

        # Open the link in a new tab by sending key strokes on the element
        # Use: Keys.CONTROL + Keys.SHIFT + Keys.RETURN to open tab on top of the stack 
        button.send_keys(Keys.CONTROL + Keys.RETURN)

        # Switch tab to the new tab, which we will assume is the next one on the right
        self.browser.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)
        time.sleep(10)
        # Put focus on current window which will, in fact, put focus on the current visible tab
        self.browser.switch_to_window(main_window)
        item['url'] = self.browser.current_url

        # do whatever you have to do on this page; we will just sleep for now
        time.sleep(2)

        # Close current tab
        self.browser.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 'w')

        yield item

The code doesn't throw any errors; I've tried it in multiple browsers, but I can't find what the problem is.
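
A likely reason (my note, not from the original thread): WebDriver sends keystrokes to the page, so browser-level shortcuts such as CONTROL+TAB generally do not switch tabs, and switch_to_window(main_window) only refocuses the original handle; current_url is therefore still read from the original tab. A minimal sketch of the handle-based alternative, reusing the button and item names from the code above:

main_window = self.browser.current_window_handle
button.send_keys(Keys.CONTROL + Keys.RETURN)  # open the link in a new tab

# switch to the newest window handle instead of sending CONTROL+TAB
new_window = [h for h in self.browser.window_handles if h != main_window][-1]
self.browser.switch_to.window(new_window)

item['url'] = self.browser.current_url  # now read from the new tab

self.browser.close()                        # close the new tab
self.browser.switch_to.window(main_window)  # back to the original tab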

Assuming you want to visit each featured store and come back to the main window, one option is to SHIFT+click the "Go to Store" link to open it in a new window, close the newly opened window, and switch back to the main window's context:

import scrapy
from scrapy.contrib.spiders import CrawlSpider
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys


class CompItem(scrapy.Item):
    model_name = scrapy.Field()
    model_link = scrapy.Field()
    url = scrapy.Field()


class criticspider(CrawlSpider):
    name = "extract"
    allowed_domains = ["mysmartprice.com"]
    start_urls = ["http://www.mysmartprice.com/computer/lenovo-g50-70-laptop-msf201821"]

    def __init__(self, *args, **kwargs):
        super(criticspider, self).__init__(*args, **kwargs)
        self.download_delay = 0.25
        self.browser = webdriver.Firefox()
        self.browser.maximize_window()

        self.browser.implicitly_wait(20)

    def parse_start_url(self, response):
        self.browser.get(response.url)

        # waiting for "Go to store" to become visible
        wait = WebDriverWait(self.browser, 10)
        wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.store_pricetable")))

        main_window = self.browser.window_handles[0]

        # iterate over featured stores and visit them
        for store in self.browser.find_elements_by_css_selector("div.store_pricetable"):

            item = CompItem()

            # shift+click on the "Go to Store" link
            link = store.find_element_by_css_selector("div.store_gostore > div.storebutton")
            ActionChains(self.browser).key_down(Keys.SHIFT).move_to_element(link).click(link).key_up(Keys.SHIFT).perform()

            # there is a popup preventing us from navigating to the store URL - close it
            try:
                popup_close = self.browser.find_element_by_css_selector(".popup-closebutton")
                popup_close.click()

                # repeat the click
                ActionChains(self.browser).key_down(Keys.SHIFT).move_to_element(link).click(link).key_up(Keys.SHIFT).perform()
            except NoSuchElementException:
                pass

            # switch to the newly opened window, read the current url and close the window
            self.browser.switch_to.window(self.browser.window_handles[-1])

            # wait until "On your way to the store" would not be in title
            wait.until(lambda browser: "On your way to the Store" not in browser.title)

            item['url'] = self.browser.current_url
            self.browser.close()

            # switch back to the main window
            self.browser.switch_to.window(main_window)

            yield item
This works for me and outputs 2 items:

{'url': u'http://www.ebay.in/itm/LENOVO-G50-70-LAPTOP-59422417-/231660194652?aff_source=mysmartprice'}
{"url": "https://paytm.com/shop/p/lenovo-g50-70-core-i7-4500-4th-gen-8-gb-1-tb-15-6-inch-2-gb-graphics-win8-1-no-bag-black-CMPLXLAPLENOVO-G50-7DUMM20256A81CC05?utm_source=Affiliates&utm_medium=msp&utm_campaign=msp"}

How do I get the URLs of all the store pages? I only want the store URLs, not the URL I parse in start_urls; how can I ignore it? @JohnDene I've added a note: you may want to increase the page load timeout so the page is loaded before current_url is read. Can I add time.sleep() before the URL is captured? @JohnDene You can, but that is less reliable and not recommended. The best option is to use an explicit wait. Hold on, I'll provide it.
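
For completeness, a minimal sketch of what that explicit wait could look like in place of time.sleep(); this is my own illustration, reusing the "On your way to the Store" interstitial title from the answer above:

from selenium.webdriver.support.ui import WebDriverWait

# poll for up to 20 seconds instead of sleeping a fixed amount of time
wait = WebDriverWait(self.browser, 20)

# block until the interstitial page has redirected to the store...
wait.until(lambda browser: "On your way to the Store" not in browser.title)
# ...or, alternatively, until the URL has left mysmartprice.com
wait.until(lambda browser: "mysmartprice.com" not in browser.current_url)

item['url'] = self.browser.current_url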