Scraper implemented in Python with Scrapy and Selenium starts but then shuts down

Tags: Python, Selenium, XPath, Web Scraping, Scrapy

I am having trouble with my scraper (I took the initial sample code from here [from @alecxe] and completed it to get some results). The scraper does appear to start (I can watch it simulating clicks on the "next" button), but it shuts down after a second without printing or scraping any items.

Here is the code:

from scrapy.spider import BaseSpider 
from selenium import webdriver

class product_spiderItem(scrapy.Item):
    title = scrapy.Field()
    price=scrapy.Field()
    pass

class ProductSpider(BaseSpider):
    name = "product_spider"
    allowed_domains = ['ebay.com']
    start_urls = ['http://www.ebay.com/sch/i.html?_odkw=books&_osacat=0&_trksid=p2045573.m570.l1313.TR0.TRC0.Xpython&_nkw=python&_sacat=0&_from=R40']

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)

        while True:
            next = self.driver.find_element_by_xpath('//td[@class="pagn-next"]/a')

            try:
                next.click()

            # get the data and write it to scrapy items
                response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
                print response.url
                for prod in response.xpath('//ul[@id="GalleryViewInner"]/li/div/div'):
                    item = product_spiderItem()
                    item['title'] = prod.xpath('.//div[@class="gvtitle"]/h3/a/text()').extract()[0]
                    item['price'] = prid.xpath('.//div[@class="prices"]/span[@class="bold"]/text()').extract()[0]
                    print item['price']
                    yield item

            except:
                break

        self.driver.close()

I run `scrapy crawl product_spider -o products.json` to store the results. What am I missing?

While trying to understand what was wrong with the code, I made some edits and came up with the following (tested) code, which should be closer to what you are aiming for:

import scrapy
from selenium import webdriver

class product_spiderItem(scrapy.Item):
    title = scrapy.Field()
    price = scrapy.Field()

class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['ebay.com']
    start_urls = ['http://www.ebay.com/sch/i.html?_odkw=books&_osacat=0&_trksid=p2045573.m570.l1313.TR0.TRC0.Xpython&_nkw=python&_sacat=0&_from=R40']

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)

        while True:

            # Build a Selector from the browser's current DOM so that
            # JavaScript-rendered content is visible to the XPath queries
            sel = scrapy.Selector(text=self.driver.page_source)

            for prod in sel.xpath('//ul[@id="GalleryViewInner"]/li/div/div'):
                item = product_spiderItem()
                item['title'] = prod.xpath('.//div[@class="gvtitle"]/h3/a/text()').extract()
                item['price'] = prod.xpath('.//div[@class="prices"]//span[@class=" bold"]/text()').extract()
                yield item

            # Advance to the next results page; stop when there is none
            next = self.driver.find_element_by_xpath('//td[@class="pagn-next"]/a')

            try:
                next.click()

            except:
                break

    def closed(self, reason):
        self.driver.close()

Please try this code and see whether it works better.

It works great! Thanks a lot!! I think the main cause was selecting the page content with `sel = scrapy.Selector(text=self.driver.page_source)` instead of a `TextResponse`. But `TextResponse` also seems to work in other code I have seen.

You're welcome :-) Note that I removed the `[0]` after `extract()` to prevent an error when an element is not found.