Python scrapy-selenium driver doesn't follow links
from scrapy_selenium import SeleniumRequest
import scrapy
from selenium import webdriver

class testspider1(scrapy.Spider):
    driver = webdriver.Firefox(executable_path=r"C:\Users\test\Desktop\geckodriver")
    name = 'test5'
    start_urls = ['http://httpbin.org/ip']

    def parse(self, response):
        print(response.body)
        url = "https://www.target.com/p/cesar-canine-cuisine-filet-mignon-flavor-wet-dog-food-3-5oz-tray/-/A-14903668"
        yield SeleniumRequest(url=url, callback=self.parse_result)

    def parse_result(self, response):
        image = response.xpath('//*[@id="mainContainer"]/div/div/div[1]/div[1]/div[2]/div[1]/div/div/div/div/div/div/div/a/div/div/div/div/div/img/@src').extract_first()
        price = response.selector.xpath('//*[@id="mainContainer"]/div/div/div[1]/div[2]/div/div[1]/span/text()').extract_first()
        print(image)
        print("\n\n")
        print(price)
I have followed the instructions step by step, but the driver does not follow any of the links; I believe both requests end up handled by plain Scrapy. I don't want to change __init__, because I want some requests handled by scrapy-selenium and others handled by Scrapy on its own. I checked another solution, but it changes the whole __init__ and turns Selenium into self.driver. I want some requests handled by SeleniumRequest and the others by scrapy.Request.
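As far as I understand scrapy-selenium, this mixed setup is what its downloader middleware already does: process_request only intercepts SeleniumRequest instances and hands everything else back to Scrapy's normal downloader, so no __init__ changes should be needed. A toy sketch of that dispatch rule, using hypothetical stub classes in place of the real scrapy and scrapy_selenium imports:

```python
# Hypothetical stubs standing in for scrapy.Request and
# scrapy_selenium.SeleniumRequest, just to illustrate the dispatch rule.
class Request:
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback

class SeleniumRequest(Request):
    pass

def handled_by_selenium(request):
    # scrapy_selenium.SeleniumMiddleware.process_request returns None for
    # anything that is not a SeleniumRequest, so plain Requests fall
    # through to Scrapy's regular downloader.
    return isinstance(request, SeleniumRequest)

print(handled_by_selenium(Request("http://httpbin.org/ip")))                  # False
print(handled_by_selenium(SeleniumRequest("https://www.target.com/p/...")))  # True
```

So if plain requests and SeleniumRequests are all coming back unrendered, the usual suspect is that the middleware never activated (settings not picked up, or the driver failed to start).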
Note: I am using this site as an example because it renders its results with JavaScript, so if a request is handled by Scrapy alone the data has not been rendered yet and the result is an empty list. My settings.py:
from shutil import which
BOT_NAME = 'seleniumtest'
SPIDER_MODULES = ['seleniumtest.spiders']
NEWSPIDER_MODULE = 'seleniumtest.spiders'
SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_BROWSER_EXECUTABLE_PATH = which(r"C:\Users\test\Desktop\geckodriver")
ROBOTSTXT_OBEY = True
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
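One observation about these settings (my reading, not something stated above): shutil.which() given an explicit path returns that path only if the file actually exists and is executable, and returns None otherwise; and SELENIUM_BROWSER_EXECUTABLE_PATH is normally meant to point at the browser binary (e.g. firefox.exe), not at geckodriver. A quick sketch of which()'s behavior:

```python
import shutil
import sys

# A bare command name is searched on the PATH environment variable.
print(shutil.which("firefox"))  # full path if Firefox is on PATH, else None

# An explicit path is returned as-is only if that exact file exists and
# is executable; otherwise which() returns None.
print(shutil.which(sys.executable))                        # running interpreter: found
print(shutil.which(r"C:\Users\test\Desktop\geckodriver"))  # None if that file is missing
```

If which() returns None here, SELENIUM_BROWSER_EXECUTABLE_PATH ends up as None and the middleware may fall back to whatever Firefox it can find, or fail to start at all.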
How does this break? Do you have a traceback? I would replace Firefox with Chrome:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())