
Python: running Scrapy Splash with rotating proxies


I am trying to use Scrapy with Splash and rotating proxies. Here is my settings.py:

ROBOTSTXT_OBEY = False
BOT_NAME = 'mybot'
SPIDER_MODULES = ['myproject.spiders']
NEWSPIDER_MODULE = 'myproject.spiders'
LOG_LEVEL = 'INFO'
USER_AGENT = 'Mozilla/5.0'

# JSON file pretty formatting
FEED_EXPORT_INDENT = 4

# Suppress dataloss warning messages of scrapy downloader
DOWNLOAD_FAIL_ON_DATALOSS = False   
DOWNLOAD_DELAY = 1.25  

# Enable or disable spider middlewares
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Enable or disable downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}

# Splash settings
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
SPLASH_URL = 'http://localhost:8050'
I set ROTATING_PROXY_LIST in the spider:

import re
import requests

proxy_list = re.findall(
    r'(\d+\.\d+\.\d+\.\d+:\d+)\b',
    requests.get("https://raw.githubusercontent.com/clarketm/proxy-list/master/proxy-list.txt").text)
custom_settings = {'ROTATING_PROXY_LIST': proxy_list}
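For reference, the extraction above can be checked against a small sample of the list. The exact line format of proxy-list.txt is an assumption here (bare ip:port entries, optionally followed by annotations), and the sample addresses are made up:

```python
import re

# A small sample in the assumed format of clarketm/proxy-list
# (one "ip:port" per line, optionally followed by annotations).
sample = """103.76.12.42:8181 ID-N-S
51.158.68.133:8811 FR-N-S!
185.61.92.228:33060 UA-H"""

# Same pattern as in the spider: capture each dotted-quad address
# together with its port.
proxy_list = re.findall(r'(\d+\.\d+\.\d+\.\d+:\d+)\b', sample)
print(proxy_list)
# ['103.76.12.42:8181', '51.158.68.133:8811', '185.61.92.228:33060']
```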
I start Splash with

docker run -p 8050:8050 scrapinghub/splash

Here is how the start requests are issued:

def start_requests(self):
    urls = ['http://example-com/page_1.html', 'http://example-com/page_2.html']
    for url in urls:
        yield SplashRequest(url,
                            self.parse_url,
                            headers={'User-Agent': self.user_agent},
                            args={'render_all': 1, 'wait': 0.5})
However, when I run the crawler, I don't see any requests going through Splash. How can I fix this?
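One interaction worth noting about the setup above: RotatingProxyMiddleware assigns the proxy via request.meta['proxy'], which governs the HTTP request Scrapy sends to the Splash endpoint, not the outgoing request Splash itself makes to the target site. Splash's render endpoints accept their own 'proxy' argument (a proxy URL, supported since Splash 2.3), which can be filled from the same list. A minimal sketch of building that argument, assuming bare ip:port entries and plain HTTP proxies (the splash_proxy helper is hypothetical, not part of any library):

```python
import random

# Example entries in the same bare ip:port form as the scraped list.
proxy_list = ['103.76.12.42:8181', '51.158.68.133:8811']

def splash_proxy(proxy_list):
    # Splash expects its 'proxy' render argument as scheme://host:port,
    # so prefix the bare ip:port entries with a scheme.
    return 'http://' + random.choice(proxy_list)

# Inside start_requests, the argument would then be passed per request,
# e.g. (sketch, untested):
#   yield SplashRequest(url, self.parse_url,
#                       args={'render_all': 1, 'wait': 0.5,
#                             'proxy': splash_proxy(self.proxy_list)})
proxy = splash_proxy(proxy_list)
print(proxy)
```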

Thanks,
Gina