Python Splash + Scrapy: script embedding, Scrapy extract() not working

My problem is that I cannot embed my Splash script in my Scrapy crawler. Splash itself is working: I managed to render the content I wanted in the browser, so I copied the script and tried to parse the HTML with Scrapy. Here is my spider:
import scrapy
from scrapy_splash import SplashRequest


class Ntest(scrapy.Spider):
    name = "test"
    script = """
    function main(splash)
        splash.private_mode_enabled = false
        splash.html5_media_enabled = true
        assert(splash:go(args.url))
        assert(splash:wait(0.3))
        return {
            html = splash:html(),
            png = splash:png(),
            har = splash:har(),
        }
    end
    """

    def start_request(self, response):
        yield SplashRequest(
            url='https://www.mp4upload.com/embed-yfani9opk91x.html',
            endpoint='render.html',
            args={'lua_source': self.script},
            callback=self.parse,
        )

    def parse(self, response):
        r = response.css('body').extract()
Here is my settings.py:
SPLASH_URL = 'http://localhost:8050/'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
When I run scrapy runspider .\main.py I get:
2018-06-25 14:17:38 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-06-25 14:17:38 [scrapy.utils.log] INFO: Versions: lxml 4.2.2.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 18.4.0, Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 16:07:46) [MSC v.1900 32 bit (Intel)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.2.2, Platform Windows-10-10.0.17134-SP0
2018-06-25 14:17:39 [scrapy.crawler] INFO: Overridden settings: {'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'SPIDER_LOADER_WARN_ONLY': True}
2018-06-25 14:17:39 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2018-06-25 14:17:39 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy_splash.SplashCookiesMiddleware',
'scrapy_splash.SplashMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-06-25 14:17:39 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy_splash.SplashDeduplicateArgsMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-06-25 14:17:39 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-06-25 14:17:39 [scrapy.core.engine] INFO: Spider opened
2018-06-25 14:17:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-06-25 14:17:39 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-06-25 14:17:39 [scrapy.core.engine] INFO: Closing spider (finished)
2018-06-25 14:17:39 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 6, 25, 12, 17, 39, 112025),
'log_count/DEBUG': 1,
'log_count/INFO': 7,
'start_time': datetime.datetime(2018, 6, 25, 12, 17, 39, 104037)}
2018-06-25 14:17:39 [scrapy.core.engine] INFO: Spider closed (finished)
I want to extract the body from the HTML. Please help.

From the logs, it is clear that no requests were performed at all. If the code is indented as it appears in the post, start_request() and parse() are defined outside the spider class. Even if they are not, the correct method name is start_requests() (and it takes no response argument, since it produces the very first requests of the crawl).
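To see why a misspelled name fails silently rather than raising an error, note that Scrapy's engine looks up a method named exactly start_requests on the spider to seed the crawl; anything else is just an unused method. A minimal sketch of that lookup, with a toy MiniEngine class standing in for Scrapy's real engine (no Scrapy required), shows the behavior that matches the log above: zero requests, then "Closing spider (finished)".

```python
class MiniEngine:
    """Toy stand-in for Scrapy's engine: seeds the crawl by looking up
    a method named exactly 'start_requests' on the spider."""

    def open_spider(self, spider):
        start = getattr(spider, "start_requests", None)
        if start is None:
            # Nothing to schedule: the spider closes immediately with
            # "Crawled 0 pages", as seen in the question's log output.
            return []
        return list(start())


class BrokenSpider:
    def start_request(self):     # wrong name: the engine never finds it
        yield "SplashRequest"


class FixedSpider:
    def start_requests(self):    # correct name: the engine calls it
        yield "SplashRequest"


engine = MiniEngine()
print(engine.open_spider(BrokenSpider()))  # prints []
print(engine.open_spider(FixedSpider()))   # prints ['SplashRequest']
```

So renaming start_request to start_requests (and dropping the response parameter) is enough to make the spider actually issue its SplashRequest; parse() will then receive the rendered response.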