Returning a response from Scrapy in Python
I am using Scrapy in Python to crawl some websites. Once a response has been downloaded, I want to pass it back to the main program.
from threading import Thread

import scrapy
from scrapy.crawler import CrawlerProcess, CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor


class QuotesSpider(scrapy.Spider):
    name = 'piracy'

    def __init__(self, *args, **kwargs):
        super(QuotesSpider, self).__init__(*args, **kwargs)
        self.start_urls = kwargs.get('url_to_scrap')
        self.returndict = kwargs.get('returndict')

    def parse(self, response):
        print("Getting Text From {}".format(response.url))
        # return the_html_page


process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
crawler = CrawlerRunner(get_project_settings())
crawler.crawl(QuotesSpider, url_to_scrap=['https://google.com'])
# Run the reactor in a background thread, without installing signal handlers
te = Thread(target=reactor.run, args=(False,))
te.start()
# here I should get page response
# print the_html_page
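I need to get the returned response after the reactor has started running in the thread, so that I can post-process it.

Set a global variable before the class and assign the response to it in the parser. Alternatively, if memory is a potential concern, configure the spider to write to a file and read from that file afterwards.

Can you state your question properly?!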
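The global-variable suggestion can be sketched roughly as follows. This is only an illustration of the hand-off pattern, not Scrapy code: `FakeResponse`, `FakeSpider`, and `run_crawl` are hypothetical stand-ins for the Scrapy response, the spider, and the reactor thread, and a `queue.Queue` is used in place of a bare global so that the main thread can block until the callback has delivered something.

```python
import queue
import threading

# Shared, thread-safe container defined at module level ("before the class").
results = queue.Queue()


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response (only url and text)."""

    def __init__(self, url, text):
        self.url = url
        self.text = text


class FakeSpider:
    """Hypothetical stand-in for the spider; only parse() matters here."""

    def parse(self, response):
        # In a real spider this is the callback invoked for each response.
        results.put((response.url, response.text))


def run_crawl(urls):
    """Stand-in for the reactor thread: fakes a download and calls parse()."""
    spider = FakeSpider()
    for url in urls:
        spider.parse(FakeResponse(url, "<html>page for {}</html>".format(url)))


worker = threading.Thread(target=run_crawl, args=(["https://example.com"],))
worker.start()

# The main program blocks here until the callback has delivered a page.
url, html = results.get(timeout=5)
worker.join()
print("Got {} characters from {}".format(len(html), url))
```

In the spider from the question, the equivalent change would presumably be a single `results.put(...)` call inside `parse`, with the main program calling `results.get()` after starting the reactor thread. A queue avoids polling a global that may not have been assigned yet.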