Python: Scrapy doesn't return the page content


python scrapy


I'm trying to scrape a web page with Scrapy, but parsing the page doesn't work. In the IPython shell, it returns the following:

'دانلود کتاب و کتاب صوتی با طاقچه\n        // more info: http://angulartics.github.io/\n        (function (i, s, o, g, r, a, m) {\n            i[\'GoogleAnalyticsObject\'] = r; i[r] = i[r] || function () {\n                (i[r].q = i[r].q || []).push(arguments)\n            }, i[r].l = 1 * new Date(); a = s.createElement(o),\n                m = s.getElementsByTagName(o)[0]; a.async = 1; a.src = g; m.parentNode.insertBefore(a, m)\n        })(window, document, \'script\', \'//www.google-analytics.com/analytics.js\', \'ga\');\n        ga(\'create\', \'UA-57199074-1\', { \'cookieDomain\': location.hostname == \'localhost\' ? \'none\' : \'auto\' });\n        ga(\'require\', \'ec\');\n    Taaghche works best with JavaScript enabled{ "@context": "http://schema.org", "@type": "WebSite", "url": "https://taaghche.ir/", "name": "طاقچه", "alternateName": "نزدیکترین کتاب فروشی شهر", "potentialAction": { "@type": "SearchAction", "target": "https://taaghche.ir/search?term={search_term_string}", "query-input": "required name=search_term_string" } }{ "@context": "http://schema.org", "@type": "Organization", "url": "https://taaghche.ir", "logo": "https://taaghche.ir/assets/images/taaghchebrand.png", "contactPoint": [{ "@type": "ContactPoint", "telephone": "+۹۸-۲۱-۸۸۱۴۹۸۱۶", "contacttype": "customer support", "areaServed": "IR" }] }'

It looks more like a JSON response. How can I scrape it? By the way, my scraper looks like this:

    import pandas as pd
    import scrapy


    class Taaghche(scrapy.Spider):
        name = 'taaghche'

        def start_requests(self):
            link = 'https://taaghche.ir/search?term='
            data = pd.read_csv('books.csv')
            # Build one search URL per book title in the CSV
            for title in data.title:
                key = title.replace(" ", "%20")
                yield scrapy.Request(url=link + key, callback=self.parse_front)

        def parse_front(self, response):
            # href of the first search result; this comes back as None here
            booklink = response.xpath('//a[@class="book-link"][1]/@href').get()
            # print(booklink)
            if booklink:  # response.follow() raises on a None URL
                yield response.follow(booklink, callback=self.parse_page)

        def parse_page(self, response):
            ...
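One way to check what the spider actually received is to inspect the raw HTML with the standard library. The sketch below assumes (the question does not confirm this) that the JSON-looking text in the output comes from schema.org `<script type="application/ld+json">` blocks, and that search results would appear as `<a class="book-link">` tags, as the spider's XPath expects. If the raw HTML contains JSON-LD metadata but zero book links, the results are being rendered client-side. `SAMPLE_HTML` is a made-up, trimmed stand-in for the real page, and `PageInspector`/`inspect_html` are hypothetical helper names.

```python
import json
from html.parser import HTMLParser


class PageInspector(HTMLParser):
    """Collects JSON-LD script bodies and counts <a class="book-link"> anchors."""

    def __init__(self):
        super().__init__()
        self.json_ld = []      # decoded JSON-LD objects
        self.book_links = 0    # number of <a> tags whose class list contains "book-link"
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "a" and "book-link" in (attrs.get("class") or "").split():
            self.book_links += 1

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.json_ld.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def inspect_html(html):
    """Return (list of JSON-LD objects, count of book-link anchors) for a page."""
    parser = PageInspector()
    parser.feed(html)
    parser.close()
    return parser.json_ld, parser.book_links


# Hypothetical, trimmed stand-in for the HTML the spider received
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@context": "http://schema.org", "@type": "WebSite", "url": "https://taaghche.ir/"}</script>
</head><body>
<noscript>Taaghche works best with JavaScript enabled</noscript>
<div id="app"></div>
</body></html>
"""

if __name__ == "__main__":
    blocks, links = inspect_html(SAMPLE_HTML)
    print([block["@type"] for block in blocks], links)
```

In the spider itself, the same check is simply `inspect_html(response.text)` inside `parse_front`.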

The website's content is not rendered on the server side; it is rendered by JavaScript.
In this case, you need to use one of the following:

Selenium (combined with Scrapy)

Check the request URLs in the browser developer tools' Network tab. There may be an API URL from which you can fetch the data directly.
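For the Selenium option, a common pattern is to let a real browser execute the page's JavaScript and then hand the rendered HTML (`driver.page_source`) to a Scrapy `Selector`, so the spider's original XPath can be reused. This is a minimal sketch, assuming Selenium 4 with a local Chrome and chromedriver, and assuming the rendered results use `<a class="book-link">` as in the question; `search_url`, `first_book_link` and the 10-second timeout are illustrative choices, not part of the original answer. The Selenium and Scrapy imports sit inside the function so the pure URL helper works without them.

```python
from urllib.parse import quote

SEARCH_URL = "https://taaghche.ir/search?term="


def search_url(title):
    # Percent-encode the title (spaces become %20, Persian text becomes UTF-8 escapes)
    return SEARCH_URL + quote(title)


def first_book_link(title, timeout=10):
    """Render the search page in headless Chrome; return the first book href or None."""
    # Imported here so search_url() can be used without Selenium/Scrapy installed
    from scrapy import Selector
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(search_url(title))
        # Wait until the page's JavaScript has inserted at least one result link
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "a.book-link"))
        )
        html = driver.page_source
    except TimeoutException:
        return None  # no results appeared within the timeout
    finally:
        driver.quit()
    # Reuse the spider's XPath on the rendered HTML (the href may be relative)
    return Selector(text=html).xpath('//a[@class="book-link"][1]/@href').get()
```

Calling this from a Scrapy callback blocks Scrapy's event loop while the browser loads each page, so it is only practical for a modest number of titles.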

There may be other possible solutions as well.
It turned out the site uses a simple API, which is easier than parsing the HTML. :D
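The thread does not say which endpoint the site uses, so that URL cannot be filled in here. Once the request is found in the Network tab, a Scrapy callback can decode its body with `response.json()` (available since Scrapy 2.2) or `json.loads(response.text)`. Because the payload's structure is also unknown, this sketch uses a generic, hypothetical helper, `find_values`, that walks the decoded JSON and collects every value stored under a given key; the key name `"title"` and `SAMPLE_PAYLOAD` are made up for illustration.

```python
import json


def find_values(obj, key):
    """Recursively collect every value stored under `key` in nested dicts and lists."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            found.extend(find_values(v, key))  # keep searching inside the value
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_values(item, key))
    return found


# Hypothetical payload; the real API's structure must be read from the Network tab
SAMPLE_PAYLOAD = '{"results": [{"id": 1, "title": "Book A"}, {"id": 2, "title": "Book B"}]}'

if __name__ == "__main__":
    data = json.loads(SAMPLE_PAYLOAD)  # inside a Scrapy callback: data = response.json()
    print(find_values(data, "title"))
```

Once the real structure is known, replacing the generic walk with direct indexing (for example `data["results"]`) is simpler and fails loudly if the API changes.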