How to scrape the Google Play website with Scrapy

python ajax python-2.7 scrapy

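Background: I am trying to scrape a page on the Google Play website.

When I open that page in a browser and scroll down, new apps/items appear. This is definitely an AJAX call.

Problem: I don't know how to use Scrapy to get the apps that appear when I scroll in the browser.

What I tried: I crawled the page and printed the following response:

As you can see, there is a loading indicator. It never shows up in the browser, because the browser issues the AJAX call automatically.

Note: I know we can use Scrapy to issue the XHR (AJAX) calls ourselves, but I want my spider to keep crawling the page until there are no more apps, so that the spider discovers the AJAX calls (if any) automatically.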
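For reference, the manual XHR approach mentioned in the note usually comes down to a loop like the one below: request one batch, stop as soon as a batch comes back empty. This is only a minimal sketch. `fetch_page`, `page_size`, and `max_pages` are hypothetical names, and the real endpoint and parameters that Google Play uses are not shown here.

```python
def crawl_until_empty(fetch_page, page_size=60, max_pages=100):
    """Yield items batch by batch until a request returns nothing.

    `fetch_page(start, num)` is a hypothetical callable that performs one
    XHR-style request and returns a list of already parsed items.
    """
    start = 0
    for _ in range(max_pages):  # hard upper bound so the loop always ends
        items = fetch_page(start, page_size)
        if not items:
            return  # an empty batch means there is nothing left to load
        for item in items:
            yield item
        start += len(items)  # offset of the next batch
```

In a Scrapy spider the same idea would typically be expressed as a callback that yields the next request (for example a `scrapy.FormRequest`) only when the current response still contained items.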

I am using Python 2.7.9 and Scrapy 0.26 on Windows 7 64-bit.

Note 2: I have already checked

Many thanks.

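Here is a basic approach (not very Pythonic) that shows one possible way to solve the problem with Selenium WebDriver.

The basic idea is:

- Create a browser instance driven by WebDriver (`webdriver.Firefox()`). By default this opens a real Firefox window rather than a headless one.
- Make the instance load a page (`self.driver.get(response.url)`).
- Find an element in the page that we already know sits at the bottom (in this example the copyright label, `©2015 Google`).
- While that element is not displayed, keep moving the focus inside the page to it.

This way the page keeps loading new elements.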

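```python
import time

import scrapy
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By


# A plain Spider is used because CrawlSpider reserves parse() for its own rules.
class GooglePlaySpider(scrapy.Spider):
    name = "googleplay"
    allowed_domains = ["play.google.com"]
    start_urls = ["https://play.google.com"]

    def __init__(self, *args, **kwargs):
        # Let Scrapy finish its own spider setup before starting the browser.
        super(GooglePlaySpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        # The copyright label sits at the very bottom of the page.
        footer = self.driver.find_element(By.CLASS_NAME, "copyright")
        ActionChains(self.driver).move_to_element(footer).perform()

        while not footer.is_displayed():
            footer = self.driver.find_element(By.CLASS_NAME, "copyright")
            time.sleep(3)  # give the AJAX content time to load
            ActionChains(self.driver).move_to_element(footer).perform()

        # Scrape here: self.driver.page_source now holds the loaded HTML.

    def closed(self, reason):
        # Called by Scrapy when the spider finishes; shut the browser down.
        self.driver.quit()
```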

When the loop ends, you can be sure the whole page has loaded, and you can adapt the code to scrape the content.
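A common variation on the same idea does not depend on a particular footer element: scroll to the bottom repeatedly and stop once the page height no longer grows. The helper below is a rough sketch under two assumptions: that the page loads more items when it is scrolled, and that new items increase `document.body.scrollHeight`. The names `scroll_until_stable`, `pause`, `max_rounds`, and `sleep` are made up for this illustration; only `execute_script` is Selenium's own method.

```python
import time


def scroll_until_stable(driver, pause=3.0, max_rounds=50, sleep=time.sleep):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object exposing Selenium's `execute_script(script)`.
    Returns the number of scrolls that made the page taller.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_rounds):  # hard upper bound so the loop always ends
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give the AJAX request time to finish
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was appended to the page
        last_height = new_height
        loaded += 1
    return loaded
```

Inside the spider this could be called as `scroll_until_stable(self.driver)` right after `self.driver.get(response.url)`. The `sleep` parameter exists only so the wait can be replaced in tests.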

Documentation here:

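Have you considered using Scrapy + WebDriver? You can use WebDriver to simulate mouse movement and thereby force the page to reload.

@aberna Actually, I have never heard of WebDriver. Could you give me a link?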