Python: How to retrieve data when the page is loaded via AJAX


python web-scraping scrapy


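I want to get mobile phone prices from this website.

I tried to test it first, so I used: scrapy shell

But I could not connect to the site. Since the page is loaded via AJAX, I used Firebug to find the start URL. Can anyone tell me where I went wrong?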

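Write a JavaScript script that performs what already happens when a page number is clicked, then simply dump the XML returned from the server. In other words, try calling the server as if the site were hosted on your own desktop.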

The JavaScript function called when you click a page number is paginateList('numberOfPage'), where numberOfPage is the page you want to visit.

The body of the function is:

function paginateList(viewIndex) {
    var productCategoryId = document.pageSelect.category_id.value;
    var viewSize = document.pageSelect.VIEW_SIZE.value;
    var min = "";
    if (document.pageSelect.min != null)
        min = document.pageSelect.min.value;
    var max = "";
    if (document.pageSelect.max != null)
        max = document.pageSelect.max.value;
    var attrName = "";
    if (document.pageSelect.attrName != null)
        attrName = document.pageSelect.attrName.value;
    if (attrName == "") {
        var commaAttr = document.getElementById('commaAttr');
        attrName = commaAttr.value;
    }
    var limitView = 'true';
    var sortSearchPrice = "";
    if (document.pageSelect.sortSearchPrice != null)
        sortSearchPrice = document.pageSelect.sortSearchPrice.value;
    var url2="/control/AjaxCategoryDetail?productCategoryId="+productCategoryId+"&category_id="+productCategoryId+"&attrName="+attrName+"&min="+min+"&max="+max+"&sortSearchPrice="+sortSearchPrice+"&VIEW_INDEX="+viewIndex+"&VIEW_SIZE="+viewSize+"&serachupload=&sortupload=";
    pleaseWait('Y');
    // Synchronous POST to the AJAX endpoint; the returned HTML replaces the result list.
    jQuery.ajax({
        url: url2,
        data: null,
        type: 'post',
        async: false,
        success: function(data) {
            $('#searchResult').html(data);
            pleaseWait('N');
        },
        error: function(data) {
            alert("Error during product searching");
        }
    });
}

Use this to fetch the data from each page in turn, one VIEW_INDEX value after another.
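The URL that paginateList assembles can also be built outside the browser. The following is a minimal Python sketch of that idea, not the site's own code: the helper names build_ajax_url and build_ajax_request are hypothetical, and the host, the PRO-SMART category and the page size of 15 are assumptions borrowed from the spider's BASE_URL posted in this thread. It only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: host and path as they appear in the spider's BASE_URL in this thread.
ENDPOINT = "http://www.univercell.in/control/AjaxCategoryDetail"


def build_ajax_url(view_index, category_id="PRO-SMART", view_size=15):
    """Mirror the query string that paginateList() concatenates into url2."""
    params = [
        ("productCategoryId", category_id),
        ("category_id", category_id),
        ("attrName", ""),
        ("min", ""),
        ("max", ""),
        ("sortSearchPrice", ""),
        ("VIEW_INDEX", view_index),
        ("VIEW_SIZE", view_size),
        ("serachupload", ""),  # spelled exactly as in the site's JavaScript
        ("sortupload", ""),
    ]
    return ENDPOINT + "?" + urlencode(params)


def build_ajax_request(view_index):
    """Build (but do not send) a POST with an empty body, as paginateList() does."""
    return Request(build_ajax_url(view_index), data=b"", method="POST")


if __name__ == "__main__":
    # Print the URLs for the first three pages.
    for page in range(1, 4):
        print(build_ajax_url(page))
```

Passing the result of build_ajax_request to urllib.request.urlopen would issue the actual call; whether the server still answers, and in what format, is not something this sketch can promise.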

Hope that helps.

Here is your spider:

import scrapy


class UnivercellItem(scrapy.Item):
    vendor = scrapy.Field()
    model = scrapy.Field()
    price = scrapy.Field()


# AJAX endpoint called by the site's paginateList(); %s is the page index.
BASE_URL = "http://www.univercell.in/control/AjaxCategoryDetail?productCategoryId=PRO-SMART&category_id=PRO-SMART&attrName=&min=&max=&sortSearchPrice=&VIEW_INDEX=%s&VIEW_SIZE=15&serachupload=&sortupload="


class UnivercellSpider(scrapy.Spider):
    name = "univercell_spider"
    allowed_domains = ["www.univercell.in"]
    # Request pages 1 through 20 directly instead of clicking through them.
    start_urls = [BASE_URL % index for index in range(1, 21)]

    def parse(self, response):
        # Each product sits in a <div class="productsummary"> block.
        for mobile in response.xpath("//div[@class='productsummary']"):
            item = UnivercellItem()
            # get(default='') yields an empty string when a node is missing,
            # where the older extract()[0] would raise IndexError.
            item['vendor'] = mobile.xpath('.//div[1]/div/text()').get(default='').strip()
            item['model'] = mobile.xpath('.//div[3]/div[1]/a/text()').get(default='').strip()
            item['price'] = mobile.xpath('.//span[@class="regularPrice"]/span/text()').get(default='').strip()
            yield item

Save it as spider.py and run it with:

scrapy runspider spider.py -o output.json

Then in output.json you will see:
{"model": "T375", "vendor": "LG", "price": "Special Price Click Here"}
{"model": "P725 Optimus 3D Max", "vendor": "LG", "price": "Special Price Click Here"}
{"model": "P705 Optimus L7", "vendor": "LG", "price": "Special Price Click Here"}
{"model": "9320 Curve", "vendor": "Blackberry", "price": "Special Price Click Here"}
{"model": "Xperia Sola", "vendor": "Sony", "price": "Rs.14,500.00"}
{"model": "Xperia U", "vendor": "Sony", "price": "Special Price Click Here"}
{"model": "Lumia 610", "vendor": "Nokia", "price": "Special Price Click Here"}
...
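In the output above the price field is sometimes a real amount ("Rs.14,500.00") and sometimes the placeholder text "Special Price Click Here". If numeric prices are needed, a small post-processing step could separate the two. This is only a sketch: parse_price is a hypothetical helper, and the "Rs." prefix with comma thousands separators is assumed from the single priced item shown.

```python
import re
from decimal import Decimal

# Assumption: prices look like "Rs.14,500.00", as in the sample output.
_PRICE_RE = re.compile(r"Rs\.?\s*([\d,]+(?:\.\d+)?)")


def parse_price(text):
    """Return the amount as a Decimal, or None when no price is present."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None  # e.g. "Special Price Click Here"
    return Decimal(match.group(1).replace(",", ""))


if __name__ == "__main__":
    for raw in ("Rs.14,500.00", "Special Price Click Here"):
        print(repr(raw), "->", parse_price(raw))
```

Returning None rather than 0 keeps "price unknown" distinguishable from a genuine zero when the items are processed later.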

Hope that helps.
You don't need to put it in a .py file. You can simply insert it into an .html file!