Python: Scrapy web crawl goes wrong

Tags: python, web-scraping, scrapy, scrapy-spider

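I am new to Scrapy and am trying to learn it by crawling yellowpages.com.

My goal is to write Python code that fills in the search fields (business and location) on the yellowpages.com home page and then scrapes the resulting URLs.

My code is as follows: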

import scrapy
from scrapy.spiders import Spider
from scrapy.selector import Selector
from spider.items import Website

class YellowPages(Spider):
    name = "yellow"
    allowed_domains = ["yellowpages.com"]
    start_urls = [
        "http://www.yellowpages.com/"
    ]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formxpath="//form[@id='search-form']",
            formdata={
                "query":"business",
                "location" : "78735" },
            callback=self.after_results
        )

    def after_results(self, response):
        self.logger.info("info msg")
I want to search for "business" at location "78735". However, these are not the values passed to the website. My log looks like this:

2016-01-28 23:55:36 [scrapy] DEBUG: Crawled (200) (referer: None)

2016-01-28 23:55:36 [scrapy] DEBUG: Crawled (200) (referer: http://www.yellowpages.com/)

In the second URL, the term Los+Angeles was somehow inserted. When I fill in the search fields manually and submit the form, the URL looks like this:

http://www.yellowpages.com/search?search_terms=business&geo_location_terms=78735

Can anyone tell me what went wrong and how to fix it?

Many thanks.

FYI, here is part of the HTML source of the yellowpages.com home page:

What do you want to find?
  • Search by business name or keyword
      Where?

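Set the search_terms and geo_location_terms form parameters. These are the names that appear in the query string of the expected URL, so the query and location keys in your formdata most likely do not match any field of the form and leave its prefilled values in place: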

def parse(self, response):
    return scrapy.FormRequest.from_response(
        response,
        formxpath="//form[@id='search-form']",
        formdata={
            "search_terms": "business",
            "geo_location_terms" : "78735"},
        callback=self.after_results
    )
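Why the field names matter can be illustrated with a simplified model of how a form's prefilled values and formdata are combined. This is only a sketch, not Scrapy's actual implementation; the prefilled values in form_defaults and the helper merge_formdata are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical prefilled values of the search form; the real page may differ.
form_defaults = {
    "search_terms": "",
    "geo_location_terms": "Los Angeles, CA",
}


def merge_formdata(defaults, formdata):
    """Simplified model: formdata replaces a default only when the names match."""
    merged = dict(defaults)
    merged.update(formdata)
    return merged


# Keys that do not match any form field are simply added next to the defaults.
wrong = merge_formdata(form_defaults, {"query": "business", "location": "78735"})

# Keys that match the form fields replace the prefilled values.
right = merge_formdata(
    form_defaults, {"search_terms": "business", "geo_location_terms": "78735"}
)

print(urlencode(wrong))
print(urlencode(right))
```

In this model the first query string still carries the prefilled location next to the unused query and location keys, while the second matches the query string of the URL you get when submitting the form by hand.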
Tested with the following spider:

import scrapy
from scrapy.spiders import Spider


class YellowPages(Spider):
    name = "yellow"
    allowed_domains = ["yellowpages.com"]
    start_urls = [
        "http://www.yellowpages.com/"
    ]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formxpath="//form[@id='search-form']",
            formdata={
                "search_terms":"business",
                "geo_location_terms" : "78735"},
            callback=self.after_results
        )

    def after_results(self, response):
        for result in response.css("div.result a[itemprop=name]::text").extract():
            print(result)
This prints a list of businesses for "Austin, TX".
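To confirm which values actually reached the site, one option is to inspect the query string of the response URL inside the callback. The following is a minimal sketch using only the standard library; the helper name submitted_params is made up for illustration, and the sample URL is the one from the question.

```python
from urllib.parse import parse_qs, urlsplit


def submitted_params(url):
    """Return the query-string parameters of url as a plain dict."""
    query = urlsplit(url).query
    # parse_qs maps each name to a list of values; keep only the first one.
    return {name: values[0] for name, values in parse_qs(query).items()}


params = submitted_params(
    "http://www.yellowpages.com/search?search_terms=business&geo_location_terms=78735"
)
print(params)
```

Inside after_results the same helper could be called as submitted_params(response.url) and the result logged with self.logger.info, which makes a mismatch such as an unexpected location visible right away.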

