
Python Scrapy returns more results than expected


This is a continuation of this question:

I have a Scrapy spider that extracts values from a JSON response. It works fine and extracts the correct values, but somehow it enters a loop and returns more results than expected (duplicate results).

For example, for the 17 values provided in the test.txt file, it returns 289 results, which is 17 times the expected count.

The spider's contents are as follows:

import scrapy
import json
from whois.items import WhoisItem

class whoislistSpider(scrapy.Spider):
    name = "whois_list"
    start_urls = []
    f = open('test.txt', 'r')
    global lines
    lines = f.read().splitlines()
    f.close()
    def __init__(self):
        for line in lines:
            self.start_urls.append('http://www.example.com/api/domain/check/%s/com' % line)

    def parse(self, response):
        for line in lines:
            jsonresponse = json.loads(response.body_as_unicode())
            item = WhoisItem()
            domain_name = list(jsonresponse['domains'].keys())[0]
            item["avail"] = jsonresponse["domains"][domain_name]["avail"]
            item["domain"] = domain_name
            yield item
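As an aside, the class-level file read plus the global `lines` variable works here but is fragile. A minimal stand-in sketch (plain Python, no Scrapy; the class name and `path` parameter are assumptions for illustration) of the same URL-building done inside `__init__` with an instance attribute instead of a global:

```python
class WhoisListSpiderSketch:
    """Simplified stand-in for the Scrapy spider: reads the domain
    list once in __init__ and stores it on the instance, so no
    global variable is needed at parse time."""

    def __init__(self, path="test.txt"):
        with open(path) as f:  # context manager closes the file automatically
            self.lines = f.read().splitlines()
        # One start URL per line in the file.
        self.start_urls = [
            "http://www.example.com/api/domain/check/%s/com" % line
            for line in self.lines
        ]
```

With a real Scrapy spider the same idea applies: build `start_urls` from `self.lines` in `__init__` (or override `start_requests`), and the parse method can then work purely from each response.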
items.py is as follows:

import scrapy

class WhoisItem(scrapy.Item):
    avail = scrapy.Field()
    domain = scrapy.Field()
pipelines.py is as follows:

class WhoisPipeline(object):
    def process_item(self, item, spider):
        return item

Thanks in advance for your replies.

The parse function should look like this:

def parse(self, response):
    jsonresponse = json.loads(response.body_as_unicode())
    item = WhoisItem()
    domain_name = list(jsonresponse['domains'].keys())[0]
    item["avail"] = jsonresponse["domains"][domain_name]["avail"]
    item["domain"] = domain_name
    yield item
Note that I removed the "for line in lines" loop.

What's happening: for each of the 17 responses, your parse method loops over all 17 lines and yields an item on every iteration, so you produce 17 * 17 = 289 records.
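The multiplication can be seen concretely with a minimal simulation (plain Python, no Scrapy; the `lines` list and fake responses are stand-ins for the 17-entry test.txt and the 17 HTTP responses):

```python
lines = ["domain%d" % i for i in range(17)]          # 17 entries, like test.txt
responses = ["response-for-%s" % l for l in lines]   # one response per start URL

def parse_buggy(response):
    # Original parse: loops over all 17 lines for EVERY response.
    for _line in lines:
        yield response

def parse_fixed(response):
    # Corrected parse: yields exactly one item per response.
    yield response

# Scrapy calls parse once per response; collect everything yielded.
buggy_items = [item for r in responses for item in parse_buggy(r)]
fixed_items = [item for r in responses for item in parse_fixed(r)]
# buggy: 17 responses * 17 loop iterations = 289; fixed: 17
```

The buggy version yields 289 items and the fixed one 17, matching the counts reported in the question.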

Is there a pipelines.py file? — Yes, there is. This is the code in the pipelines.py file:
class WhoisPipeline(object): def process_item(self, item, spider): return item
Works like a charm! You're awesome @DeanFenster, thank you so much!