Web scraping: Importing start URLs from a csv file in Scrapy

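I recently started using Scrapy for web scraping. I generated a list of URLs that I want to scrape and saved it in a newline-delimited txt document. Here is my spider code: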

import scrapy
import csv
import sys
from realtor.items import RealtorItem

from scrapy.spider import BaseSpider
#from scrapy.selector import HtmlXPathSelector
#from realtor.items import RealtorItem
class RealtorSpider(scrapy.Spider):
    name = "realtor"
    allowed_domains = ["realtor.com"]

    with open('realtor2.txt') as f:
        start_urls = [url.strip() for url in f.readlines()]


    def parse(self, response):
        #hxs = HtmlXPathSelector(response)
        #sites = hxs.select('//div/li/div/a/@href')
        sites = response.xpath('//a[contains(@href, "/realestateandhomes-detail/")]')
        items =  []
        for site in sites: 
            print(site.extract())
            item = RealtorItem()
            item['link'] = site.xpath('@href').extract()
            items.append(item)
        return items
My goal is to read the links from realtor2.txt and start parsing them, but I get a "ValueError: Missing scheme in request url":

  File "C:\Users\Ash\Anaconda2\lib\site-packages\scrapy\http\request\__init__.py", line 58, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: 
%FF%FEw%00w%00w%00.%00r%00e%00a%00l%00t%00o%00r%00.%00c%00o%00m%00/%00r%00e%00a%00l%00e%00s%00t%00a%00t%00e%00a%00n%00d%00h%00o%00m%00e%00s%00-%00d%00e%00t%00a%00i%00l%00/%005%000%00-%00M%00e%00n%00o%00r%00e%00s%00-%00A%00v%00e%00-%00A%00p%00t%00-%006%001%000%00_%00C%00o%00r%00a%00l%00-%00G%00a%00b%00l%00e%00s%00_%00F%00L%00_%003%003%001%003%004%00_%00M%005%003%008%000%006%00-%005%008%006%007%007%00%0D%00
2017-06-25 22:28:35 [scrapy.core.engine] INFO: Closing spider (finished)
I think the problem may be in how start_urls is defined, but I don't know how to proceed.

"ValueError: Missing scheme in request url" means the URL has no scheme: the leading http:// (or https://) is missing.
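
As a rough illustration of that point, here is a minimal sketch of a helper that prepends a scheme when a URL lacks one. The name ensure_scheme and the default of "http" are assumptions made for this example; they are not part of Scrapy or of the original code.

```python
def ensure_scheme(url, default_scheme="http"):
    """Return url with a scheme prepended if it does not already have one.

    Hypothetical helper for illustration only; not a Scrapy API.
    """
    # Drop surrounding whitespace, including a trailing "\r\n" from the file.
    url = url.strip()
    # Leave empty lines and URLs that already carry a scheme untouched.
    if not url or url.startswith(("http://", "https://")):
        return url
    return default_scheme + "://" + url


if __name__ == "__main__":
    print(ensure_scheme("www.realtor.com/realestateandhomes-detail/example\r\n"))
```

Applied to every line read from the file, this would give Scrapy URLs it can turn into requests, provided the lines themselves decode cleanly.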

You can use … to avoid this problem.

Could you post the first few items in the csv?
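
One more observation from the traceback: the rejected URL starts with %FF%FE and has %00 after every character, which looks like a UTF-16 file (with a byte order mark) being read as if it were plain bytes. Below is a minimal sketch under that assumption. The function name load_start_urls and the "utf-16" default are choices made for this example, and the actual encoding of realtor2.txt is not confirmed in the question.

```python
import io


def load_start_urls(path, encoding="utf-16"):
    """Read one URL per line from path and return a cleaned list.

    Hypothetical helper: the encoding default is a guess based on the
    %FF%FE prefix in the traceback, not a confirmed fact about the file.
    """
    urls = []
    # io.open decodes the file on both Python 2 and 3; the "utf-16"
    # codec consumes the byte order mark at the start of the file.
    with io.open(path, encoding=encoding) as f:
        for line in f:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            if not url.startswith(("http://", "https://")):
                url = "http://" + url  # add the missing scheme
            urls.append(url)
    return urls
```

In the spider, the class-level with block could then be replaced by something like start_urls = load_start_urls('realtor2.txt'). If the file turns out to use a different encoding, re-saving it as UTF-8 and passing encoding="utf-8" would be the simpler route.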