Python: passing variables into test.py in the spider folder with Scrapy
I'm using Scrapy. Below is the code in test.py in the spider folder:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://seattle.craigslist.org/npo/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//span[@class='pl']")
        items = []
        for title in titles:  # avoid shadowing the 'titles' list
            item = CraigslistSampleItem()
            item["title"] = title.select("a/text()").extract()
            item["link"] = title.select("a/@href").extract()
            items.append(item)
        return items
Basically, I want to iterate over my list of URLs and pass each URL into start_urls of the MySpider class. Could you give me some advice?

You need to override the start_requests method rather than "statically defining" start_urls:
from scrapy.http import Request

class MySpider(BaseSpider):
    name = "craig"
    allowed_domains = ["craigslist.org"]

    def start_requests(self):
        list_of_urls = [...]  # reading urls from a text file, for example
        for url in list_of_urls:
            yield Request(url)

    def parse(self, response):
        ...
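The `[...]` above is left for the reader to fill in. As one possible sketch of the "reading urls from a text file" idea, a small helper like the one below could supply that list, one URL per line (the helper name `load_urls` and the file layout are my own assumptions, not part of the original answer):

```python
import os
import tempfile

def load_urls(path):
    """Return the non-empty, whitespace-stripped lines of `path` as a list of URLs.

    Hypothetical helper: the answer only says "reading urls from a text
    file, for example"; this is one way that could look.
    """
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Demo with a temporary file standing in for a real urls.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("http://seattle.craigslist.org/npo/\n")
    f.write("http://portland.craigslist.org/npo/\n")
    tmp_path = f.name

urls = load_urls(tmp_path)
os.unlink(tmp_path)
print(urls)
```

Inside `start_requests`, `list_of_urls = load_urls("urls.txt")` would then feed each URL into a `Request`, which is what replaces the static `start_urls` attribute.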