Scrapy requests inside a Python for loop
So, I am trying to apply an example, but my output is strange.
My item:
import scrapy

class Artiste(scrapy.Item):
    url = scrapy.Field()
    nom = scrapy.Field()
    styles = scrapy.Field()
My spider class:
class AnnuSpider(scrapy.Spider):
    name = "annu"
    start_urls = [
        'https://www.livetonight.fr/groupe-musique-dj',
    ]

    def parse(self, response):
        doc = Artiste()
        for artiste in response.css('.card-musician'):
            details_partial_link = artiste.css('a::attr(href)').get()
            doc['nom'] = artiste.css('.card-musician-title-wrapper').xpath('normalize-space(./h4/text())').get()
            doc['url'] = details_partial_link
            details_link = response.urljoin(details_partial_link)
            request = scrapy.Request(details_link, callback=self.parse_details)
            request.meta['item'] = doc
            print "NOM", doc['nom']
            yield request

    def parse_details(self, response):
        doc = response.meta['item']
        doc['styles'] = response.css('.show-overview-info').xpath('normalize-space(./p/text())')[0].get()
        return doc
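So instead of giving me 21 rows, each with its own nom, url, and styles, I get 21 rows that all have the same nom and url (those of the last entry in the list) but the correct styles.
Here is the full output: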
[
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Jazz / Folk / Rock"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Rock / Pop / Folk"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Soul / Pop / Funk"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Soul / Pop / Funk"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Jazz / Pop"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Pop / Rock / Funk"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Pop / Rock / Jazz"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Rock / Pop / Funk"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Rock / Pop / Funk"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Jazz"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Rock / Blues / Soul"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Rock / Blues / Soul"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Funk / Soul / Pop"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Pop / Folk / Soul"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Pop / Jazz / Funk"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Pop / Jazz / Funk"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Jazz / Swing / Musique du monde"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Guinguette / Swing"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Guinguette / Swing"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Jazz / Swing / Pop"},
{"url": "/groupe-musique-dj/4123-remi-dugue-trio", "nom": "R\u00e9mi Dugu\u00e9 Trio", "styles": "Pop / Funk / Dj"}
]
What is strange to me is that if I remove the request, my output is perfect, as with this code:
class AnnuSpider(scrapy.Spider):
    name = "annu"
    start_urls = [
        'https://www.livetonight.fr/groupe-musique-dj',
    ]

    def parse(self, response):
        doc = Artiste()
        for artiste in response.css('.card-musician'):
            details_partial_link = artiste.css('a::attr(href)').get()
            doc['nom'] = artiste.css('.card-musician-title-wrapper').xpath('normalize-space(./h4/text())').get()
            doc['url'] = details_partial_link
            details_link = response.urljoin(details_partial_link)
            yield doc
Try moving the doc declaration inside the loop, so that each request gets its own Artiste instance:
def parse(self, response):
    for artiste in response.css('.card-musician'):
        doc = Artiste()
        ...
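To see why this helps, here is a minimal plain-Python sketch of the likely cause. It does not use Scrapy at all: the `entries` data and the `run_callbacks` helper are made up for illustration, standing in for the detail-page callbacks that only run after the loop in `parse` has already finished. The assumption is that every request carried a reference to the same item object, so by the time the callbacks ran, `nom` and `url` had been overwritten with the last values.

```python
# Hypothetical stand-in data: (nom, styles found later on the detail page).
entries = [("A", "Jazz"), ("B", "Rock"), ("C", "Pop")]


def run_callbacks(pending):
    """Simulate the deferred detail callbacks: they run after the loop ends."""
    results = []
    for item, styles in pending:
        item["styles"] = styles
        results.append(dict(item))  # snapshot, like one exported row
    return results


# Buggy variant: one shared object, mutated on every iteration.
pending = []
doc = {}
for nom, styles in entries:
    doc["nom"] = nom
    pending.append((doc, styles))  # every entry references the same dict
shared_result = run_callbacks(pending)

# Fixed variant: a fresh object per iteration.
pending = []
for nom, styles in entries:
    doc = {}
    doc["nom"] = nom
    pending.append((doc, styles))
fresh_result = run_callbacks(pending)

print(shared_result)  # same nom everywhere, but differing styles
print(fresh_result)   # each row keeps its own nom and styles
```

The buggy variant reproduces the pattern in the output above: `styles` differs per row because it is set just before each row is emitted, while `nom` is whatever the loop wrote last.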
Haha, well played. I was stuck on that for hours... I think I should go to bed!