For loop with Scrapy

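Hi everyone, I've been working on learning Scrapy and am now doing my first project. I wrote this code to pull NFL player news from the site. I'm trying to build a loop that grabs each container from the page, but when I run the code it doesn't scrape anything. The code runs fine and even outputs a CSV file when I ask it to. It just doesn't scrape what I think I told it to scrape. Any help would be great! Thanks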

import scrapy
from Roto_Player_News.items import NFLNews

class Roto_News_Spider2(scrapy.Spider):
    name="PlayerNews2"
    allowed_domains = ["rotoworld.com"]
    start_urls = ('http://www.rotoworld.com/playernews/nfl/football/',)

    def parse(self,response):

        containers= response.xpath('//*[@id="cp1_pnlNews"]/div/div[2]')

        def parse(self, response):

            for container in containers:
                def parse(self, response):           
                    item=NFLNews()
                    item['player']= response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="report"]/text()')
                    item['headline'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="report"]/p/text()').extract()
                    item['info'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="impact"]/text()').extract()
                    item['date'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="info"]/div[@class="date"]/text()').extract()
                    item['source'] = response.xpath('//div[@class="pb"][1]/div[@id="cp1_ctl00_rptBlurbs_floatingcontainer_0"]/div[@class="info"]/div[@class="source"]/a/text()').extract()

                    yield item

The XPath expressions you defined don't look right. Try this. It should fetch the content you're after. Just copy and paste it:

import scrapy

class Roto_News_Spider2(scrapy.Spider):
    name = "PlayerNews2"

    start_urls = [
        'http://www.rotoworld.com/playernews/nfl/football/',
    ]

    def parse(self, response):
        # Each news blurb sits in its own <div class="pb"> container.
        for item in response.xpath("//div[@class='pb']"):
            # The leading "." keeps every XPath relative to the current container.
            player = item.xpath(".//div[@class='player']/a/text()").extract_first()
            report = item.xpath(".//div[@class='report']/p/text()").extract_first()
            date = item.xpath(".//div[@class='date']/text()").extract_first()
            # default='' avoids calling .strip() on None when a blurb has no impact text.
            impact = item.xpath(".//div[@class='impact']/text()").extract_first(default='').strip()
            source = item.xpath(".//div[@class='source']/a/text()").extract_first()
            yield {"Player": player, "Report": report, "Date": date, "Impact": impact, "Source": source}
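Apart from the XPaths, the spider in the question nests `def parse` inside `parse` twice. The inner functions are defined but never called, so the outer `parse` finishes without yielding a single item, which would explain the empty output. Here is a minimal sketch of that effect without Scrapy; `parse_nested`, `parse_flat` and the sample list are made-up names for illustration only:

```python
def parse_nested(containers):
    # Mirrors the structure of the question's spider: the loop lives in an
    # inner function that is defined here but never called.
    def inner():
        for container in containers:
            yield {"value": container}
    # Nothing is returned or yielded at this level, so the call returns None.


def parse_flat(containers):
    # Loop and yield directly in the function body, as the answer's parse() does.
    for container in containers:
        yield {"value": container}


nested_result = parse_nested(["a", "b"])
flat_result = list(parse_flat(["a", "b"]))

print(nested_result)
print(flat_result)
```

The same idea applies in the spider: keep a single `parse` method, loop over the containers inside it, and yield from that loop. The leading `.` in expressions such as `.//div[@class='report']` also matters, because it makes each lookup relative to the current container instead of the whole page.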

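Thank you very much. If I also want to get the team and position, what should I do? The player's name is neatly wrapped in a tag, but how do I handle the rest of div class="player"?

Try these two: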

idk1 = item.xpath(".//div[@class='player']/text()").extract()[0].strip()

and

idk2 = item.xpath(".//div[@class='player']/a/text()").extract()[1].strip()

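I think that's what you meant.

Cool. On a theoretical level, what do the [0] and [1] indexes ask for? The second one works and gives me the team. The position still doesn't come through. Is there a way to ask for the text between the "-"? I was able to do that with BeautifulSoup, but I don't know how to do it with XPath and Scrapy. Thanks for your help!

If that's the position you mean, it comes through correctly on my end.

Sorry, you're right. I made a typo. Did you get the "-" as well?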
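On the [0] and [1] question: `.extract()` returns an ordinary Python list of strings, one per matching node, so `[0]` is the first match and `[1]` is the second. The text between the dashes can then be pulled out with a regular expression. The sketch below works on plain strings only; the sample values are hypothetical and are not taken from the real page, and `between_dashes` is a made-up helper name:

```python
import re

# Hypothetical values shaped like what .extract() might return for
# div[@class='player']/text() and div[@class='player']/a/text().
text_nodes = [" - RB - ", "\n"]
link_texts = ["Example Player", "Example Team"]

# Plain list indexing: [0] is the first matched node, [1] the second.
first_text = text_nodes[0].strip()
team = link_texts[1].strip()


def between_dashes(text):
    """Return the text between the first two '-' characters, or None."""
    match = re.search(r"-\s*(.*?)\s*-", text)
    return match.group(1) if match else None


position = between_dashes(text_nodes[0])

print(first_text)
print(team)
print(position)
```

Scrapy selectors also have a `.re_first()` method that applies a regular expression to the selected text directly, so the same pattern could probably be used there without the helper.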