Python Scrapy: start_urls too long / unsupported URL scheme

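I have a problem with my Scrapy spider: it reports an unsupported URL scheme. I want to scrape a page of search results, but my spider always fails because of this long dynamic URL: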

class RadioSpider(CrawlSpider):
    name = 'radio'
    allowed_domains = ['dashitradio.de']
    start_urls = ["[http://www.dashitradio.de/nc/search-in-playlist.html?tx_wfqbe_pi1%5BSTART%5D=2013-06-17%2006:00&tx_wfqbe_pi1%5BEND%5D=2013-06-21%2018:00&tx_wfqbe_pi1%5Bsubmit%5D=Suchen&tx_wfqbe_pi1%5Bshowpage%5D%5B3%5D=1][1]"]
    rules = (
        Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        i = RadioItem()

        i['title'] = hxs.select("//*[@id='playlist-results']/table//tr[1]/td[1]/text()").extract()
        i['interpret'] = hxs.select("//*[@id='playlist-results']/table[1]//tr/td[2]/text()").extract()
        i['date'] = hxs.select("//*[@id='playlist-results']/table//tr[1]/td[3]/text()").extract()

        return i
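If I run it in the Scrapy shell console, it only works when the URL is wrapped in inverted commas, like "URL".

How do I get Scrapy to accept this string as a single URL in my spider?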

Your start_urls is set incorrectly: the leading [ and the trailing ][1] make it an invalid URL.
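To see why the bracketed string fails, it helps to check how it parses. The sketch below uses only Python's standard library, not Scrapy itself. The helper name strip_reference_link and the shortened example URL are illustrative assumptions. The [url][1] wrapper looks like a Markdown reference-style link that was pasted into the code, so the helper simply peels it off.

```python
import re
from urllib.parse import urlparse

# Shortened, illustrative version of the URL from the question.
BROKEN = "[http://www.dashitradio.de/nc/search-in-playlist.html?tx_wfqbe_pi1%5Bsubmit%5D=Suchen][1]"


def strip_reference_link(text):
    """Return the URL inside a "[url][n]" wrapper, or the text unchanged.

    Hypothetical helper for illustration, not part of Scrapy.
    """
    match = re.fullmatch(r"\[(.+)\]\[\d+\]", text)
    return match.group(1) if match else text


cleaned = strip_reference_link(BROKEN)

# Because the string starts with "[", no scheme is recognised at all,
# which is consistent with the "unsupported URL scheme" error.
print(repr(urlparse(BROKEN).scheme))
print(repr(urlparse(cleaned).scheme))
```

In practice the simplest fix is to delete the brackets by hand in start_urls; the helper only makes the difference between the two strings visible.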

I've updated the spider's code based on your comments:

from scrapy.item import Item, Field
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider


class RadioItem(Item):
    title = Field()
    interpret = Field()
    date = Field()


class RadioSpider(BaseSpider):
    name = 'radio'
    allowed_domains = ['dashitradio.de']
    start_urls = ["http://www.dashitradio.de/nc/search-in-playlist.html?tx_wfqbe_pi1%5BSTART%5D=2013-06-17%2006:00&tx_wfqbe_pi1%5BEND%5D=2013-06-21%2018:00&tx_wfqbe_pi1%5Bsubmit%5D=Suchen&tx_wfqbe_pi1%5Bshowpage%5D%5B3%5D=1"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)

        rows = hxs.select("//div[@id='playlist-results']/table/tbody/tr")
        for row in rows:
            item = RadioItem()

            item['title'] = row.select(".//td[1]/text()").extract()[0]
            item['interpret'] = row.select(".//td[2]/text()").extract()[0]
            item['date'] = row.select(".//td[3]/text()").extract()[0]

            yield item
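On the worry that the URL is too long or gets split at its parameters: the whole thing is a single string, and the query part can be assembled from its decoded parameters. This is a minimal sketch using only the standard library; the names BASE, params and start_url are illustrative, and the parameter values are simply decoded from the URL in the spider above.

```python
from urllib.parse import quote, urlencode

BASE = "http://www.dashitradio.de/nc/search-in-playlist.html"

# Parameter names and values decoded from the URL used in the spider.
params = {
    "tx_wfqbe_pi1[START]": "2013-06-17 06:00",
    "tx_wfqbe_pi1[END]": "2013-06-21 18:00",
    "tx_wfqbe_pi1[submit]": "Suchen",
    "tx_wfqbe_pi1[showpage][3]": "1",
}

# quote_via=quote encodes spaces as %20 (instead of "+");
# safe=":" leaves the colons in the time values unescaped.
query = urlencode(params, quote_via=quote, safe=":")
start_url = BASE + "?" + query
print(start_url)
```

Building the URL this way makes it easier to change the START and END dates without editing the percent-encoded string by hand.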
Save the spider to my_spider.py and run it with runspider:

scrapy runspider my_spider.py -o output.json

You will see the following in output.json:

{"date": "2013-06-21 17:48:00", "interpret": "MUMFORD & SONS", "title": "I WILL WAIT"}
{"date": "2013-06-21 17:44:00", "interpret": "TASMIN ARCHER", "title": "SLEEPING SATELLITE"}
{"date": "2013-06-21 17:40:03", "interpret": "ROBIN THICKE", "title": "BLURRED LINES (feat. T.I. & PHARRELL)"}
{"date": "2013-06-21 17:35:02", "interpret": "TINA TURNER", "title": "TWO PEOPLE"}
{"date": "2013-06-21 17:31:02", "interpret": "BON JOVI", "title": "WHAT ABOUT NOW"}
{"date": "2013-06-21 17:28:03", "interpret": "ROXETTE", "title": "SHE'S GOT NOTHING ON (BUT THE RADIO)"}
{"date": "2013-06-21 17:18:01", "interpret": "GNARLS BARKLEY", "title": "CRAZY"}
{"date": "2013-06-21 17:08:01", "interpret": "FLO RIDA", "title": "WHISTLE"}
{"date": "2013-06-21 17:05:03", "interpret": "WHAM", "title": "WAKE ME UP BEFORE YOU GO GO"}
{"date": "2013-06-21 17:00:03", "interpret": "P!NK FEAT. NATE RUESS", "title": "JUST GIVE ME A REASON"}
{"date": "2013-06-21 16:48:01", "interpret": "SHAKIRA", "title": "WHENEVER, WHEREVER"}
{"date": "2013-06-21 16:44:00", "interpret": "ALPHAVILLE", "title": "BIG IN JAPAN"}
{"date": "2013-06-21 16:40:01", "interpret": "XAVIER NAIDOO", "title": "BEI MEINER SEELE"}
{"date": "2013-06-21 16:36:02", "interpret": "SANTANA", "title": "SMOOTH"}
{"date": "2013-06-21 16:32:01", "interpret": "OLLY MURS", "title": "ARMY OF TWO"}

Hope this helps.

If I do remove the brackets on both sides of the URL, I get a clean .json file with no data in it. I tried that before posting here. I thought Scrapy does not accept my long URL and splits it into several parts because it contains so many parameters.

It should work. Can you post the full code of your spider?

I edited it. Should I post the rest of the code? I have defined the XPath elements there as well.

That would be great, so that I can reproduce the problem.

Thanks, much appreciated. I'm new to scrapy & python. To start with, I just want to scrape a track list from the first page of results and put it into a json file.