
Python: how do I append a string to an incomplete list item?

Tags: python, list, class, parsing, scrapy

I'm scraping a set of URLs, but they're all missing the base of the URL, so I want to append the start_url as a base to each scraped URL.

The spider class:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
# TeslanewsItem is assumed to be defined in the project's items module


class MySpider(BaseSpider):
    name = "teslanews"
    allowed_domains = ["teslamotors.com"]
    start_urls = ["http://www.teslamotors.com/blog"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        updates = hxs.xpath('//div[@class="blog-wrapper no-image"]')

        items = []
        for article in updates:
            item = TeslanewsItem()
            # extract() returns a list of strings, so each field is stored as a list
            item["date"] = article.xpath('./div/span/span/text()').extract()
            item["title"] = article.xpath('./h2/a/text()').extract()
            item["url"] = article.xpath('./h2/a/@href').extract()
            items.append(item)
        return items
I can't just do a simple

    item["url"] = article.xpath('./h2/a/@href').extract() + base

with base = "http://www.teslamotors.com", because that adds the base to the end, and since it happens inside the for loop it ends up going letter by letter, with each letter separated by a comma.


I'm relatively new to Scrapy, so I'm not sure how to go about this.

Comments:

Can't you do item["url"] = base + article.xpath… instead?

No. The result is h, t, t, p, :, /, /, w, w, w, ., t, e, s, l, a, ... and you get the idea. That's because it's inside the for loop, so it goes character by character, and it can't be added outside the for loop in the same form either; nothing happens there.

I mean run the following code: item["url"] = base + article.xpath('./h2/a/@href').extract()[0]. Are you sure that doesn't work? extract() returns a list, so you have to take the first element of the list!

Of course, I had missed the + there, haha.

You're welcome ;) @Charles Watson, did it help?
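
To make the point from the comments concrete, here is a minimal standalone sketch (the href value is made up for illustration). extract() returns a list of strings, so you have to take an element out of the list before concatenating; iterating over a plain string, on the other hand, yields one character at a time, which is the likely source of output like "h, t, t, p, ...".

    base = "http://www.teslamotors.com"

    # extract() would return a list of strings, e.g.:
    hrefs = ["/blog/some-post"]          # made-up href, for illustration only

    # base + hrefs raises TypeError (you cannot concatenate a str and a list);
    # take the first element of the list first, then concatenate:
    url = base + hrefs[0]
    print(url)                           # http://www.teslamotors.com/blog/some-post

The answer below sidesteps manual concatenation entirely by using urljoin from the standard library:
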
from scrapy.spider import BaseSpider
from urlparse import urljoin
# TeslanewsItem is assumed to be defined in the project's items module


class MySpider(BaseSpider):
    name = "teslanews"
    allowed_domains = ["teslamotors.com"]

    base = "http://www.teslamotors.com/blog"

    start_urls = ["http://www.teslamotors.com/blog"]

    def parse(self, response):

        updates = response.xpath('//div[@class="blog-wrapper no-image"]')

        items = []
        for article in updates:
            item = TeslanewsItem()
            item["date"] = article.xpath('./div/span/span/text()').extract()
            item["title"] = article.xpath('./h2/a/text()').extract()
            # urljoin() resolves the scraped href against the base URL
            item["url"] = urljoin(self.base, ''.join(article.xpath('./h2/a/@href').extract()))
            items.append(item)

        return items
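
For reference, a minimal sketch of how urljoin behaves (the example hrefs are made up): it resolves an href against the base URL the way a browser would, so it handles root-relative paths and already-absolute URLs correctly, which plain string concatenation does not.

    from urlparse import urljoin   # Python 2; in Python 3 use urllib.parse.urljoin

    base = "http://www.teslamotors.com/blog"

    # A root-relative href keeps only the scheme and host from the base:
    print(urljoin(base, "/blog/some-post"))
    # -> http://www.teslamotors.com/blog/some-post

    # An href that is already absolute is returned unchanged:
    print(urljoin(base, "http://www.teslamotors.com/blog/other-post"))
    # -> http://www.teslamotors.com/blog/other-post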