How to remove the first characters of an item in an array in Python Scrapy

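I am trying to remove the first 7 characters of an item in an array. More specifically, I am trying to remove mailto: so that only the email address is shown.

I thought using [:7] would do it, but Python ignores the request.

Any suggestions?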

def businessprofile(self, response):
    for business in response.css('header#main-header'):
        item = Item()
        item['business_name'] = business.css('div.sales-info h1::text').extract()
        item['website'] = business.css('a.secondary-btn.website-link::attr(href)').extract()
        # i want to remove the first 7 characters "mailto:", but not sure how ? i made an attempt
        item['email'] = business.css('a.email-business::attr(href)').extract()[7:]
        item['phonenumber'] = business.css('p.phone::text').extract_first()
        for x in item['business_name']:
            #new code here, call to self.seen_business_names
            if x not in self.seen_business_names:
                if item['business_name']:
                    if item['phonenumber']:
                        if item['email']:                               
                            yield item
                            self.seen_business_names.append(x)

This is where I need to remove the characters:

   item['email'] = business.css('a.email-business::attr(href)').extract()[7:]

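You need to use [7:] rather than [:7].

The syntax is [start:end]; whichever bound is omitted defaults to the beginning or the end of the string, respectively.

For example:

val = "mailto:abc@abc.de"
mailto = val[:7]  # from the first character through the 7th: 'mailto:'
email = val[7:]   # from the 8th character to the end: 'abc@abc.de'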

Counting starts at 0:

a = "0123456789"
a[7:]
# '789'

So you may need to start one index later:

a[8:]
# '89'
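Rather than counting characters by hand, the slice start can be derived from the prefix itself. The following is a minimal sketch; the constant name PREFIX and the sample address are made up for illustration.

```python
PREFIX = "mailto:"          # the text to drop from the front (7 characters)
val = "mailto:abc@abc.de"   # made-up sample value

# len(PREFIX) is 7, so this is the same slice as val[7:]
email = val[len(PREFIX):]
print(email)  # abc@abc.de
```

This avoids off-by-one mistakes if the prefix ever changes.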

Apparently, business.css('a.email-business::attr(href)').extract() returns a list. You need to remove mailto: from each item in that list:

s = business.css('a.email-business::attr(href)').extract()
# use a loop variable that does not shadow the Scrapy `item`
item['email'] = [href[7:] for href in s]
# ['businessname@gmail.com']
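The difference between slicing the list and slicing its elements can be seen without Scrapy. This sketch assumes extract() returned a one-element list; the sample value is a stand-in, not real scraped data.

```python
# Stand-in for what .extract() returns: a list of strings (made-up sample)
hrefs = ["mailto:businessname@gmail.com"]

# Slicing the list skips the first 7 *elements*; with one element nothing is left.
sliced_list = hrefs[7:]
print(sliced_list)  # []

# Slicing each string skips the first 7 *characters* of every element.
sliced_items = [href[7:] for href in hrefs]
print(sliced_items)  # ['businessname@gmail.com']
```

Python raises no error when a slice runs past the end of a list, which is why the original attempt appears to do nothing useful.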

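Or:

s = business.css('a.email-business::attr(href)').extract()
# use a loop variable that does not shadow the Scrapy `item`
item['email'] = [href.replace('mailto:', '') for href in s]
# ['businessname@gmail.com']

Comments:

Can you provide sample input? I think you could solve this with re.

I noticed that pattern matching may not even be necessary, now that I reread the question. Can you elaborate?

Let me update the post and point out the exact place I need to fix.

What structure does business.css('a.email-business::attr(href)').extract() have?

It extracts the email and outputs mailto:myemail@barnowlrocks.com

Your code seems to be valid. Where in your code do you test that you are not getting the right string? Why not add a print statement, print(business.css('a.email-business::attr(href)').extract()[7:]), and see whether it prints the right result? If it does: what does the Item class do? Please provide more information and detail in the original question rather than in comments. Otherwise it is really hard for me to help you.

I believe this is actually a typo in the description.

This code is not relevant to what I am trying to achieve? I don't want to store the array items in a variable; do you have another way to do this?

Can you give us good input data? For example, what exactly does business.css('a.email-business::attr(href)').extract() return? Is it a string? Without that we cannot answer the question. Please put the information in your original question.

It extracts the email like this, e.g. mailto:myemail@barnowlrocks.com. I want to keep it in this format: item['email'] = business.css('a.email-business::attr(href)').extract()[7:]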
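If only a single address is needed and no intermediate variable is wanted, one option is a small helper that strips the prefix only when it is actually present. This is a sketch under assumptions: strip_mailto is a hypothetical name, not part of Scrapy, and it is meant to be fed the result of extract_first(), which the question already uses for the phone number and which returns None when nothing matches.

```python
def strip_mailto(href):
    """Return href without a leading 'mailto:'; other values pass through unchanged."""
    prefix = "mailto:"
    # href may be None when the selector matched nothing
    if href and href.startswith(prefix):
        return href[len(prefix):]
    return href


# Hypothetical use inside the spider callback:
# item['email'] = strip_mailto(
#     business.css('a.email-business::attr(href)').extract_first())

print(strip_mailto("mailto:myemail@barnowlrocks.com"))  # myemail@barnowlrocks.com
```

Unlike str.replace, this touches only a prefix at the very start of the string, and unlike a bare [7:] it leaves values without the prefix intact.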