Python: Splitting and saving a text string in Scrapy

python scrapy

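I need to split a substring out of a string. This is the exact source text:

Article published on: Tutorial

I want to remove "Article published on:" and keep only "Tutorial" so that I can save it. I tried: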

category = items[1]
category.split('Article published on:','')

and

for p in articles:
            bodytext = p.xpath('.//text()').extract()
            joined_text = ''
            # loop in categories
            for each_text in text:
                stripped_text = each_text.strip()
                if stripped_text:
                    # all the categories together
                    joined_text += ' ' + stripped_text
            joined_text = joined_text.split('Article published on:','')
    items.append(joined_text)
            if not is_phrase:
                title = items[0]
                category = items[1]
                print('title = ', title)
                print('category = ', category)

This doesn't work. What am I missing?

The code fails with this error:

TypeError: 'str' object cannot be interpreted as an integer

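You probably just forgot to assign the result:

category = category.replace('Article published on:', '')

Also, it looks like you meant to use replace rather than split. The latter can also work, like this:

category = category.split(':')[1]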
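
For context, the TypeError arises because the second positional argument of str.split is maxsplit, which must be an integer, so passing '' there fails. Below is a minimal sketch of both suggestions, assuming the scraped text is exactly 'Article published on: Tutorial'. The helper names and the PREFIX constant are hypothetical and not part of the original code.

```python
PREFIX = 'Article published on:'  # assumed label, taken from the question


def category_via_replace(text):
    # str.replace returns a new string, so the result must be
    # assigned or returned; the original string is never modified.
    return text.replace(PREFIX, '').strip()


def category_via_split(text):
    # Split once on the first colon and keep whatever follows it.
    return text.split(':', 1)[1].strip()


raw = 'Article published on: Tutorial'
print(category_via_replace(raw))
print(category_via_split(raw))

# Reproducing the error from the question: '' is taken as maxsplit.
try:
    raw.split(PREFIX, '')
except TypeError as exc:
    print('TypeError:', exc)
```

Using split(':', 1) rather than split(':') keeps the category intact if it happens to contain a colon itself.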

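What is your input, and what exactly goes wrong? I may not have understood you correctly, but are you looking for category.split(':')[1]?

I can't modify the text. If I print it, I get "Article published on: Tutorial" unchanged (I want to remove "Article published on:").

Can you post the code including the print statements?

Thanks! The problem was that there was a Unicode "+" character at the start of the sentence that is not visible in the page's HTML (it is drawn in the same color as the theme).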
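
A hidden leading character like the one described can be spotted with repr() and skipped by searching for the label instead of relying on the string's exact start. This is only a sketch: the exact character is not stated in the thread, so the sample input is made up, and extract_category is a hypothetical helper.

```python
import re


def extract_category(text, label='Article published on:'):
    # Find the label anywhere in the text, so hidden characters in front
    # of it are ignored, and capture everything that follows it.
    match = re.search(re.escape(label) + r'\s*(.*)', text, flags=re.DOTALL)
    # Fall back to the stripped input when the label is absent.
    return match.group(1).strip() if match else text.strip()


# Hypothetical input: a zero-width space and a '+' before the label.
raw = '\u200b+ Article published on: Tutorial'
print(repr(raw))  # repr() makes otherwise invisible characters visible
print(extract_category(raw))
```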