Python Scrapy: How to clean the response?


This is my code snippet. I am trying to scrape a website with Scrapy and then store the data in Elasticsearch for indexing.

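def parse(self, response):
    for news in response.xpath('head'):
        yield {
            # .extract() returns a list of strings, one per matched node
            'pagetype': news.xpath('//meta[@name="pagetype"]/@content').extract(),
            # text() also matches the whitespace-only text nodes between tags
            'description': news.xpath('//div[@class="module__content"]/*/node()/text()').extract(),
        }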
My problem is the value that ends up in the "description" field:

    [u'\n              \n              ', u'"For\n              many of us what we eat on Christmas day isn\'t what we would usually consume and\n              that\u2019s perfectly ok," Dr said.', u'"However\n              it is not uncommon for festive season celebrations to begin in November and\n              continue well in to the New Year.', u'"So\n              if health is on the agenda, being mindful about what we put into our bodies\n              with a balanced approach, throughout the whole festive season, is important."', u"Dr\n              , a lecturer at School\n              Sciences, said balancing fresh, healthy food with being physically active was a\n              good start.", u'"Whatever\n              the celebration, try to limit processed foods, often high in fat, sugar and\n              salt," she said.', u'"Taking\n              time during holidays to prepare food and make the most of fresh ingredients is\n              often a much healthier option than relying on convenience foods and take away.', u'"Being\n              mindful about going back for seconds is important too.\xa0 We don\u2019t need to eat until we feel\n              uncomfortable and eating the foods we enjoy doesn\'t necessarily mean we need to\n              eat copious amounts."', u"Dr\n             own healthy tips and substitutes for the Christmas season\n              include:", u'But\n              just because Dr  is a dietitian, doesn\u2019t mean she doesn\u2019t enjoy a\n              Christmas treat or two.', u'"I\n              would have to say my sister in law\'s homemade rocky road is my favourite\n              festive treat. She makes it every Christmas day and it gets better each year," she\n              said.', u'"I\n              also enjoy a summer cocktail every so often during the festive season and a\n              mojito would be one of my favourites on Christmas day. We make it with extra\n              mint from the garden which is a nice, fresh addition.', u'"Rather\n              than focusing on food avoidance, moderation is the best approach.', u'"There\n              are definitely some more healthy choices and some less healthy options when it\n              comes to the typical Christmas day menu, but it\'s more important to be mindful\n              of a healthy, balanced diet throughout the festive period, rather than avoiding\n              specific foods on one day of the year."', u'\n                ', u'\n              \n                ', u'\n                ', u'\n              \n                ', u'\n              ', u'\n                ', u'\n                        ', u'\n                        ', u'\n                        ', u'\n                    ', u'\n            ', u'Related News', u'\n          ', u'\n        ', u'\n          ', u'\n        ', u'\n          ', u'\n        ', u'Search for related news']
It contains lots of spaces, newlines and "u" letters.

How can I process this further so that it contains only plain text, without the extra spaces, newline characters (\n) and "u" letters?

I read that BeautifulSoup works well with Scrapy, but I couldn't find any examples of how to integrate the two. I'm also open to any other approach. Any help is much appreciated.


Thanks

You can remove the spaces and newlines from the strings in the list with:

    [' '.join(item.split()) for item in list_of_strings]

where list_of_strings is the list of strings you gave as an example.

As for the "u letters", you shouldn't really worry about them.
They just indicate that the strings are unicode strings. See, for example, this question about it.

Related: the u is only information that the strings in the list are unicode. If you print a single element of the list, you will see the text without the u. To be clear, you only want to remove the newlines and spaces from these strings?

Hi glS, yes.

Thanks, how do I use this? I ran " ".join(myString.split()) in the Scrapy shell and got this error: AttributeError: 'list' object has no attribute 'split'.

If you save the list of strings from your question as a variable list_of_strings, just run the line above and you get the same list, with the spaces and newlines removed from its elements.

That works, it just needed a small tweak... Now it shows ? in some places, and apostrophes have also changed to \u2019.

To filter out the empty values, you can use the built-in filter() function, passing bool as the first argument.
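A minimal sketch that puts the answer's one-liner together with the filter(bool, ...) suggestion from the comments, run on a few strings copied from the question's output. It assumes Python 3, where repr() of a str never shows a u prefix and str.split() with no argument also treats the non-breaking space '\xa0' as whitespace. The helper name clean_strings is hypothetical.

```python
import unicodedata


def clean_strings(list_of_strings):
    """Collapse whitespace in each string and drop the strings left empty.

    clean_strings is a hypothetical helper name used only for illustration.
    """
    # str.split() with no argument splits on any run of whitespace
    # ('\n', spaces and, in Python 3, '\xa0') and drops leading/trailing
    # whitespace, so joining with ' ' leaves single spaces between words.
    collapsed = [' '.join(item.split()) for item in list_of_strings]
    # filter(bool, ...) keeps only the non-empty strings.
    return list(filter(bool, collapsed))


sample = [
    u'\n              \n              ',
    u'"So\n              if health is on the agenda, being mindful',
    u'important too.\xa0 We don\u2019t need to eat',
    u'\n            ',
    u'Related News',
]

cleaned = clean_strings(sample)
for text in cleaned:
    print(text)  # print() shows the text itself, with no u prefix

# '\u2019' is a typographic apostrophe, not a plain "'"
print(unicodedata.name('\u2019'))
```

The \u2019 mentioned in the comments is this right single quotation mark, which the page itself uses for apostrophes; it only looks like an escape code in the list's repr. If a plain ASCII apostrophe is preferred, calling text.replace('\u2019', "'") on each cleaned string converts it.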
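To do the same cleanup inside Scrapy instead of afterwards, one option is an item pipeline. Scrapy passes every scraped item to the process_item(item, spider) method of each enabled pipeline. The sketch below assumes the items are the dicts yielded by the question's parse(), with a list of strings under 'description'. The class name CleanDescriptionPipeline and the module path in the settings comment are hypothetical.

```python
class CleanDescriptionPipeline:
    """Hypothetical Scrapy item pipeline that normalizes 'description'.

    Enable it in settings.py with something like
    ITEM_PIPELINES = {'myproject.pipelines.CleanDescriptionPipeline': 300}
    (the module path is an assumption; adjust it to your project).
    """

    def process_item(self, item, spider):
        # item is assumed to be a dict with a list of strings under 'description'
        texts = item.get('description', [])
        # Collapse whitespace runs and drop whitespace-only strings
        item['description'] = [
            ' '.join(text.split()) for text in texts if text.strip()
        ]
        return item
```

In ITEM_PIPELINES, lower numbers run first, so this pipeline should get a smaller number than the one that sends items to Elasticsearch. That way only the cleaned text gets indexed.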