Python: removing stop words from a DataFrame

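I have the script below, and in the last line I am trying to remove stop words from the strings in a column called "response".

The problem is that instead of "a bit annoyed" becoming "bit annoyed", individual letters get dropped as well, so the letter "a" also disappears from inside "annoyed", because "a" is a stop word.

Can anyone advise?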

   import pandas as pd
   from textblob import TextBlob
   import numpy as np
   import os
   import nltk
   nltk.download('stopwords')
   from nltk.corpus import stopwords
   stop = stopwords.words('english')

   path = 'Desktop/fanbase2.csv'
   df = pd.read_csv(path, delimiter=',', header='infer', encoding = "ISO-8859-1")
   #remove punctuation
   df['response'] = df.response.str.replace(r"[^\w\s]", "", regex=True)
   #make it all lower case
   df['response'] = df.response.apply(lambda x: x.lower())
   #Handle strange character in source
   df['response'] = df.response.str.replace("‰Ûª", "''")

   df['response'] = df['response'].apply(lambda x: [item for item in x if item not in stop])
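In the list comprehension (the last line), you check each item against the stop words and keep it if it is not one of them. But you are passing a string to it, and iterating over a string yields single characters rather than words, so every character that happens to be a stop word (such as "a") is dropped. You need to split the string into words for the list comprehension to work.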

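# Small sample frame for demonstration
df = pd.DataFrame({'response':['This is one type of response!', 'Though i like this one more', 'and yet what is that?']})

# Strip punctuation, then lowercase (pandas 2.0 and later treat the pattern literally unless regex=True)
df['response'] = df.response.str.replace(r"[^\w\s]", "", regex=True).str.lower()

# Split each string into words before filtering out the stop words
df['response'] = df['response'].apply(lambda x: [item for item in x.split() if item not in stop])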


0    [one, type, response]
1      [though, like, one]
2                    [yet]
If you want the response back as a string, change the last line to

df['response'] = df['response'].apply(lambda x: ' '.join([item for item in x.split() if item not in stop]))

0    one type response
1      though like one
2                  yet
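
To see why the original line dropped letters, here is a minimal sketch in plain Python, without pandas or nltk. The tiny `stop` set is a hypothetical stand-in for `stopwords.words('english')`, chosen only for illustration.

```python
# Hypothetical, tiny stop-word set for illustration only;
# the real list comes from nltk's stopwords.words('english').
stop = {"a", "i", "is", "this", "of"}

text = "a bit annoyed"

# Iterating over a string yields single characters,
# so every "a" and "i" in the text is filtered out.
by_char = [item for item in text if item not in stop]

# Iterating over text.split() yields whole words,
# so only the standalone word "a" is filtered out.
by_word = [item for item in text.split() if item not in stop]

print("".join(by_char))   # " bt nnoyed"
print(" ".join(by_word))  # "bit annoyed"
```

Using a set rather than a list for the stop words is a design choice in this sketch: membership tests on a set do not scan the whole collection, which may matter on large columns.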

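Thanks, this works great! Sorry for the silly question, but how does .split() know to split on spaces without that being explicitly defined?

The default separator for split is whitespace. If the string is separated by some other delimiter you need to specify it, but that is rarely the case in sentences :)

Thanks for your help! :)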
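
The default behaviour of `split()` discussed in the comments can be checked with a short sketch. The sample strings are made up for illustration.

```python
text = "  a bit\tannoyed\n"

# With no argument, split() breaks on any run of whitespace
# (spaces, tabs, newlines) and ignores leading/trailing whitespace.
default_split = text.split()

# With an explicit separator, only that exact string is used,
# and consecutive separators produce empty strings.
space_split = text.split(" ")

# Other delimiters have to be passed explicitly.
comma_split = "a,bit,annoyed".split(",")

print(default_split)  # ['a', 'bit', 'annoyed']
print(space_split)    # ['', '', 'a', 'bit\tannoyed\n']
print(comma_split)    # ['a', 'bit', 'annoyed']
```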