Python regex error: TypeError: expected string or buffer




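I'm new to Python. I'd like to preprocess some data and join the words back together after removing stopwords. This function takes a string as input and returns a preprocessed string: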

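import re
from nltk.corpus import stopwords

def message_to_words(message):
    letters = re.sub("[^a-zA-Z]", " ", message)              # keep letters only
    words = letters.lower().split()                          # lowercase and tokenize
    stops = set(stopwords.words("english"))                  # NLTK English stopwords
    meaningful_words = [w for w in words if w not in stops]  # drop stopwords
    return " ".join(meaningful_words)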
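For readers without NLTK installed, the same pipeline can be sketched with only the standard library. This is a minimal sketch: `SAMPLE_STOPWORDS` is a hypothetical, tiny stand-in for NLTK's English list, not the real one. Passing the set in as a default argument also means it is built once rather than on every call.

```python
import re

# Hypothetical, tiny stand-in for NLTK's English stopword list
SAMPLE_STOPWORDS = frozenset({"a", "an", "the", "is", "and", "to", "of"})

def message_to_words(message, stops=SAMPLE_STOPWORDS):
    """Keep letters only, lowercase, drop stopwords, rejoin with spaces."""
    letters = re.sub("[^a-zA-Z]", " ", message)  # every non-letter becomes a space
    words = letters.lower().split()              # split on runs of whitespace
    meaningful_words = [w for w in words if w not in stops]
    return " ".join(meaningful_words)

print(message_to_words("The cat is ON the mat, 2 times!"))
# → cat on mat times
```

Digits and punctuation disappear, and only words in the stopword set are dropped; "on" survives here because the sample set does not contain it.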

When I call the function:

clean_messages = []
for i in xrange(0, df["Message"].size):
    clean_messages.append(message_to_words(df["Message"][i]))

I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-156-061399cb4dfd> in <module>()
      3 for i in xrange(0, df["Message"].size):
----> 5         clean_messages.append( message_to_words( df["Message"][i] ) )

---> 12     letters = re.sub("[^a-zA-Z]", " ", message )
..../python2.7/re.pyc in sub(pattern, repl, string, count, flags)
    153     a callable, it's passed the match object and must return
    154     a replacement string to be used."""
--> 155     return _compile(pattern, flags).sub(repl, string, count)
    156 
    157 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or buffer
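The error can be reproduced without pandas: `re.sub` accepts only `str` or bytes-like input, so a float passed as the `string` argument raises `TypeError`. One plausible source of floats in a text column (an assumption; the thread does not confirm it) is empty cells, which pandas stores as `NaN`, and `NaN` is a `float`. This sketch uses Python 3, where the message reads "expected string or bytes-like object" instead of Python 2.7's "expected string or buffer".

```python
import math
import re

# Assumed stand-in for an empty cell in a pandas text column
value = float("nan")
print(type(value), math.isnan(value))  # → <class 'float'> True

caught = None
try:
    re.sub("[^a-zA-Z]", " ", value)
except TypeError as exc:
    caught = exc
    print("TypeError:", exc)  # exact wording varies by Python version
```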


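When the data has no more than 500 rows, print df["Message"][i] is a string and the code runs without errors. However, once the data grows beyond 500 rows, print df["Message"][i] is a float.

What confuses me is: what is stopwords in the code above, in the statement stops = set(stopwords.words("english"))?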

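The error means df["Message"][i] must be a string or buffer. Can you check the type of df["Message"][i] with print type(df["Message"][i])?

stopwords is the stopword list from Python's nltk.

print df["Message"][i] turned out to be a float, even though it should be a string. Passing str(df["Message"][i]) solved the problem, thanks.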
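Wrapping the argument in `str()` does remove the error, but `str(float("nan"))` is the string `"nan"`, which survives the letters-only filter and shows up in the output as a word. A minimal sketch of an alternative guard, assuming non-string cells should be treated as empty messages (`safe_message_to_words` and `SAMPLE_STOPWORDS` are hypothetical names for illustration):

```python
import re

SAMPLE_STOPWORDS = frozenset({"a", "an", "the", "is"})  # hypothetical stand-in list

def message_to_words(message, stops=SAMPLE_STOPWORDS):
    letters = re.sub("[^a-zA-Z]", " ", message)
    return " ".join(w for w in letters.lower().split() if w not in stops)

def safe_message_to_words(message):
    # Treat anything that is not a string (e.g. a NaN float) as an empty message
    if not isinstance(message, str):
        message = ""
    return message_to_words(message)

raw = ["The cat is here", float("nan"), "A dog"]
print([message_to_words(str(m)) for m in raw])  # → ['cat here', 'nan', 'dog']
print([safe_message_to_words(m) for m in raw])  # → ['cat here', '', 'dog']
```

This is written for Python 3; on Python 2.7, as in the question, `isinstance(message, basestring)` would also accept `unicode` values. With pandas, calling `df["Message"].fillna("")` before the loop is another way to turn missing values into empty strings.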