TypeError: 'float' object is not iterable (Python, pandas)
When iterating over a column of a DataFrame that contains unicode string data (dtype object), I get the following error:
in text_pre_processing(text)
2 # removing punctuation
3 #text = text1(r'\n',' ', regex=True)
----> 4 text1 = [char for char in text if char not in string.punctuation]
5 text1 = ''.join(text1)
**TypeError: 'float' object is not iterable**
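The error is easy to reproduce outside pandas. One likely cause (an assumption, since the full data isn't shown) is a missing value: pandas represents a missing entry in an object column as NaN, which is a Python float, and iterating over a float raises exactly this TypeError. A minimal sketch; strip_punctuation is a hypothetical helper that reuses the comprehension from the traceback:

```python
import math
import string

def strip_punctuation(text):
    # Same comprehension as in the traceback: iterate over the characters of `text`
    return ''.join(char for char in text if char not in string.punctuation)

print(strip_punctuation("this is a good movie...#"))  # this is a good movie

missing = float('nan')       # how pandas represents a missing value in an object column
print(math.isnan(missing))   # True
try:
    strip_punctuation(missing)
except TypeError as exc:
    print(exc)               # 'float' object is not iterable
```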
Function used:
import string
from nltk.corpus import stopwords

# build the stop-word set once instead of re-reading the list for every word
STOP_WORDS = set(stopwords.words('english'))

def text_pre_processing(text):
    # removing punctuation
    #text1 = text1(r'\n',' ', regex=True)
    text1 = [char for char in str(text) if char not in string.punctuation]
    text1 = ''.join(text1)
    # removing all the stop words from corpus
    #return text.split()
    return [word for word in text1.split() if word not in STOP_WORDS]
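I tried to check whether the column I'm passing to the function contains any float values (sentences consisting only of a float), but I couldn't, because pandas treats alphanumeric and alphabetic values alike as dtype "object", so an explicit type conversion doesn't help.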
Does anyone know what is going wrong?
I'm using this function as part of the analyzer for a Naive Bayes classifier.
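Even though the column's dtype is object, the individual values can still be inspected one by one to find the floats the question is looking for. A small sketch, assuming a hypothetical frame D1 shaped like the data below with one missing row:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: an object column with one missing value
D1 = pd.DataFrame(
    {'Column2': ["this is a good movie...#", np.nan, "this is a bad movie $...."]},
    dtype=object,
)

print(D1['Column2'].dtype)                     # object
print(D1['Column2'].map(type).value_counts())  # how many rows hold a str vs a float

# Boolean mask of rows holding a float (NaN or a number) instead of a str
is_float = D1['Column2'].map(lambda v: isinstance(v, float))
print(D1[is_float])

# Count of missing values specifically
print(D1['Column2'].isna().sum())              # 1
```

The dtype only says "object" because the column mixes Python objects; the per-value type is what reveals the stray float.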
Data:
Column 1 is the index
Column2
this is a good movie...#
this is a bad movie $....
this #movie was good ;) but some scenes were exaggerating
Expected output:
[this, good, movie]
[this, bad, movie]
[this, movie, good, some, scenes, were, exaggerating]
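Below is a self-contained sketch that turns the sample rows into the expected output while skipping missing values instead of iterating over them. The stop-word set is hypothetical, picked only so the three sample rows reproduce the expected lists; nltk's stopwords.words('english') is much larger and also contains words such as "this", so the real function would drop more. The NaN row is likewise a hypothetical stand-in for the float that triggers the error.

```python
import string

import numpy as np
import pandas as pd

# Hypothetical stop-word set chosen to match the expected output above
STOP_WORDS = {"is", "a", "was", "but"}

def text_pre_processing(text):
    # Anything that is not a str (e.g. NaN, which pandas stores as a float)
    # is treated as empty text instead of being iterated over
    if not isinstance(text, str):
        return []
    text1 = ''.join(char for char in text if char not in string.punctuation)
    return [word for word in text1.split() if word not in STOP_WORDS]

D1 = pd.DataFrame({'Column2': [
    "this is a good movie...#",
    "this is a bad movie $....",
    "this #movie was good ;) but some scenes were exaggerating",
    np.nan,  # hypothetical missing row
]}, dtype=object)

tokens = D1['Column2'].apply(text_pre_processing)
for row in tokens:
    print(row)
```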
You need to convert the float to a string:
>>> str(3.14159)
'3.14159'
You can wrap text in str() before iterating over it: [char for char in str(text) if char not in string.punctuation]
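Wrapping the value in str() does stop the TypeError, but it is worth seeing what it does to the values: a number keeps its digits, and a missing value becomes the literal word "nan", which then flows through as a token. A small sketch; strip_punctuation and strip_punctuation_skip_missing are hypothetical helper names:

```python
import math
import string

def strip_punctuation(text):
    # The answer's fix: coerce to str before iterating over characters
    return ''.join(char for char in str(text) if char not in string.punctuation)

print(strip_punctuation(3.14159))       # 314159  (the '.' counts as punctuation)
print(strip_punctuation(float('nan')))  # nan     (a missing value becomes the word "nan")

def strip_punctuation_skip_missing(text):
    # Alternative: map missing values (NaN floats) to empty text instead of "nan"
    if isinstance(text, float) and math.isnan(text):
        return ''
    return strip_punctuation(text)

print(repr(strip_punctuation_skip_missing(float('nan'))))  # ''
```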
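Why are you iterating over the column? I smell an XY problem. Please show your data and expected output. Performance-wise, iterating is the worst thing you can do with a DataFrame. I'm 99% sure pd.Series.str.replace is better suited to your problem.
@hoefling I tried that, but it still doesn't work... I also tried explicitly casting the column to string: D1['column'] = D1['column'].astype(str)
@cᴏʟᴅsᴘᴇᴇᴅ I've edited the question; hopefully the problem is clear now.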
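Along the lines of the pd.Series.str.replace suggestion, here is a vectorised sketch: fill missing values first, strip punctuation with one regular expression built from string.punctuation, split on whitespace, then filter stop words. The series s and the two-word stop-word set are hypothetical stand-ins for the question's column and for nltk's list:

```python
import re
import string

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's column and stop-word list
s = pd.Series(["this is a good movie...#", np.nan, "this is a bad movie $...."], dtype=object)
STOP_WORDS = {"is", "a"}

# Character class matching any ASCII punctuation character
punct = f"[{re.escape(string.punctuation)}]"

# Fill missing values with empty text, strip punctuation, split into words
raw_tokens = s.fillna('').str.replace(punct, '', regex=True).str.split()

# Drop stop words from each row's word list
tokens = raw_tokens.apply(lambda words: [w for w in words if w not in STOP_WORDS])
print(tokens.tolist())  # [['this', 'good', 'movie'], [], ['this', 'bad', 'movie']]
```

Filling missing values up front means no row ever reaches the string methods as a float, so nothing needs to be cast.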