Python 3.x TypeError: unorderable types: str() < float()


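I am trying to run a simple ML classifier on my data, but I get the error below. I am a beginner, so please explain the cause when you provide a solution. Thanks.

"TypeError: unorderable types: str() < float()"

Here is my code: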

import numpy as np 
import pandas as pd 
import re  
import nltk 
nltk.download('stopwords')  
from nltk.corpus import stopwords 
tweets = pd.read_csv("C:\\Users\\data.csv")
tweets.shape
(4787,2)

(3829, 1710)

(3829,)

(4787, 1710)

Below is the error I get when I run the code above.


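Comments:

Probably some values in X_train or y_train are null or 'nan', which is why the sort ends up comparing str() < float().

Thanks for the answer. Should I check for and remove any null/nan values manually? Is there another way?

Looking at the error, I would say there is no other way: the problem only surfaces once the data is being fitted to the model, and at that point you can no longer inspect or parse the values. In any case, it is good practice to check your dataset before training any model, to see whether it contains errors or anything that needs cleaning.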
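To make the suggestion in the comments concrete, here is a minimal sketch of checking for missing labels and dropping them before training. The small DataFrame and its column names `text` and `label` are made up for illustration, since the real data.csv is not shown in the question; the assumption is that a missing label was read in as a float NaN, as the comments suggest.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: column 0 is the tweet text, column 1 the label.
# One label is missing, so it is stored as NaN (a float) next to the string labels.
tweets = pd.DataFrame({
    "text": ["good flight", "lost my bag", "on time", "rude staff"],
    "label": ["positive", "negative", np.nan, "negative"],
})

# Count missing values per column before doing any training.
missing_per_column = tweets.isnull().sum()

# Drop the rows whose label is missing, so that y holds strings only.
clean = tweets.dropna(subset=["label"])
X = clean.iloc[:, 0].values
y = clean.iloc[:, 1].values

# All remaining labels are str, so sorting them no longer mixes str and float.
label_types = {type(v).__name__ for v in y}
unique_labels = np.unique(y)
```

In the question's script this check would go right after `pd.read_csv`, before `X` and `y` are extracted, so that the text and label arrays stay aligned.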
X = tweets.iloc[:, 0].values
y = tweets.iloc[:, 1].values

processed_tweets = []

for tweet in range(0, len(X)):
    # Remove all the special characters
    processed_tweet = re.sub(r'\W', ' ', str(X[tweet]))

    # Remove all single characters
    processed_tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_tweet)

    # Remove single characters from the start
    processed_tweet = re.sub(r'^[a-zA-Z]\s+', ' ', processed_tweet)

    # Substituting multiple spaces with single space
    processed_tweet = re.sub(r'\s+', ' ', processed_tweet, flags=re.I)

    # Removing prefixed 'b'
    processed_tweet = re.sub(r'^b\s+', '', processed_tweet)

    # Converting to lowercase
    processed_tweet = processed_tweet.lower()

    processed_tweets.append(processed_tweet)


from sklearn.feature_extraction.text import TfidfVectorizer  
tfidfconverter = TfidfVectorizer(max_features=2000, min_df=5, max_df=0.7, 
stop_words=stopwords.words('english'))  
X = tfidfconverter.fit_transform(processed_tweets).toarray()


from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


X_train.shape
y_train.shape
X.shape
from sklearn.ensemble import RandomForestClassifier
text_classifier = RandomForestClassifier(n_estimators=100, random_state=0)  
text_classifier.fit(X_train, y_train)
TypeError                                 Traceback (most recent call last)
<ipython-input-24-7c5c1beb13e6> in <module>()
      1 from sklearn.ensemble import RandomForestClassifier
      2 text_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
----> 3 text_classifier.fit(X_train, y_train)

C:\miniconda3\envs\conda\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
    276         self.n_outputs_ = y.shape[1]
    277 
--> 278         y, expanded_class_weight = self._validate_y_class_weight(y)
    279 
    280         if getattr(y, "dtype", None) != DOUBLE or not y.flags.contiguous:

C:\miniconda3\envs\conda\lib\site-packages\sklearn\ensemble\forest.py in _validate_y_class_weight(self, y)
    476 
    477     def _validate_y_class_weight(self, y):
--> 478         check_classification_targets(y)
    479 
    480         y = np.copy(y)

C:\miniconda3\envs\conda\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    166     y : array-like
    167     """
--> 168     y_type = type_of_target(y)
    169     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    170                       'multilabel-indicator', 'multilabel-sequences']:

C:\miniconda3\envs\conda\lib\site-packages\sklearn\utils\multiclass.py in type_of_target(y)
    285         return 'continuous' + suffix
    286 
--> 287     if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):
    288         return 'multiclass' + suffix  # [1, 2, 3] or [[1., 2., 3]] or [[1, 2]]
    289     else:

<__array_function__ internals> in unique(*args, **kwargs)

C:\miniconda3\envs\conda\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
    261     ar = np.asanyarray(ar)
    262     if axis is None:
--> 263         ret = _unique1d(ar, return_index, return_inverse, return_counts)
    264         return _unpack_tuple(ret)
    265 

C:\miniconda3\envs\conda\lib\site-packages\numpy\lib\arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
    309         aux = ar[perm]
    310     else:
--> 311         ar.sort()
    312         aux = ar
    313     mask = np.empty(aux.shape, dtype=np.bool_)

TypeError: unorderable types: str() < float()
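The last frame of the traceback fails inside `ar.sort()`, which suggests the label array mixes strings with a float such as NaN. A minimal sketch that reproduces the same kind of failure with made-up labels (the array below is hypothetical, not the asker's data):

```python
import numpy as np

# Hypothetical labels: two strings and one missing value (NaN is a float).
y = np.array(["positive", np.nan, "negative"], dtype=object)

try:
    # Sorting has to compare a str with a float, which Python 3 does not allow.
    np.sort(y)
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)
```

The exact wording of the message depends on the Python version: older 3.x releases report "unorderable types", while newer ones say that '<' is not supported between the two types.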