Python 3.x TypeError: unorderable types: str() < float()

python-3.x machine-learning nlp


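I am trying to run a simple ML classifier on my data, but I get the error below. I am a beginner, so please explain the cause when you provide a solution. Thanks.

TypeError: unorderable types: str() < float()

Below is my code: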

import numpy as np 
import pandas as pd 
import re  
import nltk 
nltk.download('stopwords')  
from nltk.corpus import stopwords 
tweets = pd.read_csv("C:\\Users\\data.csv")
tweets.shape

(4787, 2)

(3829, 1710)

(3829,)

(4787, 1710)

The error I get when I run the code above is shown below.


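Probably some values in X_train or y_train are null or 'nan', which is why it ends up trying to compare str() with float().

Anyone?

Thanks for the answer. Should I manually check for and remove any null/NaN values? Is there another way?

Looking at the error, I'd say there is no other way: the problem only surfaces once you fit the data to the model, and at that point you can no longer inspect or parse the individual values. In any case, it is good practice to check your dataset before training any model, to see whether it contains errors or anything that needs cleaning.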
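The check suggested above might look like the following sketch. It is not taken from the original post: the in-memory CSV and the column names `text` and `label` are hypothetical stand-ins for `data.csv`, and it assumes the missing values are empty cells that pandas reads as NaN.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv: the second row has no label.
csv_text = "text,label\ngood movie,pos\nbad movie,\nfine movie,neg\n"
tweets = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column before doing anything else.
missing_per_column = tweets.isnull().sum()
print(missing_per_column)

# Drop rows where the text or the label is missing, then split the columns.
clean = tweets.dropna(subset=["text", "label"]).reset_index(drop=True)
X = clean["text"].values
y = clean["label"].values

# Every remaining label should now be a string.
print({type(v).__name__ for v in y})
```

Dropping the rows before vectorizing keeps the features and labels aligned, which is harder to guarantee if only `y_train` is filtered after the split.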

X = tweets.iloc[:, 0].values
y = tweets.iloc[:, 1].values

processed_tweets = []

for tweet in range(0, len(X)):
    # Remove all the special characters
    processed_tweet = re.sub(r'\W', ' ', str(X[tweet]))

    # Remove all single characters
    processed_tweet = re.sub(r'\s+[a-zA-Z]\s+', ' ', processed_tweet)

    # Remove single characters from the start
    # (the caret must be unescaped to anchor at the start of the string)
    processed_tweet = re.sub(r'^[a-zA-Z]\s+', ' ', processed_tweet)

    # Substitute multiple spaces with a single space
    processed_tweet = re.sub(r'\s+', ' ', processed_tweet)

    # Remove the prefixed 'b'
    processed_tweet = re.sub(r'^b\s+', '', processed_tweet)

    # Convert to lowercase
    processed_tweet = processed_tweet.lower()

    processed_tweets.append(processed_tweet)
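One detail of the loop above may explain why the features never trigger the error: `str(X[tweet])` turns a missing value into the literal text 'nan', so every entry in `X` ends up a string, while the labels in `y` get no such conversion. The sketch below wraps the same regex steps in a hypothetical helper, `clean_tweet`, to show this. It uses an unescaped `^` in the start-of-string pattern, which is an assumption about what the original `\^` was meant to do.

```python
import re


def clean_tweet(raw):
    """Apply the regex steps of the preprocessing loop to a single value."""
    text = re.sub(r'\W', ' ', str(raw))          # non-word characters -> space
    text = re.sub(r'\s+[a-zA-Z]\s+', ' ', text)  # isolated single letters
    text = re.sub(r'^[a-zA-Z]\s+', ' ', text)    # single letter at the start
    text = re.sub(r'\s+', ' ', text)             # collapse runs of whitespace
    text = re.sub(r'^b\s+', '', text)            # stray bytes-literal prefix
    return text.lower()


# An ordinary tweet is stripped of punctuation and lowercased.
print(repr(clean_tweet("Great day @ the park!!")))

# A missing value (float NaN) silently becomes the string 'nan'.
print(repr(clean_tweet(float('nan'))))  # 'nan'
```

Because of this, a missing tweet text survives as the token "nan" rather than raising an error, so it is worth dropping such rows explicitly rather than relying on the preprocessing to expose them.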


from sklearn.feature_extraction.text import TfidfVectorizer
tfidfconverter = TfidfVectorizer(max_features=2000, min_df=5, max_df=0.7,
                                 stop_words=stopwords.words('english'))
X = tfidfconverter.fit_transform(processed_tweets).toarray()


from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


X_train.shape

y_train.shape

X.shape

from sklearn.ensemble import RandomForestClassifier
text_classifier = RandomForestClassifier(n_estimators=100, random_state=0)  
text_classifier.fit(X_train, y_train)

TypeError                                 Traceback (most recent call last)
<ipython-input-24-7c5c1beb13e6> in <module>()
      1 from sklearn.ensemble import RandomForestClassifier
      2 text_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
----> 3 text_classifier.fit(X_train, y_train)

C:\miniconda3\envs\conda\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
    276         self.n_outputs_ = y.shape[1]
    277 
--> 278         y, expanded_class_weight = self._validate_y_class_weight(y)
    279 
    280         if getattr(y, "dtype", None) != DOUBLE or not y.flags.contiguous:

C:\miniconda3\envs\conda\lib\site-packages\sklearn\ensemble\forest.py in _validate_y_class_weight(self, y)
    476 
    477     def _validate_y_class_weight(self, y):
--> 478         check_classification_targets(y)
    479 
    480         y = np.copy(y)

C:\miniconda3\envs\conda\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    166     y : array-like
    167     """
--> 168     y_type = type_of_target(y)
    169     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    170                       'multilabel-indicator', 'multilabel-sequences']:

C:\miniconda3\envs\conda\lib\site-packages\sklearn\utils\multiclass.py in type_of_target(y)
    285         return 'continuous' + suffix
    286 
--> 287     if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):
    288         return 'multiclass' + suffix  # [1, 2, 3] or [[1., 2., 3]] or [[1, 2]]
    289     else:

<__array_function__ internals> in unique(*args, **kwargs)

C:\miniconda3\envs\conda\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
    261     ar = np.asanyarray(ar)
    262     if axis is None:
--> 263         ret = _unique1d(ar, return_index, return_inverse, return_counts)
    264         return _unpack_tuple(ret)
    265 

C:\miniconda3\envs\conda\lib\site-packages\numpy\lib\arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
    309         aux = ar[perm]
    310     else:
--> 311         ar.sort()
    312         aux = ar
    313     mask = np.empty(aux.shape, dtype=np.bool_)

TypeError: unorderable types: str() < float()
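The last frames of the traceback show the failure happening while NumPy sorts `y` inside `np.unique(y)`, which suggests the labels, not the features, contain a mix of strings and floats. The following sketch reproduces that situation in isolation. The label values are hypothetical, and it calls `np.sort` directly as a stand-in for the `ar.sort()` call in the traceback. On Python 3.5 the message reads "unorderable types: str() < float()"; newer Python versions word the same error as "'<' not supported between instances of ...".

```python
import numpy as np

# Hypothetical labels: two strings and one missing value stored as float NaN.
y = np.array(["pos", np.nan, "neg"], dtype=object)

try:
    np.sort(y)  # same kind of sort that np.unique performed in the traceback
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Keeping only the string labels makes the array sortable again.
labels_only = np.array([v for v in y if isinstance(v, str)], dtype=object)
print(np.unique(labels_only))
```

Filtering like this only illustrates the cause. In the real pipeline the matching rows of `X` would have to be removed as well so that features and labels stay aligned.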