Python kmeans.cluster() gives an error: TypeError: 'float' object is not iterable, when using word embeddings (word2vec) on sentences
Tags: python, k-means, word2vec

I am trying to cluster sentences using kmeans, but I am not getting the right input type for `cluster()`. I have tried using the list `Y`, built by the `sent_vectorizer` function from the word embeddings, and also a DataFrame version of `Y`.
```
import nltk
import numpy as np
import pandas as pd
from nltk.cluster import KMeansClusterer

# `model` (the trained word2vec model) and `all_words` (the tokenized
# sentences) are defined elsewhere.

def sent_vectorizer(sent, model):
    """Create one vector per tokenized sentence by averaging its word vectors."""
    sent_vec = []
    numw = 0
    for w in sent:
        try:
            if numw == 0:
                sent_vec = model[w]
            else:
                # add up the vectors of all words in the sentence
                sent_vec = np.add(sent_vec, model[w])
            numw += 1  # count the words of this sentence found in the model
        except KeyError:
            # skip words that are missing from the model
            pass
    return np.asarray(sent_vec) / numw

Y = []
for sentence in all_words:
    Y.append(sent_vectorizer(sentence, model))
print("========================")
print(Y)

df_Y = pd.DataFrame(Y)

NUM_CLUSTERS = 3
kclusterer = KMeansClusterer(NUM_CLUSTERS, distance=nltk.cluster.util.cosine_distance, repeats=25, avoid_empty_clusters=True)
assigned_clusters = kclusterer.cluster(df_Y, assign_clusters=True)
print(assigned_clusters)
```
`all_words` holds a list of tokenized sentences:
[['cloud', 'technologies', 'still', 'building', 'strong', 'foundation', 'game', 'changers', 'hyper', 'Convergend', 'technology', 'sd', 'wan', 'Ping', 'college', 'college', 'security', 'plane', 'Protection', 'college', 'data', 'analytics', 'customer', 'experience', 'improvements', 'ar', 'vr'], ['none', 'timeframe', 'longer', 'term', 'cloud', 'services', 'ai', 'technologies', 'game', 'changer'], ['cloud', 'finance', 'integrated', 'ship', 'management'], ['MicroService', 'based', 'api', 'platform', 'omni', 'Channel'], ['moving', 'erp', 'cloud'], ['online', 'learning', 'open', 'Education', 'resources'], ['online', 'programs', 'meet', 'needs', 'today', 'learners'], ['automation', 'existing', 'processes']]
Error traceback:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-108-68f5fe386b54> in <module>
      1 NUM_CLUSTERS=3
      2 kclusterer = KMeansClusterer(NUM_CLUSTERS, distance=nltk.cluster.util.cosine_distance, repeats=25,avoid_empty_clusters=True)
----> 3 assigned_clusters = kclusterer.cluster(df_Y, assign_clusters=True)
      4 print (assigned_clusters)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\cluster\util.py in cluster(self, vectors, assign_clusters, trace)
     60
     61         # call abstract method to cluster the vectors
---> 62         self.cluster_vectorspace(vectors, trace)
     63
     64         # assign the vectors to clusters

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\cluster\kmeans.py in cluster_vectorspace(self, vectors, trace)
     99             # effect the distance comparison)
    100             for means in meanss:
--> 101                 means.sort(key=sum)
    102
    103         # find the set of means that's minimally different from the others

TypeError: 'float' object is not iterable
```
I also tried the following code and got an error there as well:
```
from sklearn import cluster

kmeans = cluster.KMeans(n_clusters=NUM_CLUSTERS)
kmeans.fit(Y)
```
The error in this case is:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-48-229383cd99be> in <module>
      1 kmeans = cluster.KMeans(n_clusters=NUM_CLUSTERS)
----> 2 kmeans.fit(Y)
      3
      4 labels = kmeans.labels_
      5 centroids = kmeans.cluster_centers_

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\cluster\k_means_.py in fit(self, X, y, sample_weight)
    969                 tol=self.tol, random_state=random_state, copy_x=self.copy_x,
    970                 n_jobs=self.n_jobs, algorithm=self.algorithm,
--> 971                 return_n_iter=True)
    972         return self
    973

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\cluster\k_means_.py in k_means(X, n_clusters, sample_weight, init, precompute_distances, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, algorithm, return_n_iter)
    309     order = "C" if copy_x else None
    310     X = check_array(X, accept_sparse='csr', dtype=[np.float64, np.float32],
--> 311                     order=order, copy=copy_x)
    312     # verify that the number of samples given is larger than k
    313     if _num_samples(X) < n_clusters:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    525             try:
    526                 warnings.simplefilter('error', ComplexWarning)
--> 527                 array = np.asarray(array, dtype=dtype, order=order)
    528             except ComplexWarning:
    529                 raise ValueError("Complex data not supported\n"

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    536
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539
    540

ValueError: setting an array element with a sequence.
```
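NumPy raises a `ValueError` like the one from `kmeans.fit(Y)` when the rows it is asked to pack into a float array do not all have the same length. The following is a minimal sketch of how that can arise with an averaging function like `sent_vectorizer`, and of one way to guard against it. It assumes a toy 3-dimensional lookup table (`toy_model`, hypothetical) in place of the real word2vec model, and `average_vector` is an illustrative helper, not code from the question. Whether this is the actual cause here depends on data the question does not show.

```python
import numpy as np

# Hypothetical stand-in for a word2vec model: word -> 3-d vector.
toy_model = {
    "cloud": np.array([1.0, 0.0, 0.0]),
    "services": np.array([0.0, 1.0, 0.0]),
}

def average_vector(tokens, model):
    """Mean of the vectors of the tokens found in `model`; None if none are found."""
    found = [model[t] for t in tokens if t in model]
    if not found:
        return None
    return np.mean(found, axis=0)

sentences = [["cloud", "services"], ["unknown"], ["cloud"]]
vectors = [average_vector(s, toy_model) for s in sentences]

# Keep only sentences that produced a vector, then stack into one 2-D array.
kept = [(s, v) for s, v in zip(sentences, vectors) if v is not None]
X = np.vstack([v for _, v in kept])
print(X.shape)  # (2, 3)

# Rows of unequal length cannot be packed into a float array.
ragged = [np.array([0.5, 0.5, 0.0]), np.array([])]
try:
    np.asarray(ragged, dtype=np.float64)
except ValueError as exc:
    print("ValueError:", exc)
```

In the question's function, a sentence with no in-vocabulary words leaves `sent_vec` as `[]`, so the returned array is empty and `Y` can end up ragged in the same way as `ragged` above.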
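Can you post the error traceback? Check the type of `sent`, and use the edit option to add the complete traceback to the question.
`type(sent)` is `str`. I have added the tracebacks for both code snippets.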
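The observation that `type(sent)` is `str` matters because `for w in sent` then walks over single characters rather than words. Below is a small sketch of the difference, using a made-up sentence. Plain whitespace splitting and the helper `ensure_tokens` are assumptions for illustration; the question does not show how `all_words` was built.

```python
sentence = "cloud services"

# Iterating a str yields characters, not words.
chars = [w for w in sentence]
print(chars[:5])  # ['c', 'l', 'o', 'u', 'd']

# Splitting first yields word tokens that can be looked up one by one.
tokens = sentence.split()
print(tokens)  # ['cloud', 'services']

def ensure_tokens(sent):
    """Return a list of tokens whether `sent` is a str or already a list."""
    return sent.split() if isinstance(sent, str) else list(sent)

print(ensure_tokens(sentence))                # ['cloud', 'services']
print(ensure_tokens(["cloud", "services"]))   # ['cloud', 'services']
```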