Python 找到每个簇的平均值,并在数据帧中分配最佳簇
我想在数据框下为列X3聚类,然后为每个聚类找到X3的平均值,然后为最高平均值分配3,为较低平均值分配2,为最低平均值分配1。数据帧下方Python 找到每个簇的平均值,并在数据帧中分配最佳簇,python,pandas,rank,Python,Pandas,Rank,我想在数据框下为列X3聚类,然后为每个聚类找到X3的平均值,然后为最高平均值分配3,为较低平均值分配2,为最低平均值分配1。数据帧下方 df=pd.DataFrame({'Month':[1,1,1,1,1,1,3,3,3,3,3,3,3],'X1': [10,15,24,32,8,6,10,23,24,56,45,10,56],'X2':[12,90,20,40,10,15,30,40,60,42,2,4,10],'X3': [34,65,34,87,100,65,78,67,34,
df=pd.DataFrame({'Month':[1,1,1,1,1,1,3,3,3,3,3,3,3],'X1':
[10,15,24,32,8,6,10,23,24,56,45,10,56],'X2':[12,90,20,40,10,15,30,40,60,42,2,4,10],'X3':
[34,65,34,87,100,65,78,67,34,98,96,46,76]})
我根据下面的X3列进行了聚类
def cluster(X, n_clusters):
k_means = KMeans(n_clusters=n_clusters).fit(X.values.reshape(-1, 1))
return k_means.labels_
cols = pd.Index(["X3"])
df[cols + "_cluster_id"] = df.groupby("Month")[cols].transform(cluster, n_clusters=3)
现在找出每个集群和月份的X3平均值,然后对其进行排序,并将3分配给最大平均值,2分配给中等平均值,1分配给最低平均值。下面是我所做的,但它不起作用。我怎样才能解决这个问题?多谢各位
mapping = {1: 'weak', 2: 'average', 3: 'good'}
cols=df.columns[3]
df['product_rank'] = df.groupby(['Month','X3_cluster_id'])
[cols].transform('mean').rank(method='dense').astype(int)
df['product_category'] = df['product_rank'].map(mapping)
分配等级时,请确保根据月份对其进行分组 完整代码:
df=pd.DataFrame({'Month':[1,1,1,1,1,1,3,3,3,3,3,3,3],'X1':[10,15,24,32,8,6,10,23,24,56,45,10,56],'X2':[12,90,20,40,10,15,30,40,60,42,2,4,10],'X3':[34,65,34,87,100,65,78,67,34,98,96,46,76]})
def cluster(X, n_clusters):
k_means = KMeans(n_clusters=n_clusters).fit(X.values.reshape(-1, 1))
return k_means.labels_
cols = pd.Index(["X3"])
df[cols + "_cluster_id"] = df.groupby("Month")[cols].transform(cluster, n_clusters=3)
mapping = {1: 'weak', 2: 'average', 3: 'good'}
df['mean_X3'] = df.groupby(["Month","X3_cluster_id"])["X3"].transform("mean")
df["product_category"] = df.groupby("Month")['mean_X3'].rank(method='dense').astype(int).map(mapping)
print(df)
Month X1 X2 X3 X3_cluster_id mean_X3 product_category
0 1 10 12 34 1 57.80 weak
1 1 15 90 65 2 81.00 good
2 1 24 20 34 1 57.80 weak
3 1 32 40 87 0 66.75 average
4 1 8 10 100 0 66.75 average
5 1 6 15 65 2 81.00 good
6 3 10 30 78 1 57.80 weak
7 3 23 40 67 1 57.80 weak
8 3 24 60 34 0 66.75 average
9 3 56 42 98 2 81.00 good
10 3 45 2 96 2 81.00 good
11 3 10 4 46 0 66.75 average
12 3 56 10 76 1 57.80 weak
当您应用kmeans时,平均值已经计算出来,因此我建议进行1次拟合,并返回每个groupby中的标签、平均值和排名:
def cluster(X, n_clusters):
k_means = KMeans(n_clusters=n_clusters).fit(X)
ranks = np.argsort(k_means.cluster_centers_.ravel())+1
res = pd.DataFrame({'cluster':range(k_means.n_clusters),
'means':k_means.cluster_centers_.ravel(),
'ranks':ranks}).loc[k_means.labels_,:]
res.index = X.index
return res
然后,您只需应用上述函数,一次获得等级和平均值:
mapping = {1: 'weak', 2: 'average', 3: 'good'}
res = df.groupby("Month")[['X3']].apply(cluster, n_clusters=3)
cluster means ranks
0 1 34.000000 3
1 2 65.000000 1
2 1 34.000000 3
3 0 93.500000 2
4 0 93.500000 2
5 2 65.000000 1
6 0 73.666667 2
7 0 73.666667 2
8 1 40.000000 1
9 2 97.000000 3
10 2 97.000000 3
11 1 40.000000 1
12 0 73.666667 2
您可以应用映射
,也可以应用带有左连接的完整数据帧:
res['product_category'] = res['ranks'].map(mapping)
df.merge(res,left_index=True,right_index=True)
Month X1 X2 X3 cluster means ranks product_category
0 1 10 12 34 1 34.000000 1 weak
1 1 15 90 65 0 65.000000 2 average
2 1 24 20 34 1 34.000000 1 weak
3 1 32 40 87 2 93.500000 3 good
4 1 8 10 100 2 93.500000 3 good
5 1 6 15 65 0 65.000000 2 average
6 3 10 30 78 0 73.666667 2 average
7 3 23 40 67 0 73.666667 2 average
8 3 24 60 34 1 40.000000 1 weak
9 3 56 42 98 2 97.000000 3 good
10 3 45 2 96 2 97.000000 3 good
11 3 10 4 46 1 40.000000 1 weak
12 3 56 10 76 0 73.666667 2 average
似乎不对。你应该考虑一个月。计算第1个月和第3个月的平均值。所以这个月应该包括在小组中。谢谢,我明白了,编辑了我的答案:)