Python: find the mean of each cluster and assign the best cluster in a DataFrame


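I want to cluster column X3 of the DataFrame below, find the mean of X3 for each cluster, and then assign 3 to the cluster with the highest mean, 2 to the one with the middle mean, and 1 to the one with the lowest mean. The DataFrame is: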

import pandas as pd

df = pd.DataFrame({'Month': [1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3],
                   'X1': [10, 15, 24, 32, 8, 6, 10, 23, 24, 56, 45, 10, 56],
                   'X2': [12, 90, 20, 40, 10, 15, 30, 40, 60, 42, 2, 4, 10],
                   'X3': [34, 65, 34, 87, 100, 65, 78, 67, 34, 98, 96, 46, 76]})
I clustered on column X3 as follows:

from sklearn.cluster import KMeans

def cluster(X, n_clusters):
    k_means = KMeans(n_clusters=n_clusters).fit(X.values.reshape(-1, 1))
    return k_means.labels_

cols = pd.Index(["X3"])
df[cols + "_cluster_id"] = df.groupby("Month")[cols].transform(cluster, n_clusters=3)
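Now I want to find the mean of X3 for each cluster and month, rank those means, and assign 3 to the highest mean, 2 to the middle mean, and 1 to the lowest mean. Below is what I tried, but it does not work. How can I fix this? Thanks.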

mapping = {1: 'weak', 2: 'average', 3: 'good'}
cols = df.columns[3]
df['product_rank'] = (df.groupby(['Month', 'X3_cluster_id'])[cols]
                        .transform('mean').rank(method='dense').astype(int))
df['product_category'] = df['product_rank'].map(mapping)

When assigning the ranks, make sure you group by Month.

Full code:

import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({'Month': [1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3],
                   'X1': [10, 15, 24, 32, 8, 6, 10, 23, 24, 56, 45, 10, 56],
                   'X2': [12, 90, 20, 40, 10, 15, 30, 40, 60, 42, 2, 4, 10],
                   'X3': [34, 65, 34, 87, 100, 65, 78, 67, 34, 98, 96, 46, 76]})

def cluster(X, n_clusters):
    k_means = KMeans(n_clusters=n_clusters).fit(X.values.reshape(-1, 1))
    return k_means.labels_

cols = pd.Index(["X3"])
df[cols + "_cluster_id"] = df.groupby("Month")[cols].transform(cluster, n_clusters=3)
mapping = {1: 'weak', 2: 'average', 3: 'good'}
# Mean of X3 within each (Month, cluster) pair
df['mean_X3'] = df.groupby(["Month", "X3_cluster_id"])["X3"].transform("mean")
# Rank the means within each month, then map the ranks to labels
df["product_category"] = df.groupby("Month")['mean_X3'].rank(method='dense').astype(int).map(mapping)
print(df)

    Month  X1  X2   X3  X3_cluster_id    mean_X3 product_category
0       1  10  12   34              1  34.000000             weak
1       1  15  90   65              2  65.000000          average
2       1  24  20   34              1  34.000000             weak
3       1  32  40   87              0  93.500000             good
4       1   8  10  100              0  93.500000             good
5       1   6  15   65              2  65.000000          average
6       3  10  30   78              1  73.666667          average
7       3  23  40   67              1  73.666667          average
8       3  24  60   34              0  40.000000             weak
9       3  56  42   98              2  97.000000             good
10      3  45   2   96              2  97.000000             good
11      3  10   4   46              0  40.000000             weak
12      3  56  10   76              1  73.666667          average
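
To see why the grouping by Month matters, here is a minimal sketch. It assumes hard-coded cluster labels (copied from the output above, standing in for the KMeans result so the example is deterministic) and compares a rank taken over the whole column with a rank taken within each month. The names global_rank and monthly_rank are illustrative only.

```python
import pandas as pd

# Hard-coded cluster labels stand in for the KMeans output (an assumption,
# chosen to match the partition shown in the output above).
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3],
    "X3": [34, 65, 34, 87, 100, 65, 78, 67, 34, 98, 96, 46, 76],
    "X3_cluster_id": [1, 2, 1, 0, 0, 2, 1, 1, 0, 2, 2, 0, 1],
})
mapping = {1: "weak", 2: "average", 3: "good"}

# Mean of X3 per (Month, cluster), broadcast back to every row.
mean_x3 = df.groupby(["Month", "X3_cluster_id"])["X3"].transform("mean")

# Ranking over the whole column: the six distinct means get ranks 1..6,
# so ranks 4..6 have no entry in `mapping` and become NaN after .map().
global_rank = mean_x3.rank(method="dense").astype(int)

# Ranking within each month: every month gets ranks 1..3.
monthly_rank = mean_x3.groupby(df["Month"]).rank(method="dense").astype(int)

print(sorted(global_rank.unique().tolist()))
print(sorted(monthly_rank.unique().tolist()))
print(int(global_rank.map(mapping).isna().sum()), "rows left unmapped by the global rank")
```

With the global rank, any cluster whose mean ranks above 3 across both months falls outside the mapping, which is one way the original attempt can fail.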

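When you apply k-means, the means are already computed (they are the cluster centers), so I suggest fitting once and returning the labels, means, and ranks within each groupby: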

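import numpy as np
from sklearn.cluster import KMeans

def cluster(X, n_clusters):
    k_means = KMeans(n_clusters=n_clusters).fit(X)
    centers = k_means.cluster_centers_.ravel()
    # argsort of argsort turns the sort order into ranks (1 = lowest center)
    ranks = np.argsort(np.argsort(centers)) + 1
    res = pd.DataFrame({'cluster': range(k_means.n_clusters),
                        'means': centers,
                        'ranks': ranks}).loc[k_means.labels_, :]
    res.index = X.index
    return res

Then you just apply the function above to get the ranks and means in one pass:

mapping = {1: 'weak', 2: 'average', 3: 'good'}
# group_keys=False keeps the original row index so the result aligns with df
res = df.groupby("Month", group_keys=False)[['X3']].apply(cluster, n_clusters=3)
print(res)  # cluster ids are arbitrary and can differ between runs

    cluster means   ranks
0   1   34.000000   1
1   0   65.000000   2
2   1   34.000000   1
3   2   93.500000   3
4   2   93.500000   3
5   0   65.000000   2
6   0   73.666667   2
7   0   73.666667   2
8   1   40.000000   1
9   2   97.000000   3
10  2   97.000000   3
11  1   40.000000   1
12  0   73.666667   2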
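
One detail worth checking when turning cluster centers into ranks: np.argsort returns the indices that would sort the array, which is not the same as the rank of each element. Applying argsort twice gives the ranks. A minimal sketch, assuming the month 1 centers happen to come out in the order below (KMeans label order is arbitrary, so this ordering is only an illustration):

```python
import numpy as np

# Cluster centers indexed by cluster id (illustrative ordering of the month 1 means).
centers = np.array([93.5, 34.0, 65.0])

# Cluster ids listed from the lowest center to the highest center.
order = np.argsort(centers)

# Rank of each cluster id, where 1 is the lowest center.
ranks = np.argsort(order) + 1

print(order.tolist())   # sort order, not ranks
print(ranks.tolist())   # rank per cluster id
```

Here cluster 0 has the largest center, so it should get rank 3; `order + 1` would assign it 2 instead. The two results only coincide for some orderings of the centers.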
You can then apply mapping and merge the result with the full DataFrame on the index:

res['product_category'] = res['ranks'].map(mapping)
df.merge(res,left_index=True,right_index=True)

    Month   X1  X2  X3  cluster means   ranks   product_category
0   1   10  12  34  1   34.000000   1   weak
1   1   15  90  65  0   65.000000   2   average
2   1   24  20  34  1   34.000000   1   weak
3   1   32  40  87  2   93.500000   3   good
4   1   8   10  100 2   93.500000   3   good
5   1   6   15  65  0   65.000000   2   average
6   3   10  30  78  0   73.666667   2   average
7   3   23  40  67  0   73.666667   2   average
8   3   24  60  34  1   40.000000   1   weak
9   3   56  42  98  2   97.000000   3   good
10  3   45  2   96  2   97.000000   3   good
11  3   10  4   46  1   40.000000   1   weak
12  3   56  10  76  0   73.666667   2   average

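This doesn't seem right. You should take the month into account: the means need to be computed separately for month 1 and month 3, so Month should be included in the groupby.

Thanks, I see it now. I've edited my answer :)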