Python: cluster X, Y values into sectors and plots with pandas, pandas groupby and/or scikit-learn
I have a DataFrame as shown below:
X Y Sector Plot
5 3 SE1 P2
3 3 SE1 P1
6 7 SE1 P3
1 6 SE1 P3
2 1 SE1 P1
7 3 SE1 P2
17 20 SE2 P1
23 22 SE2 P1
27 28 SE2 P3
31 25 SE2 P3
25 25 SE2 P2
31 31 SE2 P2
17 25 SE2 P4
23 31 SE2 P4
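From the data above, I want to estimate the minimum and maximum values of X and Y for each Sector/Plot combination.
The expected output DataFrame is shown below: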
Sector_Plot Xmin Xmax Ymin Ymax
SE1_P1 2 3 1 3
SE1_P2 5 7 3 3
SE1_P3 1 6 6 7
SE2_P1 17 23 20 22
SE2_P2 25 31 25 25
SE2_P3 27 31 25 31
SE2_P4 17 23 25 31
Based on the rules above, given new X, Y values we should be able to predict the Sector_Plot, as shown below:
X Y Estimated_Sector_Plot
2.5 2 SE1_P1
2 1 SE1_P1
3 2 SE1_P1
5 3 SE1_P2
7 3 SE1_P2
6 3 SE1_P2
1 7 SE1_P3
4 6 SE1_P3
2 7 SE1_P3
28 25 SE2_P3
29 31 SE2_P3
18 19 SE2_P1
17 20 SE2_P1
19 22 SE2_P1
30 25 SE2_P2
25 25 SE2_P2
18 26 SE2_P4
17 31 SE2_P4
I tried a machine learning approach, but it failed. Can this be achieved some other way?
My code is shared below:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def find_frequent_labels(df, var, rare_perc):
    # Return the labels of `var` that cover more than `rare_perc` of the rows.
    df = df.copy()
    tmp = df.groupby(var)['X'].count() / len(df)
    return tmp[tmp > rare_perc].index

# Collapse infrequent sectors into a single 'Rare' label.
for var in ['SECTOR']:
    frequent_ls = find_frequent_labels(train, var, 0.01)
    train[var] = np.where(train[var].isin(frequent_ls), train[var], 'Rare')
    test[var] = np.where(test[var].isin(frequent_ls), test[var], 'Rare')

def replace_with_X(train1, test1, var, target):
    # Ordinal-encode `var` by the mean of `target` per label, stored as 'Sec_X'.
    ordered_labels = train1.groupby([var])[target].mean().sort_values().index
    ordinal_label = {k: i for i, k in enumerate(ordered_labels, 0)}
    train1['Sec_X'] = train1[var].map(ordinal_label)
    test1['Sec_X'] = test1[var].map(ordinal_label)

for var in ['SECTOR']:
    replace_with_X(train, test, var, 'X')

def replace_with_Y(train1, test1, var, target):
    # Same encoding as above, stored as 'Sec_Y'.
    ordered_labels = train1.groupby([var])[target].mean().sort_values().index
    ordinal_label = {k: i for i, k in enumerate(ordered_labels, 0)}
    train1['Sec_Y'] = train1[var].map(ordinal_label)
    test1['Sec_Y'] = test1[var].map(ordinal_label)

for var in ['SECTOR']:
    replace_with_Y(train, test, var, 'Y')

# Map each plot name to an integer class id and back.
train['Plot_id'] = train['PLOT'].factorize()[0]
category_id_df = train[['PLOT', 'Plot_id']].drop_duplicates().sort_values('Plot_id')
category_to_id = dict(category_id_df.values)
id_to_category = dict(category_id_df[['Plot_id', 'PLOT']].values)

# The original listed 'Sector_code', which is never created above; the encoded
# columns 'Sec_X' and 'Sec_Y' are assumed to be what was meant.
features = ['X', 'Y', 'Sec_X', 'Sec_Y']

model = LinearSVC(C=1.0, class_weight='balanced')
X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(
    train[features], train['Plot_id'], train.index, test_size=0.01, random_state=0)
model.fit(X_train, y_train)
test['Plot_id'] = model.predict(test[features])
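Please note that I am very new to machine learning and pandas.
This kind of task can be solved as follows. What we need is not the minimum and maximum, but the centroid (mean X/Y coordinates) of each Sector_Plot cluster. Each new point is then assigned to the cluster whose centroid is nearest.
Result: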
X Y Estimated_Sector_Plot
0 2.5 2 SE1_P1
1 2.0 1 SE1_P1
2 3.0 2 SE1_P1
3 5.0 3 SE1_P2
4 7.0 3 SE1_P2
5 6.0 3 SE1_P2
6 1.0 7 SE1_P3
7 4.0 6 SE1_P3
8 2.0 7 SE1_P3
9 28.0 25 SE2_P3
10 29.0 31 SE2_P2
11 18.0 19 SE2_P1
12 17.0 20 SE2_P1
13 19.0 22 SE2_P1
14 30.0 25 SE2_P3
15 25.0 25 SE2_P2
16 18.0 26 SE2_P4
17 17.0 31 SE2_P4
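The nearest-centroid step described above can be sketched as follows. This is only a minimal illustration, assuming plain Euclidean distance and using NumPy broadcasting; the variable names (`train`, `new`, `centroids`, `dist`) are made up for this example and are not taken from the answer.

```python
import numpy as np
import pandas as pd

# Training data from the question.
train = pd.DataFrame({
    "X": [5, 3, 6, 1, 2, 7, 17, 23, 27, 31, 25, 31, 17, 23],
    "Y": [3, 3, 7, 6, 1, 3, 20, 22, 28, 25, 25, 31, 25, 31],
    "Sector": ["SE1"] * 6 + ["SE2"] * 8,
    "Plot": ["P2", "P1", "P3", "P3", "P1", "P2",
             "P1", "P1", "P3", "P3", "P2", "P2", "P4", "P4"],
})

# New points to label.
new = pd.DataFrame({
    "X": [2.5, 2, 3, 5, 7, 6, 1, 4, 2, 28, 29, 18, 17, 19, 30, 25, 18, 17],
    "Y": [2, 1, 2, 3, 3, 3, 7, 6, 7, 25, 31, 19, 20, 22, 25, 25, 26, 31],
})

# One centroid (mean X, mean Y) per Sector_Plot label.
labels = train["Sector"] + "_" + train["Plot"]
centroids = train.groupby(labels)[["X", "Y"]].mean()

# Pairwise Euclidean distances, shape (n_new_points, n_centroids).
diff = new[["X", "Y"]].to_numpy()[:, None, :] - centroids.to_numpy()[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Pick the label of the closest centroid for every new point.
new["Estimated_Sector_Plot"] = centroids.index.to_numpy()[dist.argmin(axis=1)]
print(new)
```

On the sample data this reproduces the result table above, including the two rows (29, 31) and (30, 25) whose labels differ from the expected output in the question.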
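df.groupby(['Sector','Plot']).agg(['min','max']) gets the first result.
@user3483203 Is there any other way to achieve the above? If I can get suggestions from experts, I can set this problem aside and try to learn something.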
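The groupby call from the comment above returns MultiIndex rows and columns. A possible way to reshape it into the Sector_Plot / Xmin / Xmax / Ymin / Ymax layout from the question is sketched below; the name `bounds` and the flattening step are assumptions for illustration only.

```python
import pandas as pd

# Training data from the question.
df = pd.DataFrame({
    "X": [5, 3, 6, 1, 2, 7, 17, 23, 27, 31, 25, 31, 17, 23],
    "Y": [3, 3, 7, 6, 1, 3, 20, 22, 28, 25, 25, 31, 25, 31],
    "Sector": ["SE1"] * 6 + ["SE2"] * 8,
    "Plot": ["P2", "P1", "P3", "P3", "P1", "P2",
             "P1", "P1", "P3", "P3", "P2", "P2", "P4", "P4"],
})

# Min and max of X and Y for every Sector/Plot combination.
bounds = df.groupby(["Sector", "Plot"])[["X", "Y"]].agg(["min", "max"])

# Flatten the (column, statistic) pairs into Xmin, Xmax, Ymin, Ymax.
bounds.columns = [f"{col}{stat}" for col, stat in bounds.columns]

# Join the two index levels into a single Sector_Plot label.
bounds.index = [f"{sector}_{plot}" for sector, plot in bounds.index]
bounds.index.name = "Sector_Plot"
bounds = bounds.reset_index()
print(bounds)
```

Computed from the sample rows, SE2_P2 spans Y from 25 to 31 and SE2_P3 spans Y from 25 to 28, so those two boxes overlap. That overlap is one reason a pure min/max lookup can be ambiguous for new points.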