Machine learning: Returning a list of nearest neighbors from KNN

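I am trying to use a KNN model to show which brands are most closely related to brand X. I have read in my data and converted it into the following format: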

          User1     User2     User3     User4     User5
Brand1    1         0         0         0         1
Brand2    0         0         0         1         1
Brand3    0         0         1         1         1
Brand4    1         1         1         0         1
Brand5    0         0         0         1         1
Then I defined my model:

from sklearn.neighbors import NearestNeighbors

model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)
Then I use the following code to list the 5 brands nearest to a randomly chosen brand:

import numpy as np

# Pick a random brand (row) to query.
query_index = np.random.choice(df_mini.shape[0])
# Ask for 6 neighbors: the query brand itself plus its 5 nearest brands.
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors=6)

for i in range(len(distances.flatten())):
    if i == 0:
        print('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
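For reference, here is a minimal, self-contained sketch that runs the same steps on the 5-brand table shown earlier, so the behaviour can be checked without the real file. The names `df_toy` and `model_toy` are made up for this example, the query brand is fixed rather than random so the run is repeatable, and `n_neighbors` is 5 because the toy table only has 5 rows.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy brand-by-user matrix copied from the table in the question.
df_toy = pd.DataFrame(
    [[1, 0, 0, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 1, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 0, 0, 1, 1]],
    index=["Brand1", "Brand2", "Brand3", "Brand4", "Brand5"],
    columns=["User1", "User2", "User3", "User4", "User5"],
)

model_toy = NearestNeighbors(metric="cosine", algorithm="brute")
model_toy.fit(df_toy.to_numpy())

query_index = 1  # Brand2, fixed instead of random so the run is repeatable
# Only 5 rows exist, so ask for 5 neighbors (the query itself plus 4 others).
distances, indices = model_toy.kneighbors(
    df_toy.iloc[[query_index]].to_numpy(), n_neighbors=5
)

print("Recommendations for {0}:\n".format(df_toy.index[query_index]))
for dist, idx in zip(distances.ravel(), indices.ravel()):
    if idx == query_index:
        continue  # skip the query brand itself
    print("{0}: distance {1:.3f}".format(df_toy.index[idx], dist))
```

On this toy table the distances are not all 1.0: Brand5 has exactly the same row as Brand2, so its cosine distance to Brand2 is 0.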
In the output, every brand in every result is listed with a distance of 1.0. Where is my code going wrong? I have already tried increasing the size of the sample data, but that did not change anything, which makes me think this is a code error rather than a quirk of the data.
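One way to tell a code error from a data quirk: cosine distance is 1 minus the cosine similarity, so a distance of exactly 1.0 between two brands means their rows have a dot product of zero, that is, no user has both brands. The sketch below is only an illustration under that reading. The helper `shared_user_counts` and the three-brand table `df_check` are made up for this example and assume the matrix holds numeric 0/1 values.

```python
import numpy as np
import pandas as pd


def shared_user_counts(brand_by_user):
    """Return a brand-by-brand table of how many users each pair of brands shares."""
    values = brand_by_user.to_numpy(dtype=float)
    counts = values @ values.T  # dot product of 0/1 rows = number of shared users
    return pd.DataFrame(
        counts.astype(int),
        index=brand_by_user.index,
        columns=brand_by_user.index,
    )


# Made-up data in which no two brands have a user in common.
df_check = pd.DataFrame(
    [[1, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0]],
    index=["BrandA", "BrandB", "BrandC"],
    columns=["User1", "User2", "User3", "User4"],
)

shared = shared_user_counts(df_check)
print(shared)

# Cosine distance computed by hand: 1 - (a . b) / (|a| * |b|).
values = df_check.to_numpy(dtype=float)
norms = np.linalg.norm(values, axis=1)
cosine_distance = 1.0 - (values @ values.T) / np.outer(norms, norms)
print(np.round(cosine_distance, 3))
```

If the off-diagonal entries of `shared_user_counts(df_mini)` are all zero for the real data, distances of 1.0 are what the cosine metric would be expected to return; if they are not, the problem is more likely in how the matrix is built or passed to the model.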

Edit: here is a more complete example of my code:

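import numpy as np
import pandas as pd

df = pd.read_csv('sample.csv')
print(df.head())

# Keep the first 5000 rows (users), then transpose so that each row is a brand.
df_mini = df[:5000]
df_mini = df_mini.transpose()
# After the transpose, UserID is a row rather than a column, so drop it by index.
df_mini = df_mini.drop('UserID', axis=0)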

from sklearn.neighbors import NearestNeighbors

model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)

query_index = np.random.choice(df_mini.shape[0])
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors = 6)

for i in range(0, len(distances.flatten())):
    if i == 0:
        print ('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print ('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
Sample data file:

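Could you add more code so that others can run it and help you debug it?
@TimH Sure, I have added a larger block of my current code.
Could you add some sample data so that we can run the same scenario?
@TimH I have added this to the bottom of my question, and here ()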