Machine learning: Returning a list of nearest neighbors from KNN
I am trying to use a KNN model to show the brands most closely related to brand X. I have read in my data and transformed it into the following format:
        User1  User2  User3  User4  User5
Brand1      1      0      0      0      1
Brand2      0      0      0      1      1
Brand3      0      0      1      1      1
Brand4      1      1      1      0      1
Brand5      0      0      0      1      1
I then defined my model:
from sklearn.neighbors import NearestNeighbors
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)
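For reference, `metric='cosine'` asks for cosine distance, which is 1 minus the cosine similarity of two rows. The following is a minimal pure-Python sketch of that calculation, applied to rows from the table above. The helper name `cosine_distance` and its handling of all-zero rows are assumptions made for illustration, not scikit-learn's own code.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption for this sketch: an all-zero row is treated as having
        # similarity 0 to everything, which gives a distance of 1.0.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Rows copied from the table in the question (User1..User5).
brand1 = [1, 0, 0, 0, 1]
brand2 = [0, 0, 0, 1, 1]
brand5 = [0, 0, 0, 1, 1]

d_12 = cosine_distance(brand1, brand2)  # one shared user out of two each
d_25 = cosine_distance(brand2, brand5)  # identical rows
d_disjoint = cosine_distance([1, 0, 0, 0, 0], [0, 1, 0, 0, 0])  # no shared user

print(d_12, d_25, d_disjoint)
```

For 0/1 data like this, a distance of exactly 1.0 between two brands means they have no user in common.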
Then I use the following code to list the 5 brands nearest to a randomly selected brand:
import numpy as np

# Pick a random brand (row) to query
query_index = np.random.choice(df_mini.shape[0])
# Ask for 6 neighbors: the query brand is expected to come back first, followed by its 5 nearest brands
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors=6)
for i in range(0, len(distances.flatten())):
    if i == 0:
        print('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
In the results, every brand is returned with a distance of 1.0. Where am I going wrong in my code? I have tried increasing the size of my sample data, but that changes nothing, which makes me think this is a code error rather than a quirk of the data.
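As a sanity check, the same model can be run on the five-brand table from the top of the question. This is only a sketch: the table is typed in by hand, and the query brand is fixed to Brand1 instead of being chosen at random so that the run is repeatable.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# The brand-by-user table from the question, entered by hand.
df_mini = pd.DataFrame(
    [
        [1, 0, 0, 0, 1],
        [0, 0, 0, 1, 1],
        [0, 0, 1, 1, 1],
        [1, 1, 1, 0, 1],
        [0, 0, 0, 1, 1],
    ],
    index=["Brand1", "Brand2", "Brand3", "Brand4", "Brand5"],
    columns=["User1", "User2", "User3", "User4", "User5"],
)

model_knn = NearestNeighbors(metric="cosine", algorithm="brute")
model_knn.fit(df_mini.to_numpy())

query_index = 0  # Brand1, fixed for repeatability (an assumption of this sketch)
distances, indices = model_knn.kneighbors(
    df_mini.iloc[[query_index]].to_numpy(), n_neighbors=5
)

# Pair each neighbor's name with its distance, skipping the query brand itself.
neighbors = [
    (df_mini.index[i], float(d))
    for d, i in zip(distances[0], indices[0])
    if i != query_index
]

print("Recommendations for {0}:".format(df_mini.index[query_index]))
for rank, (brand, dist) in enumerate(neighbors, start=1):
    print("{0}: {1}, with distance of {2:.3f}".format(rank, brand, dist))
```

On this toy table the distances come out below 1.0, with Brand4 closest to Brand1. If the same code gives 1.0 everywhere on the real file, the data as it looks after slicing and transposing may be worth inspecting.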
Edit: here is a more complete example of my code:
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv('sample.csv')
print(df.head())

# Keep the first 5000 rows, then transpose so that each row is a brand
df_mini = df[:5000]
df_mini = df_mini.transpose()
# After transposing, the UserID column has become a row; drop it
df_mini = df_mini.drop('UserID', axis=0)

model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)

# Pick a random brand and ask for it plus its 5 nearest brands
query_index = np.random.choice(df_mini.shape[0])
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors=6)
for i in range(0, len(distances.flatten())):
    if i == 0:
        print('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
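One way to inspect the transposed frame is to count, for each brand, how many other brands share at least one user with it, since two 0/1 rows with no shared user have a cosine distance of 1.0. The helper `overlap_report` and the toy frame below are hypothetical and only illustrate the check. The sketch assumes the values are 0/1 flags and treats anything non-numeric as 0.

```python
import numpy as np
import pandas as pd


def overlap_report(brand_by_user: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-brand user counts and overlap with other brands."""
    # Force numeric values; anything that cannot be parsed becomes NaN, then 0.
    m = (
        brand_by_user.apply(pd.to_numeric, errors="coerce")
        .fillna(0)
        .to_numpy(dtype=float)
    )
    shared = m @ m.T             # shared[i, j] = users common to brands i and j
    np.fill_diagonal(shared, 0)  # ignore each brand's overlap with itself
    return pd.DataFrame(
        {
            "n_users": m.sum(axis=1).astype(int),
            "n_brands_sharing_a_user": (shared > 0).sum(axis=1),
        },
        index=brand_by_user.index,
    )


# Toy frame (made up for illustration): BrandA and BrandB share users with nobody.
toy = pd.DataFrame(
    [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 0, 1],
    ],
    index=["BrandA", "BrandB", "BrandC", "BrandD"],
    columns=["User1", "User2", "User3", "User4"],
)

report = overlap_report(toy)
print(report)
```

A brand whose `n_brands_sharing_a_user` is 0 would be at distance 1.0 from every other brand. Calling `overlap_report(df_mini)` on the real data, if it fits in memory, would show whether that is the case for most rows.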
Sample data file:
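Could you add more code so that others can run it and help you debug it?
@TimH Sure, I have added a larger block of my current code.
Could you add some sample data so that we can run the same scenario?
@TimH I have added this to the bottom of my question, and here ()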