Python: DBSCAN true labels and predicted labels have different sizes


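I want to split a dataset into fraud and non-fraud groups. I used DBSCAN for this, but I get the following error: "labels_true and labels_pred must have same size, got 7200 and 28789".

I would be very grateful for any help. The lines below read the CSV file: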

import pandas as pd

# Read the CSV and separate the ground-truth fraud column from the features
df = pd.read_csv('C:\\Users\\canberk.cinar\\Desktop\\banksim2.csv')
labels = df['fraud'].values  # 1-D array, one label per row
print(labels)
df = df.drop(columns=['fraud'])
df.head()

Then, as I said, I get the following error:

labels_true and labels_pred must have same size, got 7200 and 28789

Print all the relevant shapes. If you are using Jupyter, restart the kernel.

Increase DBSCAN's min_samples parameter until you get the same number of predicted labels as true labels. Alternatively, use a classification algorithm rather than an unsupervised learning algorithm.

First of all, thank you for all your comments. Unfortunately, even though I restarted the kernel and increased the min_samples parameter, I still get the same error message. I still welcome your suggestions.
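The last reply reports that changing min_samples did not remove the error. That matches how scikit-learn's DBSCAN returns results: labels_ contains one entry per row passed to fit (noise rows get -1), so its length does not depend on min_samples. A minimal sketch on synthetic data (X_demo is a made-up stand-in, not the banksim data):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))  # hypothetical data: 200 rows, 4 features

label_counts = {}
for min_samples in (1, 5, 20):
    model = DBSCAN(eps=0.9, min_samples=min_samples).fit(X_demo)
    # labels_ has one entry per input row; points judged as noise are labelled -1
    label_counts[min_samples] = len(model.labels_)

print(label_counts)
```

Every count equals the number of rows in X_demo, so a size mismatch with the true labels points at the array handed to fit rather than at the DBSCAN parameters.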

# Import NumPy, DBSCAN and the clustering metrics
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import homogeneity_score, silhouette_score

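# Check the scaled features for inf / NaN values
print(np.any(np.isinf(X_scaled)))
print(np.any(np.isnan(X_scaled)))

print(type(X_scaled))

# Keep only the non-NaN values.
# NOTE: indexing a 2-D array with an element-wise boolean mask returns a
# flattened 1-D array, so the row structure of X_scaled is lost here.
data = X_scaled[np.logical_not(np.isnan(X_scaled))]
print(np.any(np.isnan(data)))

# Reshape into a single column: every remaining value becomes its own sample
data = data.reshape(-1, 1)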

# Initialize and fit the DBSCAN model
db = DBSCAN(eps=0.9, min_samples=1, n_jobs=-1).fit(data)

# Obtain the predicted labels (one per row of data; -1 marks noise)
pred_labels = db.labels_
print(pred_labels)

# Compare sizes: labels has one entry per original row,
# pred_labels has one entry per row of data
print(len(X_scaled))
print(len(labels))
print(len(pred_labels))
print(labels.shape)
print(pred_labels.shape)

# Number of clusters, not counting the noise label -1
n_clusters = len(set(pred_labels)) - (1 if -1 in pred_labels else 0)

# Print performance metrics for DBscan
print('Estimated number of clusters: %d' % n_clusters)
print("Homogeneity: %0.3f" % homogeneity_score(labels, pred_labels))
print("Silhouette Coefficient: %0.3f" % silhouette_score(data, pred_labels))

labels_true and labels_pred must have same size, got 7200 and 28789
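One plausible cause, given the code above, is the NaN filter: X_scaled[np.logical_not(np.isnan(X_scaled))] uses an element-wise mask, which flattens a 2-D array, and reshape(-1, 1) then turns every remaining value into its own sample. The sketch below reproduces the sizes from the error message on synthetic data and shows a row-wise mask applied to features and labels together. The 4 columns and 11 NaN values are assumptions chosen only so that 7200 * 4 - 11 = 28789; the real dataset may differ. X_demo, y_demo and the other names are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import homogeneity_score

rng = np.random.default_rng(0)
n_rows, n_cols = 7200, 4                       # assumed shape, not from the post
X_demo = rng.normal(size=(n_rows, n_cols))     # synthetic stand-in for X_scaled
y_demo = rng.integers(0, 2, size=n_rows)       # synthetic stand-in for labels

# Assumption: 11 NaN values, placed in 11 distinct rows
nan_rows = rng.choice(n_rows, size=11, replace=False)
X_demo[nan_rows, 0] = np.nan

# Element-wise mask: the result is 1-D, so the reshape yields one "sample" per value
flat = X_demo[np.logical_not(np.isnan(X_demo))].reshape(-1, 1)
print(flat.shape)

# Row-wise mask: drop whole rows that contain a NaN, from features AND labels
row_ok = ~np.isnan(X_demo).any(axis=1)
X_clean = X_demo[row_ok]
y_clean = y_demo[row_ok]

pred = DBSCAN(eps=0.9, min_samples=5).fit(X_clean).labels_
print(X_clean.shape, y_clean.shape, pred.shape)

# The sizes now agree, so the metric can be computed
print("Homogeneity: %0.3f" % homogeneity_score(y_clean, pred))
```

Filtering the labels with the same row mask keeps them aligned with the features. Imputing the missing values instead of dropping rows would be another option if every row has to be kept.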