Python: precision, recall, and F1 score are equal with sklearn
I'm trying to compare different distance calculation methods and different voting systems in the k-nearest neighbours algorithm. My current problem is that, no matter what I do, the precision_recall_fscore_support method from scikit-learn returns exactly the same value for precision, recall, and F-score. Why is that? I've tried it on different datasets (Iris, Glass, and Wine). What am I doing wrong? The code so far:
#!/usr/bin/env python3
from collections import Counter
from data_loader import DataLoader
from sklearn.metrics import precision_recall_fscore_support as pr
import random
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    # abs() must be applied per coordinate, not to the whole list
    return sum(abs(a - b) for a, b in zip(x, y))


def get_neighbours(training_set, test_instance, k):
    names = [instance[4] for instance in training_set]
    training_set = [instance[0:4] for instance in training_set]
    distances = [euclidean_distance(test_instance, training_set_instance) for training_set_instance in training_set]
    distances = list(zip(distances, names))
    print(list(filter(lambda x: x[0] == 0.0, distances)))
    # sorted() returns a new list, so the result has to be assigned
    distances = sorted(distances, key=lambda x: x[0])
    return distances[:k]


def plurality_voting(nearest_neighbours):
    classes = [nearest_neighbour[1] for nearest_neighbour in nearest_neighbours]
    count = Counter(classes)
    return count.most_common()[0][0]


def weighted_distance_voting(nearest_neighbours):
    # Sum the inverse-distance weights per class; the heaviest class wins.
    # A zero distance (exact duplicate) would raise ZeroDivisionError.
    weights = Counter()
    for distance, name in nearest_neighbours:
        weights[name] += 1 / distance
    return weights.most_common(1)[0][0]


def weighted_distance_squared_voting(nearest_neighbours):
    # Parentheses matter: 1 / d * d evaluates to 1, not 1 / d**2.
    weights = Counter()
    for distance, name in nearest_neighbours:
        weights[name] += 1 / (distance ** 2)
    return weights.most_common(1)[0][0]


def main():
    data = DataLoader.load_arff("datasets/iris.arff")
    dataset = data["data"]
    # random.seed(42)
    random.shuffle(dataset)
    train = dataset[:100]
    test = dataset[100:150]
    classes = [instance[4] for instance in test]
    predictions = []
    for test_instance in test:
        prediction = weighted_distance_voting(get_neighbours(train, test_instance[0:4], 15))
        predictions.append(prediction)
    print(pr(classes, predictions, average="micro"))


if __name__ == "__main__":
    main()
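The problem is that you are using the "micro" average. As the documentation says:

"Note that for "micro"-averaging in a multiclass setting will produce equal precision, recall and F, while "weighted" averaging may produce an F-score that is not between precision and recall."

However, if you drop a majority label by using the labels parameter, micro-averaging differs from accuracy, and precision differs from recall.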
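To see why this happens, here is a minimal hand-rolled sketch in pure Python. It is not scikit-learn's implementation, and the helper names and the toy labels are made up for illustration. In a single-label multiclass problem, every wrong prediction is counted once as a false positive (for the predicted class) and once as a false negative (for the true class). The pooled false-positive and false-negative totals are therefore equal, which forces micro precision, micro recall, and accuracy to be the same number.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return per-class true-positive, false-positive and false-negative counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    return tp, fp, fn


def micro_precision_recall(y_true, y_pred):
    """Pool the counts over all classes, then compute the ratios once."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    total_tp = sum(tp.values())
    precision = total_tp / (total_tp + sum(fp.values()))
    recall = total_tp / (total_tp + sum(fn.values()))
    return precision, recall


def macro_precision_recall(y_true, y_pred):
    """Compute the ratios per class, then take the unweighted mean."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    precisions = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels]
    recalls = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy labels, invented for this example
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "versicolor", "virginica", "versicolor"]

micro = micro_precision_recall(y_true, y_pred)
macro = macro_precision_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("micro:", micro)        # precision and recall coincide
print("macro:", macro)        # precision and recall differ
print("accuracy:", accuracy)  # matches the micro values
```

In the question's code, passing average="macro" or average=None (per-class scores) to precision_recall_fscore_support instead of average="micro" should show the differences between precision and recall.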