Python: invalid literal for float()

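I am new to Python, so maybe I am missing something basic here, but I cannot figure it out. For my work, I am trying to read a txt file and apply KNN to it.

The file content is as follows. It has three columns, the third column is the class, and the separator is a space: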

0.85    17.45   2
0.75    15.6    2
3.3     15.45   2
5.25    14.2    2
4.9     15.65   2
5.35    15.85   2
5.1     17.9    2
4.6     18.25   2
4.05    18.75   2
3.4     19.7    2
2.9     21.15   2
3.1     21.85   2
3.9     21.85   2
4.4     20.05   2
7.2     14.5    2
7.65    16.5    2
7.1     18.65   2
7.05    19.9    2
5.85    20.55   2
5.5     21.8    2
6.55    21.8    2
6.05    22.3    2
5.2     23.4    2
4.55    23.9    2
5.1     24.4    2
8.1     26.35   2
10.15   27.7    2
9.75    25.5    2
9.2     21.1    2
11.2    22.8    2
12.6    23.1    2
13.25   23.5    2
11.65   26.85   2
12.45   27.55   2
13.3    27.85   2
13.7    27.75   2
14.15   26.9    2
14.05   26.55   2
15.15   24.2    2
15.2    24.75   2
12.2    20.9    2
12.15   21.45   2
12.75   22.05   2
13.15   21.85   2
13.75   22      2
13.95   22.7    2
14.4    22.65   2
14.2    22.15   2
14.1    21.75   2
14.05   21.4    2
17.2    24.8    2
17.7    24.85   2
17.55   25.2    2
17      26.85   2
16.55   27.1    2
19.15   25.35   2
18.8    24.7    2
21.4    25.85   2
15.8    21.35   2
16.6    21.15   2
17.45   20.75   2
18      20.95   2
18.25   20.2    2
18      22.3    2
18.6    22.25   2
19.2    21.95   2
19.45   22.1    2
20.1    21.6    2
20.1    20.9    2
19.9    20.35   2
19.45   19.05   2
19.25   18.7    2
21.3    22.3    2
22.9    23.65   2
23.15   24.1    2
24.25   22.85   2
22.05   20.25   2
20.95   18.25   2
21.65   17.25   2
21.55   16.7    2
21.6    16.3    2
21.5    15.5    2
22.4    16.5    2
22.25   18.1    2
23.15   19.05   2
23.5    19.8    2
23.75   20.2    2
25.15   19.8    2
25.5    19.45   2
23      18      2
23.95   17.75   2
25.9    17.55   2
27.65   15.65   2
23.1    14.6    2
23.5    15.2    2
24.05   14.9    2
24.5    14.7    2
14.15   17.35   1
14.3    16.8    1
14.3    15.75   1
14.75   15.1    1
15.35   15.5    1
15.95   16.45   1
16.5    17.05   1
17.35   17.05   1
17.15   16.3    1
16.65   16.1    1
16.5    15.15   1
16.25   14.95   1
16      14.25   1
15.9    13.2    1
15.15   12.05   1
15.2    11.7    1
17      15.65   1
16.9    15.35   1
17.35   15.45   1
17.15   15.1    1
17.3    14.9    1
17.7    15      1
17      14.6    1
16.85   14.3    1
16.6    14.05   1
17.1    14      1
17.45   14.15   1
17.8    14.2    1
17.6    13.85   1
17.2    13.5    1
17.25   13.15   1
17.1    12.75   1
16.95   12.35   1
16.5    12.2    1
16.25   12.5    1
16.05   11.9    1
16.65   10.9    1
16.7    11.4    1
16.95   11.25   1
17.3    11.2    1
18.05   11.9    1
18.6    12.5    1
18.9    12.05   1
18.7    11.25   1
17.95   10.9    1
18.4    10.05   1
17.45   10.4    1
17.6    10.15   1
17.7    9.85    1
17.3    9.7     1
16.95   9.7     1
16.75   9.65    1
19.8    9.95    1
19.1    9.55    1
17.5    8.3     1
17.55   8.1     1
17.85   7.55    1
18.2    8.35    1
19.3    9.1     1
19.4    8.85    1
19.05   8.85    1
18.9    8.5     1
18.6    7.85    1
18.7    7.65    1
19.35   8.2     1
19.95   8.3     1
20      8.9     1
20.3    8.9     1
20.55   8.8     1
18.35   6.95    1
18.65   6.9     1
19.3    7       1
19.1    6.85    1
19.15   6.65    1
21.2    8.8     1
21.4    8.8     1
21.1    8       1
20.4    7       1
20.5    6.35    1
20.1    6.05    1
20.45   5.15    1
20.95   5.55    1
20.95   6.2     1
20.9    6.6     1
21.05   7       1
21.85   8.5     1
21.9    8.2     1
22.3    7.7     1
21.85   6.65    1
21.3    5.05    1
22.6    6.7     1
22.5    6.15    1
23.65   7.2     1
24.1    7       1
21.95   4.8     1
22.15   5.05    1
22.45   5.3     1
22.45   4.9     1
22.7    5.5     1
23      5.6     1
23.2    5.3     1
23.45   5.95    1
23.75   5.95    1
24.45   6.15    1
24.6    6.45    1
25.2    6.55    1
26.05   6.4     1
25.3    5.75    1
24.35   5.35    1
23.3    4.9     1
22.95   4.75    1
22.4    4.55    1
22.8    4.1     1
22.9    4       1
23.25   3.85    1
23.45   3.6     1
23.55   4.2     1
23.8    3.65    1
23.8    4.75    1
24.2    4       1
24.55   4       1
24.7    3.85    1
24.7    4.3     1
24.9    4.75    1
26.4    5.7     1
27.15   5.95    1
27.3    5.45    1
27.5    5.45    1
27.55   5.1     1
26.85   4.95    1
26.6    4.9     1
26.85   4.4     1
26.2    4.4     1
26      4.25    1
25.15   4.1     1
25.6    3.9     1
25.85   3.6     1
24.95   3.35    1
25.1    3.25    1
25.45   3.15    1
26.85   2.95    1
27.15   3.15    1
27.2    3       1
27.95   3.25    1
27.95   3.5     1
28.8    4.05    1
28.8    4.7     1
28.75   5.45    1
28.6    5.75    1
29.25   6.3     1
30      6.55    1
30.6    3.4     1
30.05   3.45    1
29.75   3.45    1
29.2    4       1
29.45   4.05    1
29.05   4.55    1
29.4    4.85    1
29.5    4.7     1
29.9    4.45    1
30.75   4.45    1
30.4    4.05    1
30.8    3.95    1
31.05   3.95    1
30.9    5.2     1
30.65   5.85    1
30.7    6.15    1
31.5    6.25    1
31.65   6.55    1
32      7       1
32.5    7.95    1
33.35   7.45    1
32.6    6.95    1
32.65   6.6     1
32.55   6.35    1
32.35   6.1     1
32.55   5.8     1
32.2    5.05    1
32.35   4.25    1
32.9    4.15    1
32.7    4.6     1
32.75   4.85    1
34.1    4.6     1
34.1    5       1
33.6    5.25    1
33.35   5.65    1
33.75   5.95    1
33.4    6.2     1
34.45   5.8     1
34.65   5.65    1
34.65   6.25    1
35.25   6.25    1
34.35   6.8     1
34.1    7.15    1
34.45   7.3     1
34.7    7.2     1
34.85   7       1
34.35   7.75    1
34.55   7.85    1
35.05   8       1
35.5    8.05    1
35.8    7.1     1
36.6    6.7     1
36.75   7.25    1
36.5    7.4     1
35.95   7.9     1
36.1    8.1     1
36.15   8.4     1
37.6    7.35    1
37.9    7.65    1
29.15   4.4     1
34.9    9       1
35.3    9.4     1
35.9    9.35    1
36      9.65    1
35.75   10      1
36.7    9.15    1
36.6    9.8     1
36.9    9.75    1
37.25   10.15   1
36.4    10.15   1
36.3    10.7    1
36.75   10.85   1
38.15   9.7     1
38.4    9.45    1
38.35   10.5    1
37.7    10.8    1
37.45   11.15   1
37.35   11.4    1
37      11.75   1
36.8    12.2    1
37.15   12.55   1
37.25   12.15   1
37.65   11.95   1
37.95   11.85   1
38.6    11.75   1
38.5    12.2    1
38      12.95   1
37.3    13      1
37.5    13.4    1
37.85   14.5    1
38.3    14.6    1
38.05   14.45   1
38.35   14.35   1
38.5    14.25   1
39.3    14.2    1
39      13.2    1
38.95   12.9    1
39.2    12.35   1
39.5    11.8    1
39.55   12.3    1
39.75   12.75   1
40.2    12.8    1
40.4    12.05   1
40.45   12.5    1
40.55   13.15   1
40.45   14.5    1
40.2    14.8    1
40.65   14.9    1
40.6    15.25   1
41.3    15.3    1
40.95   15.7    1
41.25   16.8    1
40.95   17.05   1
40.7    16.45   1
40.45   16.3    1
39.9    16.2    1
39.65   16.2    1
39.25   15.5    1
38.85   15.5    1
38.3    16.5    1
38.75   16.85   1
39      16.6    1
38.25   17.35   1
39.5    16.95   1
39.9    17.05   1

My code:

import csv
import random
import math
import operator

def loadDataset(filename, split, trainingSet=[] , testSet=[]):
    with open(filename, 'rb') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset)-1):
            for y in range(3):
                dataset[x][y] = float(dataset[x][y])
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])


def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance)-1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors

def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.iteritems(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]

def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct/float(len(testSet))) * 100.0

def main():
    # prepare data
    trainingSet=[]
    testSet=[]
    split = 0.67
    loadDataset('Jain.txt', split, trainingSet, testSet)
    print 'Train set: ' + repr(len(trainingSet))
    print 'Test set: ' + repr(len(testSet))
    # generate predictions
    predictions=[]
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')
main()

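Here:

lines = csv.reader(csvfile)

You have to specify the delimiter, otherwise it will use the default excel "," delimiter. Note that in the example you posted, the delimiter might actually not be "a space" but a tab ("\t" in Python) or just a random number of spaces, in which case it is not a csv-like format and you will have to parse the lines yourself.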
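To make the delimiter point concrete, here is a minimal Python 3 sketch. It assumes the file is tab-separated (the post does not confirm this), and the names sample and parse_line are made up for illustration. It shows what the default reader returns, what an explicit delimiter returns, and a plain str.split() fallback for files that use a variable number of spaces.

```python
import csv
import io

# Two rows copied from the question, assumed here to be tab-separated.
sample = "0.85\t17.45\t2\n0.75\t15.6\t2\n"

# The default dialect splits on ",", so each row comes back as ONE field.
default_rows = list(csv.reader(io.StringIO(sample)))

# Passing that single field to float() is what raises the ValueError.
try:
    float(default_rows[0][0])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With an explicit tab delimiter every row has three fields.
tab_rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))


def parse_line(line):
    # str.split() without arguments splits on any run of whitespace,
    # so it copes with tabs and with a variable number of spaces.
    return [float(field) for field in line.split()]


split_rows = [parse_line(line) for line in sample.splitlines() if line.strip()]

print(default_rows[0])
print(tab_rows[0])
print(split_rows[0])
```

If the separator turns out to be a run of spaces rather than a tab, only the parse_line() variant will work as written.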

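Also, your code is far from pythonic. First: Python's "for" loops are really "for-each" loops, i.e. they directly yield the values of the object you iterate over. The proper way to iterate over a list is: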

lst = ["a", "b", "c"]
for item in lst:
    print(item)

So you don't need range() and indexed access here. Note that if you want the index too, you can use enumerate(sequence), which yields (index, item) pairs, i.e.:

lst = ["a", "b", "c"]
for index, item in enumerate(lst):
    print("item at {} is {}".format(index, item))

So your loadDataset() function can be rewritten as:

def loadDataset(filename, split, trainingSet=None, testSet=None):
    # fix the mutable default argument gotcha
    # cf https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
    if trainingSet is None:
        trainingSet = []
    if testSet is None:
        testSet = []

    # 'rb' matches the Python 2 code in the question;
    # on Python 3, use open(filename, newline='') instead
    with open(filename, 'rb') as csvfile:
        reader = csv.reader(csvfile, delimiter="\t")

        for row in reader:
            row = tuple(float(x) for x in row)
            if random.random() < split:
                trainingSet.append(row)
            else:
                testSet.append(row)

    # so the caller can get the values back
    return trainingSet, testSet
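The "mutable default argument gotcha" mentioned in the comment of loadDataset() can be seen in a few lines. This is only an illustrative sketch; append_shared and append_fresh are hypothetical names, not part of the original code.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call without an explicit bucket reuses the same list.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Using None as the default and creating the list inside the function
    # gives each call its own list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_shared = append_shared("a")
second_shared = append_shared("b")
first_fresh = append_fresh("a")
second_fresh = append_fresh("b")

print(second_shared)  # ['a', 'b']
print(second_fresh)   # ['b']
```

In the question's code the problem stays hidden because main() always passes both lists explicitly, but it would show up as soon as loadDataset() is called twice without them.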

Also, if you want to iterate over two lists in parallel (getting "list1[x], list2[x]" pairs), use zip().
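A small sketch of what parallel iteration with zip() looks like in Python 3. The lists xs and ys are made-up sample data.

```python
xs = [1, 2, 3]
ys = [10, 20, 30]

# zip() pairs up the items found at the same position in each sequence.
pairs = list(zip(xs, ys))

for x, y in zip(xs, ys):
    print("{} -> {}".format(x, y))

# The pairs can be consumed directly in a comprehension as well.
diffs = [y - x for x, y in zip(xs, ys)]
print(diffs)
```

Note that zip() stops at the end of the shorter sequence, so the two lists do not need to have the same length.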

There are also builtin functions such as sum():

lst = [1, 2, 3]
print(sum(lst))

So your euclideanDistance function can be rewritten as:
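def euclideanDistance(instance1, instance2, length):
    # pair up the first `length` coordinates of both instances
    pairs = zip(instance1[:length], instance2[:length])
    # pow() needs the exponent: square each coordinate difference
    return math.sqrt(sum(pow(x - y, 2) for x, y in pairs))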

And so on.
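As one possible reading of "and so on", the remaining functions from the question could be rewritten in the same style. This is a Python 3 sketch, not the answerer's code: the function names come from the question, the use of collections.Counter is my own choice, and the training list is just a few rows taken from the posted data.

```python
import math
from collections import Counter


def euclideanDistance(instance1, instance2, length):
    pairs = zip(instance1[:length], instance2[:length])
    return math.sqrt(sum(pow(x - y, 2) for x, y in pairs))


def getNeighbors(trainingSet, testInstance, k):
    # The last field is the class label, so it is left out of the distance.
    length = len(testInstance) - 1
    by_distance = sorted(
        trainingSet,
        key=lambda row: euclideanDistance(testInstance, row, length),
    )
    return by_distance[:k]


def getResponse(neighbors):
    # Majority vote over the class labels of the neighbors.
    votes = Counter(row[-1] for row in neighbors)
    return votes.most_common(1)[0][0]


def getAccuracy(testSet, predictions):
    correct = sum(
        1 for row, predicted in zip(testSet, predictions) if row[-1] == predicted
    )
    return 100.0 * correct / len(testSet)


training = [
    (14.15, 17.35, 1.0),
    (14.3, 16.8, 1.0),
    (0.85, 17.45, 2.0),
    (0.75, 15.6, 2.0),
]
neighbors = getNeighbors(training, (14.3, 15.75, 1.0), k=3)
prediction = getResponse(neighbors)
print(prediction)
```

Counter.most_common() replaces the manual vote dictionary and the sort on iteritems(), which no longer exists in Python 3.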
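It looks like you are trying to read this as CSV (comma-separated values). Your data seems better suited to being read as a plain text file.

@Stephencouley the csv module supports working with any delimiter.

I still get an error on this line, row = tuple(float[x] for x in row): TypeError: 'type' object has no attribute '__getitem__'

My bad! It should of course be float(x) (typo fixed in the answer). Oh, and yes: if any data in the file is not a proper representation of a float, you may still get a ValueError, in which case you will have to handle it somehow. I edited my post to add an example of error handling.

Yes, there was an error: the first seven lines are strings that serve as an introduction to the file, so I skipped the first seven lines.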
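The error-handling example mentioned in the comments is not shown on this page, so here is a hedged Python 3 sketch of one way to do it. The function load_rows, its skip parameter, and the sample text are assumptions made for illustration. It skips a given number of header lines and collects any line that cannot be parsed instead of crashing.

```python
import io


def load_rows(lines, skip=0):
    """Parse whitespace-separated float rows, collecting lines that fail."""
    rows, rejected = [], []
    for lineno, line in enumerate(lines, start=1):
        # Ignore the first `skip` lines (e.g. a text header) and blank lines.
        if lineno <= skip or not line.strip():
            continue
        try:
            rows.append(tuple(float(field) for field in line.split()))
        except ValueError:
            # Keep track of what could not be parsed instead of crashing.
            rejected.append((lineno, line.rstrip("\n")))
    return rows, rejected


text = "some intro text\n0.85\t17.45\t2\nnot a number\n0.75 15.6 2\n"

rows, rejected = load_rows(io.StringIO(text))
print(rows)
print(rejected)

# With the header skipped explicitly, only the bad data line is reported.
rows_skipped, rejected_skipped = load_rows(io.StringIO(text), skip=1)
```

For the file described in the last comment, calling load_rows(open_file, skip=7) would correspond to skipping the seven introductory lines.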