Python - TypeError: unsupported operand type(s) for -: 'str' and 'str'


python machine-learning


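I'm new to data analysis and looking for help. I'm writing a kNN algorithm from scratch in Python. I think something is wrong with my data (train and test). I suspect I have to convert it to float, but I'm not 100% sure. I know my functions work, because I tried them with another dataset.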

from scipy.io import arff
from io import StringIO
import scipy
import pandas as pd
import numpy as np
import math
data_train = scipy.io.arff.loadarff('train.arff')
train = pd.DataFrame(data_train[0])
train.head()
data_test = scipy.io.arff.loadarff('test1.arff') 
print(data_test)
test = pd.DataFrame(data_test[0])
test.head()

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(train, test, test_size = 0.1, random_state=42)
print(X_train, X_test, y_train, y_test)

def distance(testpoint, trainpoint):
    # distance between testpoint and trainpoint.
    dist = np.sqrt(np.sum(np.power(testpoint-trainpoint, 2))) 
    return dist

def getNeighbors(X_train, y_train, X_test, k):
        #For each point in X_test, calculate its distance from itself and each point in X_train
        k_neighbors_with_labels = [] # this will be a list (for each test point) of list (contains the tuple (distance,label) of k nearest neighbors). 
        for testpoint in X_test:
            distances_label = [] # this list carries distances between the testpoint and train point
            for (trainpoint,y_train_label) in zip(X_train,y_train):
                # calculate the distance and append it to a distances_label with the associated label.
                distances_label.append((distance(testpoint, trainpoint), y_train_label))
            k_neighbors_with_labels += [sorted(distances_label)[0:k]] # sort the distances and taken the first k neighbors
        return k_neighbors_with_labels
ne = getNeighbors(X_train, y_train, X_test, k = 3)
print(ne)

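TypeError                                 Traceback (most recent call last)
in <module>()
----> 1 ne = getNeighbors(X_train, y_train, X_test, k = 3)
      2 print(ne)

in getNeighbors(X_train, y_train, X_test, k)
      6             for (trainpoint,y_train_label) in zip(X_train,y_train):
      7                 # calculate the distance and append it to a distances_label with the associated label.
----> 8                 distances_label.append((distance(testpoint, trainpoint), y_train_label))
      9             k_neighbors_with_labels += [sorted(distances_label)[0:k]] # sort the distances and taken the first k neighbors
     10         return k_neighbors_with_labels

in distance(testpoint, trainpoint)
      1 def distance(testpoint, trainpoint):
      2     # distance between testpoint and trainpoint.
----> 3     dist = np.sqrt(np.sum(np.power(testpoint-trainpoint, 2)))
      4     return dis

TypeError: unsupported operand type(s) for -: 'str' and 'str'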
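A likely explanation for why testpoint and trainpoint are str: train_test_split returns DataFrames when it is given DataFrames, and iterating over a pandas DataFrame (which is what both for testpoint in X_test and zip(X_train, y_train) do) yields its column labels, not its rows. Column names are strings, so distance() ends up subtracting one column name from another. The sketch below reproduces this with a small made-up frame (the column names width and height and the labels are placeholders, not the asker's data), then runs the same loop on float NumPy arrays, where iteration yields rows:

```python
import numpy as np
import pandas as pd

def distance(testpoint, trainpoint):
    # Euclidean distance between two numeric points.
    return np.sqrt(np.sum(np.power(testpoint - trainpoint, 2)))

def getNeighbors(X_train, y_train, X_test, k):
    # For each test point, collect its k nearest (distance, label) pairs.
    k_neighbors_with_labels = []
    for testpoint in X_test:
        distances_label = [(distance(testpoint, trainpoint), label)
                           for trainpoint, label in zip(X_train, y_train)]
        k_neighbors_with_labels.append(sorted(distances_label)[:k])
    return k_neighbors_with_labels

# Toy data with placeholder column names and labels.
X_train_df = pd.DataFrame({"width": [0.0, 3.0, 10.0], "height": [0.0, 4.0, 10.0]})
y_train = np.array(["a", "b", "b"])
X_test_df = pd.DataFrame({"width": [0.0], "height": [0.0]})

# Iterating a DataFrame yields its column labels, which are str.
print(list(X_test_df))  # ['width', 'height']
try:
    getNeighbors(X_train_df, y_train, X_test_df, k=2)
except TypeError as exc:
    message = str(exc)
    print(message)  # unsupported operand type(s) for -: 'str' and 'str'

# Converting to float NumPy arrays makes iteration yield rows instead.
X_train = X_train_df.to_numpy(dtype=float)
X_test = X_test_df.to_numpy(dtype=float)
ne = getNeighbors(X_train, y_train, X_test, k=2)
print(ne)
```

With the arrays, the single test point [0, 0] gets its two nearest neighbors at distances 0.0 and 5.0.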

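As the comments point out, testpoint and trainpoint appear to be strings. To confirm this, you can add

print(type(testpoint))

and

print(type(trainpoint))

to your code to see what types they actually are. If they really are strings (which is what the error indicates), and assuming they are numbers stored as strings, you can convert them to int or float like this: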

dist = np.sqrt(np.sum(np.power(np.asarray(testpoint, dtype=float) - np.asarray(trainpoint, dtype=float), 2)))

Swap float for int if that fits your data better.

There are many ways to solve this, but the root problem, as the error points out, is that you can't use the - operator on strings.
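Whether converting helps depends on what the strings contain. A small sketch with illustrative values: numeric strings convert cleanly, and np.asarray(..., dtype=float) does it element-wise for a whole row, which plain float() cannot; a non-numeric string such as a column name raises ValueError, which would suggest the loop is iterating over column labels rather than data rows.

```python
import numpy as np

def distance(testpoint, trainpoint):
    # Convert both points to float arrays first; this also accepts numeric strings.
    a = np.asarray(testpoint, dtype=float)
    b = np.asarray(trainpoint, dtype=float)
    return np.sqrt(np.sum(np.power(a - b, 2)))

# Rows of numeric strings convert element-wise.
d = distance(["0", "0"], ["3", "4"])
print(d)  # 5.0

# float() handles a single numeric string, but not a whole row.
print(float("4.5"))  # 4.5

# A non-numeric string, such as a column label, cannot be converted.
try:
    float("width")
except ValueError as exc:
    error = str(exc)
    print(error)  # could not convert string to float: 'width'
```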

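The error clearly states that testpoint and trainpoint are str, so you need to do some data conversion somewhere.

My data is an array that contains floats plus another column named "CLASS_LABEL", which is a str. Do I need to convert it to float?

I'm not familiar enough with your use case to know exactly what you're trying to do, but that code seems to expect testpoint and trainpoint to both be single columns of float.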
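On the CLASS_LABEL question: in kNN the label is not part of the distance, so it does not need to become a float. Only the feature columns go into distance(), and the label is carried alongside. A sketch under assumptions: an inline, made-up ARFF string stands in for train.arff, and the attribute names width, height and CLASS_LABEL are placeholders. scipy.io.arff.loadarff accepts a file-like object and returns numeric attributes as float64 and nominal ones as bytes; the label column can then be split off from a float feature matrix.

```python
from io import StringIO

import numpy as np
import pandas as pd
from scipy.io import arff

# Hypothetical ARFF content standing in for train.arff.
content = """@relation toy
@attribute width numeric
@attribute height numeric
@attribute CLASS_LABEL {yes,no}
@data
5.0,3.25,yes
4.5,3.75,no
3.0,4.00,yes
"""

data, meta = arff.loadarff(StringIO(content))
train = pd.DataFrame(data)

# Nominal attributes come back as bytes (b'yes'); decode them if you prefer str.
train["CLASS_LABEL"] = train["CLASS_LABEL"].str.decode("utf-8")

# Features: every column except the label, as a float matrix.
X_train = train.drop(columns=["CLASS_LABEL"]).to_numpy(dtype=float)
# Labels: kept as strings; they are only compared or counted, never subtracted.
y_train = train["CLASS_LABEL"].to_numpy()

print(X_train.dtype, X_train.shape)  # float64 (3, 2)
print(y_train)
```

The same split would apply to the DataFrame loaded from test1.arff, assuming it has the same columns.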
