Python - Calculating distance with different data types in Python

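I have a dataset with 11 attributes, and I want to compute the distance on each attribute. The attributes are x1, x2, ..., x11: x1 and x2 are nominal, x3, x4, ..., x10 are ordinal, and x11 is binary. How can I read the attributes in Python, and how can I distinguish between these attribute types in Python so that I can compute the distance? Can anyone tell me what to do? Thanks.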


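Sample data:

x1: forestry, plantation, other, forestry
x2: plantation, plantation, shrubs, forestry
x3: high, high, medium, low
x4: low, medium, high, high
x5: high, low, medium, high
x6: medium, low, high, medium
x7: 3, 1, 0, 4
x8: low, low, high, medium
x9: 297, 298, 299, 297
x10: 1, 2, 0, 4
x11: t, t, t, f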
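
One way to hold this sample in Python, as a minimal sketch: keep one list per attribute, tag each attribute with its type, and transpose the columns into one tuple per record. The names `columns`, `attribute_types` and `records` are made up for illustration; the type tags simply follow the description in the question.

```python
# Sample data from the question, one list per attribute (x1 .. x11).
columns = [
    ["forestry", "plantation", "other", "forestry"],     # x1
    ["plantation", "plantation", "shrubs", "forestry"],  # x2
    ["high", "high", "medium", "low"],                   # x3
    ["low", "medium", "high", "high"],                   # x4
    ["high", "low", "medium", "high"],                   # x5
    ["medium", "low", "high", "medium"],                 # x6
    [3, 1, 0, 4],                                        # x7
    ["low", "low", "high", "medium"],                    # x8
    [297, 298, 299, 297],                                # x9
    [1, 2, 0, 4],                                        # x10
    ["t", "t", "t", "f"],                                # x11
]

# Hypothetical type tags: x1-x2 nominal, x3-x10 ordinal, x11 binary,
# as stated in the question.
attribute_types = ["nominal"] * 2 + ["ordinal"] * 8 + ["binary"]

# Transpose the columns so that each record is one tuple of 11 values.
records = list(zip(*columns))

# Show each attribute of the first record next to its type tag.
for kind, value in zip(attribute_types, records[0]):
    print(kind, value)
```

With the data in this shape, a distance function can look up the type tag of each position and decide how to compare the two values.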

You can do it like this:

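def distance(x, y):
    # p: total number of attributes
    p = len(x)
    # m: number of attributes on which x and y have the same value
    m = sum(1 for a, b in zip(x, y) if a == b)
    # fraction of attributes that differ
    return float(p - m) / p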
For example:

x1 = ("forestry", "plantation", "high", "low", "high", "medium", 3, "low", 297, 1, True)
x2 = ("plantation", "plantation", "high", "medium", "low", "low", 1, "low", 298, 2, True)

print(distance(x1, x2))  # result: 0.6363... = (11-4)/11

I rewrote it as follows:

First, I create a nominal-type factory:

class BaseNominalType:
    name_values = {}   # <= subclass must override this

    def __init__(self, name):
        self.name = name
        self.value = self.name_values[name]

    def __str__(self):
        return self.name

    def __sub__(self, other):
        assert type(self) == type(other), "Incompatible types, subtraction is undefined"
        return self.value - other.value

# class factory function
def make_nominal_type(name_values):
    try:
        nv = dict(name_values)
    except (TypeError, ValueError):  # plain sequence of names, not (name, value) pairs
        nv = {item:i for i,item in enumerate(name_values)}

    # make custom type
    class MyNominalType(BaseNominalType):
        name_values = nv
    return MyNominalType
Then I create a mixed-vector type factory:

# base class
class BaseMixedVectorType:
    types = []          # <= subclass must
    distance_fn = None  # <=   override these

    def __init__(self, values):
        self.values = [type_(value) for type_,value in zip(self.types, values)]

    def dist(self, other):
        return self.distance_fn([abs(s - o) for s,o in zip(self.values, other.values)])

# class factory function
def make_mixed_vector_type(types, distance_fn):
    tl = list(types)
    df = distance_fn

    class MyVectorType(BaseMixedVectorType):
        types = tl
        distance_fn = df
    return MyVectorType
... but wait, we haven't defined a distance function yet! I wrote the class so that you can plug in any distance function you like, of the form:

def manhattan_dist(_, vector):
    return sum(vector)

def euclidean_dist(_, vector):
    return sum(v*v for v in vector) ** 0.5

# the distance function per your description; as written it returns the
# fraction of attributes that match (difference == 0), i.e. m/p
def fractional_match_distance(_, vector):
    return float(sum(not v for v in vector)) / len(vector)
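
Each of these functions receives the list of absolute per-attribute differences; the first argument is the vector instance, which they ignore. A quick check on a made-up difference vector (the values in `diffs` are hypothetical), as a sketch:

```python
def manhattan_dist(_, vector):
    return sum(vector)

def euclidean_dist(_, vector):
    return sum(v * v for v in vector) ** 0.5

def fractional_match_distance(_, vector):
    # counts the zero differences, i.e. the attributes that match
    return float(sum(not v for v in vector)) / len(vector)

# Hypothetical absolute differences for four attributes; two of them match.
diffs = [0, 3, 4, 0]

print(manhattan_dist(None, diffs))             # 7
print(euclidean_dist(None, diffs))             # 5.0
print(fractional_match_distance(None, diffs))  # 0.5
```
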
So we finish by creating

# your mixed-vector type
DataItem = make_mixed_vector_type(
    [Forest, Forest, Level, Level, Level, Level, int, Level, int, int, Bool],
    fractional_match_distance
)
and test it with

def main():
    raw_data = [
        ('forestry', 'plantation', 'high', 'low', 'high', 'medium', 3, 'low', 297, 1, 't'),
        ('plantation', 'plantation', 'high', 'medium', 'low', 'low', 1, 'low', 298, 2, 't'),
        ('other', 'shrubs', 'medium', 'high', 'medium', 'high', 0, 'high', 299, 0, 't'),
        ('forestry', 'forestry', 'low', 'high', 'high', 'medium', 4, 'medium', 297, 4, 'f')
    ]

    a, b, c, d = [DataItem(d) for d in raw_data]

    print("a to b, dist = {}".format(a.dist(b)))
    print("b to c, dist = {}".format(b.dist(c)))
    print("c to d, dist = {}".format(c.dist(d)))

if __name__=="__main__":
    main()
which gives us

a to b, dist = 0.363636363636
b to c, dist = 0.0909090909091
c to d, dist = 0.0909090909091
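
The `DataItem` definition above uses `Forest`, `Level` and `Bool`, whose definitions do not appear in the text as shown. A minimal sketch of what they could look like, assuming they are built with `make_nominal_type`; the ordering of the names and the `t`/`f` mapping are guesses, not taken from the answer:

```python
class BaseNominalType:
    name_values = {}   # subclass must override this

    def __init__(self, name):
        self.name = name
        self.value = self.name_values[name]

    def __sub__(self, other):
        assert type(self) == type(other), "Incompatible types, subtraction is undefined"
        return self.value - other.value

def make_nominal_type(name_values):
    try:
        nv = dict(name_values)            # mapping or (name, value) pairs
    except (TypeError, ValueError):
        nv = {item: i for i, item in enumerate(name_values)}  # plain list of names

    class MyNominalType(BaseNominalType):
        name_values = nv
    return MyNominalType

# Assumed definitions (hypothetical orderings and values).
Forest = make_nominal_type(["shrubs", "plantation", "forestry", "other"])
Level = make_nominal_type(["low", "medium", "high"])
Bool = make_nominal_type({"f": False, "t": True})

print(abs(Level("high") - Level("low")))             # 2
print(abs(Forest("forestry") - Forest("forestry")))  # 0
print(abs(Bool("t") - Bool("f")))                    # 1
```

Since `fractional_match_distance` only checks whether each difference is zero, the exact numbers assigned to the names do not affect its result, as long as different names get different values.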

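Can you provide some sample data?

Save the values as lists. Then you can access youList[0], which is x1.

Here is the sample data: x1 forestry, plantation, other, forestry; x2 plantation, plantation, shrubs, forestry; x3 high, high, medium, low; x4 low, medium, high, high; x5 high, low, medium, high; x6 medium, low, high, medium; x7 3, 1, 0, 4; x8 low, low, high, medium; x9 297, 298, 299, 297; x10 1, 2, 0, 4; x11 t, t,

What is the distance between forestry and plantation?

Because x1 and x2 are nominal, I can compute the distance as d(i,j) = (p - m) / p, where p is the total number of attributes and m is the number of attributes on which i and j are in the same state. Example: d(1,2) = (2 - 0) / 2 = 1.

Thanks for your help. But that distance only works for values like forestry and plantation. The values high, low, medium, 3, 297 and so on are ordinal, and the values true and false are binary, so that distance cannot be used for all of the data. I have to handle the ordinal and binary data as well, and then I can get the mixed-type distance. For the binary type (true, false) we can use d(i,j) = (r + s) / (q + r + s + t), where r is the number of variables that are positive for i and negative for j, s is the number that are negative for i and positive for j, q is the number that are positive for both i and j, and t is the number that are negative for both. For the ordinal type, we have to replace each value with its rank; for example, in the data low = 1, medium = 2, high = 3, and 0 = 1, 1 = 2, 2 = 3, 3 = 4, 4 = 5. Then we normalize the rank with z_if = (r_if - 1) / (M_f - 1), where z_if is the normalized value of the i-th object on attribute f, r_if is its rank, and M_f is the number of states (for low, medium, high, M_f = 3). Finally, we can compute the distance on that data with the Euclidean distance.

Have you already tried this in Python? Please post your attempt in your question. And don't forget to give a clear, detailed description of how you want to compute the total distance between two data tuples; I don't think anyone will read these comments to get that information.... You could also provide two example data tuples together with the distance between them.

I also don't understand how to compute the distance between two data tuples...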
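
The scheme described in the comments above (mismatch for nominal attributes, normalized ranks for ordinal ones, mismatch for the binary one) can be sketched as follows. The thread never settles how the per-attribute values are combined into one number, so this sketch simply averages them, which is an assumption; the function names and the four-attribute example records are made up for illustration.

```python
LEVELS = ["low", "medium", "high"]   # ordinal order, as given in the comments
COUNTS = [0, 1, 2, 3, 4]             # ranks 1..5, as given in the comments

def normalized_rank(value, ordered_states):
    """z = (r - 1) / (M - 1), with r the 1-based rank and M the number of states."""
    r = ordered_states.index(value) + 1
    m = len(ordered_states)
    return (r - 1) / (m - 1)

def mismatch(a, b):
    """0 if the two values are equal, 1 otherwise (nominal and binary attributes).
    For a single binary attribute, (r + s) / (q + r + s + t) reduces to this."""
    return 0.0 if a == b else 1.0

def mixed_distance(x, y, kinds, states):
    """Average of per-attribute dissimilarities (assumed combination rule)."""
    total = 0.0
    for a, b, kind, ordered in zip(x, y, kinds, states):
        if kind == "ordinal":
            total += abs(normalized_rank(a, ordered) - normalized_rank(b, ordered))
        else:
            total += mismatch(a, b)
    return total / len(kinds)

# Hypothetical records with one nominal, two ordinal and one binary attribute.
kinds = ["nominal", "ordinal", "ordinal", "binary"]
states = [None, LEVELS, COUNTS, None]
x = ("forestry", "high", 3, "t")
y = ("plantation", "low", 1, "t")

# Per attribute: 1.0, |1.0 - 0.0| = 1.0, |0.75 - 0.25| = 0.5, 0.0
print(mixed_distance(x, y, kinds, states))  # 0.625
```

To use a Euclidean combination instead, as the last comment by the asker suggests for the ordinal part, the per-attribute values could be squared and summed before taking a square root.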