Python pandas apply returns NaN


I have a JSON string that I convert to a dictionary, and then I build a DataFrame from some of the dictionary's key-value pairs.

import json
import pandas as pd

# json
a = """{
    "cluster_id": 3,
    "cluster_observation_data": [[1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7, 8, 1]],
    "cluster_observation_label": [0, 1],
    "cluster_centroid": [1, 2, 3, 4, 5, 6, 7, 10],
    "observation_id":["id_xyz_999","id_abc_000"]
}"""

# convert to dictionary
data = json.loads(a)
sub_dict = dict((k, data[k]) for k in ('cluster_observation_data', 'cluster_observation_label'))
train = pd.DataFrame.from_dict(sub_dict, orient='columns')
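After converting it to a DataFrame, I try to compute the Euclidean distance between each observation and the cluster_centroid stored in the data dictionary. The function itself works fine, but the resulting column in the final train DataFrame contains only NaN.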

def distance_from_center(row):
    centre = data['cluster_centroid']
    obs_data = row[0]
    print('obs_data', obs_data)
    print('\n\n\n\n')
    print('center', centre)
    # print(type(obs_data))
    # print(type(centre))
    dist = sum([(a - b)**2 for a, b in zip(centre, obs_data)])
    print(dist)
    return dist

train.loc[:, 'center_dist'] = train.loc[:, ['cluster_observation_data']].apply(distance_from_center)

I can't see where I'm going wrong. Even a small hint would help.

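You need to pass axis, like this:

train.loc[:, 'center_dist'] = train.loc[:, ['cluster_observation_data']].apply(distance_from_center, axis=1)

The reason is that you want to apply the function to each list separately. The documentation for the axis parameter says:

1 or 'columns': apply function to each row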
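To see why the default axis produces NaN, here is a minimal sketch that rebuilds the same two-row frame and compares both calls. The names squared_distance, by_column, by_row and the column center_dist_default_axis are made up for this illustration. It uses .iloc[0] instead of row[0], on the assumption that recent pandas versions no longer treat an integer key as a position on a label-indexed Series.

```python
import pandas as pd

centroid = [1, 2, 3, 4, 5, 6, 7, 10]
train = pd.DataFrame({
    "cluster_observation_data": [[1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7, 8, 1]],
    "cluster_observation_label": [0, 1],
})

def squared_distance(series):
    # Take the first element of whatever Series apply() hands over.
    obs = series.iloc[0]
    return sum((c - o) ** 2 for c, o in zip(centroid, obs))

subset = train.loc[:, ["cluster_observation_data"]]  # one-column DataFrame

# Default axis=0: a single call that receives the whole column.
by_column = subset.apply(squared_distance)
# axis=1: one call per row.
by_row = subset.apply(squared_distance, axis=1)

print(by_column.index.tolist())  # ['cluster_observation_data']
print(by_row.index.tolist())     # [0, 1]
print(by_row.tolist())           # [4, 88]

# Assignment aligns on index labels; 0 and 1 are not in by_column, hence NaN.
train["center_dist_default_axis"] = by_column
print(train["center_dist_default_axis"].isna().all())  # True
```

With the default axis, the result is indexed by the column name rather than by the row labels, so the assignment finds no matching labels and fills the new column with NaN.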


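Just change obs_data in distance_from_center() from row[0] to row, because you already select the column when calling the method. train.loc[:, 'cluster_observation_data'] is a Series, so apply passes each list to the function directly. After that it works. I tried it and it runs on my system: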

import json
import pandas as pd
# json
a = """{
    "cluster_id": 3,
    "cluster_observation_data": [[1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7, 8, 1]],
    "cluster_observation_label": [0, 1],
    "cluster_centroid": [1, 2, 3, 4, 5, 6, 7, 10],
    "observation_id": ["id_xyz_999", "id_abc_000"]
}"""

# convert to dictionary
data = json.loads(a)
sub_dict = dict((k, data[k]) for k in ('cluster_observation_data', 'cluster_observation_label'))
train = pd.DataFrame.from_dict(sub_dict, orient='columns')

def distance_from_center(row):
    centre = data['cluster_centroid']
    obs_data = row
    print('obs_data', obs_data)
    print('\n\n\n\n')
    print('center', centre)
    # print(type(obs_data))
    # print(type(centre))
    dist = sum([(a - b)**2 for a,b in zip(centre, obs_data)])
    print(dist)
    return dist

train.loc[:, 'center_dist'] = train.loc[:,'cluster_observation_data'].apply(distance_from_center)
Output:

obs_data [1, 2, 3, 4, 5, 6, 7, 8]
center [1, 2, 3, 4, 5, 6, 7, 10]
4

obs_data [2, 3, 4, 5, 6, 7, 8, 1]
center [1, 2, 3, 4, 5, 6, 7, 10]
88
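
Note that 4 and 88 are sums of squared differences, that is, squared Euclidean distances. If the actual Euclidean distance is wanted, the square root still has to be taken. A possible vectorized sketch, assuming NumPy is available and that every observation has the same length as the centroid (the names obs and centroid are chosen here for illustration):

```python
import numpy as np
import pandas as pd

centroid = np.array([1, 2, 3, 4, 5, 6, 7, 10])
train = pd.DataFrame({
    "cluster_observation_data": [[1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7, 8, 1]],
    "cluster_observation_label": [0, 1],
})

# Stack the per-row lists into a 2-D array of shape (n_rows, n_features).
obs = np.array(train["cluster_observation_data"].tolist())

# Row-wise Euclidean norm of the difference to the centroid.
train["center_dist"] = np.linalg.norm(obs - centroid, axis=1)
print(train["center_dist"].tolist())
```

This gives the square roots of 4 and 88 for the two rows, without calling a Python function once per row.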