Python pdist: setting an array element with a sequence

I have written the following code:

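import pandas as pd

# `structure` is assumed to be an already-parsed structure object (parsing not shown)
arr_coord = []

for chains in structure:
    for chain in chains:
        for residue in chain:
            for atom in residue:
                x = atom.get_coord()
                # each coordinate is wrapped in a one-element list
                arr_coord.append({'X': [x[0]], 'Y': [x[1]], 'Z': [x[2]]})

coord_table = pd.DataFrame(arr_coord)
print(coord_table)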

which produces the following DataFrame:

             X         Y         Z
0      [-5.43]  [28.077]  [-0.842]
1     [-3.183]  [26.472]   [1.741]
2     [-2.574]  [22.752]    [1.69]
3     [-1.743]  [21.321]   [5.121]
4      [0.413]  [18.212]   [5.392]
5      [0.714]  [15.803]   [8.332]
6      [4.078]  [15.689]  [10.138]
7      [5.192]    [12.2]   [9.065]
8      [4.088]   [12.79]   [5.475]
9      [5.875]  [16.117]   [4.945]
10     [8.514]  [15.909]    [2.22]
11    [12.235]   [15.85]   [2.943]
12    [13.079]  [16.427]  [-0.719]
13    [10.832]  [19.066]  [-2.324]
14    [12.327]  [22.569]  [-2.163]
15     [8.976]  [24.342]  [-1.742]
16     [7.689]  [25.565]   [1.689]
17     [5.174]  [23.336]   [3.467]
18     [2.339]  [24.135]   [5.889]
19       [0.9]  [22.203]   [8.827]
20    [-1.217]  [22.065]  [11.975]
21     [0.334]  [20.465]   [15.09]
22       [0.0]  [20.066]  [18.885]
23     [2.738]  [21.762]  [20.915]
24     [4.087]  [19.615]  [23.742]
25     [7.186]  [21.618]  [24.704]
26     [8.867]  [24.914]   [23.91]
27    [11.679]  [27.173]  [24.946]
28     [10.76]  [30.763]  [25.731]
29    [11.517]  [33.056]  [22.764]
..         ...       ...       ...
431    [8.093]  [34.654]  [68.474]
432    [7.171]  [32.741]  [65.298]
433    [5.088]  [35.626]  [63.932]
434    [7.859]   [38.22]  [64.329]
435   [10.623]  [35.908]    [63.1]
436   [12.253]  [36.776]  [59.767]
437    [10.65]  [35.048]  [56.795]
438    [7.459]  [34.084]  [58.628]
439    [4.399]  [35.164]  [56.713]
440    [0.694]  [35.273]  [57.347]
441   [-1.906]  [34.388]  [54.667]
442   [-5.139]  [35.863]  [55.987]
443   [-8.663]  [36.808]  [55.097]
444   [-9.629]  [40.233]  [56.493]
445  [-12.886]   [42.15]  [56.888]
446  [-12.969]  [45.937]  [56.576]
447  [-14.759]  [47.638]  [59.485]
448  [-14.836]  [51.367]  [60.099]
449  [-11.607]  [51.863]  [58.176]
450   [-9.836]  [48.934]  [59.829]
451    [-8.95]  [45.445]  [58.689]
452   [-9.824]  [42.599]  [61.073]
453   [-8.559]  [39.047]  [60.598]
454  [-11.201]  [36.341]  [60.195]
455  [-11.561]   [32.71]  [59.077]
456   [-7.786]  [32.216]  [59.387]
457   [-5.785]  [29.886]  [61.675]
458   [-2.143]  [29.222]  [62.469]
459   [-0.946]  [25.828]  [61.248]
460    [2.239]  [25.804]  [63.373]

[461 rows x 3 columns]

What I want to do is build a Euclidean distance matrix from these X, Y and Z values. I tried to do this with the pdist function:

from scipy.spatial.distance import pdist, squareform

dist = pdist(coord_table, metric='euclidean')
distance_matrix = squareform(dist)
print(distance_matrix)

However, the interpreter raises the following error:

ValueError: setting an array element with a sequence.

I don't know how to interpret this error or how to fix it.

Change the loop so that each coordinate is stored as a plain number:

arr_coord = []

for chains in structure:
    for chain in chains:
        for residue in chain:
            for atom in residue:
                x = atom.get_coord()
                # store plain scalars; the one-element lists are not needed
                arr_coord.append({'X': x[0], 'Y': x[1], 'Z': x[2]})
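
The corrected loop can be tried end to end without a structure file. The following is a minimal sketch, assuming a hypothetical `coords` list (three points copied from the table in the question) as a stand-in for the values returned by atom.get_coord(); the DataFrame is converted with to_numpy() before being handed to pdist, which is a choice made here for explicitness rather than something the original code does.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the atom.get_coord() results
# (values copied from the first rows of the table in the question).
coords = [
    (-5.43, 28.077, -0.842),
    (-3.183, 26.472, 1.741),
    (-2.574, 22.752, 1.69),
]

# Same construction as the corrected loop: plain scalars, no wrapping lists.
arr_coord = [{'X': x[0], 'Y': x[1], 'Z': x[2]} for x in coords]
coord_table = pd.DataFrame(arr_coord)

# pdist returns the condensed pairwise distances; squareform expands
# them into a symmetric matrix with zeros on the diagonal.
dist = pdist(coord_table.to_numpy(), metric='euclidean')
distance_matrix = squareform(dist)

print(coord_table.dtypes)
print(distance_matrix.shape)
```

With real data the resulting matrix has one row and one column per atom, so 461 atoms would give a 461 x 461 matrix.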

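Not 100% sure, but all of your coordinates are single-element lists. Why construct the DataFrame with [x[0]] rather than x[0]? I believe pdist looks for numeric array elements but finds lists instead.

Thanks, that works. Out of curiosity, what was wrong with my previous code?

The function expects the points to be in an m X n array, with m observations and n dimensions. In your version, each entry in the array is not a coordinate value but a list.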
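
The difference between the two constructions can be checked directly. The sketch below is illustrative only: the `points` list is hypothetical sample data taken from the question's table, and unwrapping the cells with apply/map is one possible way to repair an already-built DataFrame, not something proposed in the thread. It shows that the list-wrapped version cannot be converted to a float array, while the unwrapped version gives the m X n array that pdist needs.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical sample points (first rows of the table in the question).
points = [
    (-5.43, 28.077, -0.842),
    (-3.183, 26.472, 1.741),
    (-2.574, 22.752, 1.69),
]

# Original construction: every cell holds a one-element list,
# so pandas stores the columns with dtype object.
nested = pd.DataFrame([{'X': [p[0]], 'Y': [p[1]], 'Z': [p[2]]} for p in points])
print(nested.dtypes)

# Converting the object array of lists to floats fails; this is the
# kind of conversion that triggers the error seen in the question.
conversion_failed = False
try:
    nested.to_numpy().astype(float)
except (ValueError, TypeError) as exc:
    conversion_failed = True
    print("conversion failed:", exc)

# One way to repair an existing DataFrame: unwrap each cell, then cast.
flat = nested.apply(lambda col: col.map(lambda cell: cell[0])).astype(float)
print(flat.to_numpy().shape)  # m observations by n dimensions

distance_matrix = squareform(pdist(flat.to_numpy(), metric='euclidean'))
print(distance_matrix.shape)
```

Fixing the loop so the lists are never created, as in the answer, is simpler than unwrapping afterwards; the repair step is only useful when the DataFrame already exists.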