Pandas: Why does pandas change the index values in this example?
First, we create a raw dataset with a MultiIndex:
In [166]: import numpy as np; import pandas as pd
In [167]: data_raw = pd.DataFrame([
...: {'frame': 1, 'face': np.NaN, 'lmark': np.NaN, 'x': np.NaN, 'y': np.NaN},
...: {'frame': 197, 'face': 0, 'lmark': 1, 'x': 969, 'y': 737},
...: {'frame': 197, 'face': 0, 'lmark': 2, 'x': 969, 'y': 740},
...: {'frame': 197, 'face': 0, 'lmark': 3, 'x': 970, 'y': 744},
...: {'frame': 197, 'face': 0, 'lmark': 4, 'x': 972, 'y': 748},
...: {'frame': 197, 'face': 0, 'lmark': 5, 'x': 973, 'y': 752},
...: {'frame': 300, 'face': 0, 'lmark': 1, 'x': 745, 'y': 367},
...: {'frame': 300, 'face': 0, 'lmark': 2, 'x': 753, 'y': 411},
...: {'frame': 300, 'face': 0, 'lmark': 3, 'x': 759, 'y': 455},
...: {'frame': 301, 'face': 0, 'lmark': 1, 'x': 741, 'y': 364},
...: {'frame': 301, 'face': 0, 'lmark': 2, 'x': 746, 'y': 408},
...: {'frame': 301, 'face': 0, 'lmark': 3, 'x': 750, 'y': 452}]).set_index(['frame', 'face', 'lmark'])
Next, we compute the z-score for each lmark:
In [168]: ((data_raw - data_raw.mean(level='lmark')).abs()) / data_raw.std(level='lmark')
Out[168]:
                         x         y
frame face lmark
1     NaN  NaN         NaN       NaN
197   0.0  1.0    1.154565  1.154672
           2.0    1.154260  1.154665
           3.0    1.153946  1.154654
           4.0         NaN       NaN
           5.0         NaN       NaN
300   0.0  1.0    0.561956  0.570343
           2.0    0.549523  0.569472
           3.0    0.540829  0.568384
301   0.0  1.0    0.592609  0.584329
           2.0    0.604738  0.585193
           3.0    0.613117  0.586270
As expected, the index values do not change.
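The `level=` argument of `DataFrame.mean` and `DataFrame.std` used in this session was deprecated in pandas 1.3 and removed in 2.0. As a minimal sketch, assuming a recent pandas version, the same per-lmark z-score can be written with `groupby(level=...)` and `transform`. The frame named `sample` is an illustrative subset of the data (x values for lmark 1 and 2, without the NaN record), not the full dataset:

```python
import pandas as pd

# Illustrative subset of the question's data: x values for lmark 1 and 2.
index = pd.MultiIndex.from_tuples(
    [(197, 0, 1), (197, 0, 2), (300, 0, 1), (300, 0, 2), (301, 0, 1), (301, 0, 2)],
    names=["frame", "face", "lmark"],
)
sample = pd.DataFrame({"x": [969, 969, 745, 753, 741, 746]}, index=index)

# transform() broadcasts each group's statistic back onto the original rows,
# so the result keeps the full (frame, face, lmark) index.
by_lmark = sample.groupby(level="lmark")
z = (sample - by_lmark.transform("mean")).abs() / by_lmark.transform("std")
print(z)
```

Because `transform` returns a frame with the same index as its input, no index alignment across levels is needed for the subtraction and division.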
Now we filter out the records where lmark > 3:
In [170]: data_filtered = data_raw.loc[(slice(None), slice(None), [np.NaN, slice(3)]),:]
In [171]: data_filtered
Out[171]:
                      x      y
frame face lmark
1     NaN  NaN      NaN    NaN
197   0.0  1.0    969.0  737.0
           2.0    969.0  740.0
           3.0    970.0  744.0
300   0.0  1.0    745.0  367.0
           2.0    753.0  411.0
           3.0    759.0  455.0
301   0.0  1.0    741.0  364.0
           2.0    746.0  408.0
           3.0    750.0  452.0
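Placing a `slice` inside a list selector is fragile, since a slice is not a hashable label. A hedged alternative, sketched here on an illustrative subset of the data, is to build a boolean mask from the lmark level values and keep rows whose lmark is either missing or at most 3. The names `lmark` and `keep` are introduced only for this example:

```python
import numpy as np
import pandas as pd

# Illustrative subset: the all-NaN record plus the five landmarks of frame 197.
data_raw = pd.DataFrame([
    {"frame": 1, "face": np.nan, "lmark": np.nan, "x": np.nan, "y": np.nan},
    {"frame": 197, "face": 0, "lmark": 1, "x": 969, "y": 737},
    {"frame": 197, "face": 0, "lmark": 2, "x": 969, "y": 740},
    {"frame": 197, "face": 0, "lmark": 3, "x": 970, "y": 744},
    {"frame": 197, "face": 0, "lmark": 4, "x": 972, "y": 748},
    {"frame": 197, "face": 0, "lmark": 5, "x": 973, "y": 752},
]).set_index(["frame", "face", "lmark"])

# Per-row lmark values; NaN marks the record with a missing landmark.
lmark = data_raw.index.get_level_values("lmark")

# Keep rows where lmark is missing or <= 3 (NaN <= 3 evaluates to False,
# so the missing case has to be included explicitly).
keep = lmark.isna() | (lmark <= 3)
data_filtered = data_raw[keep]
print(data_filtered)
```

The mask states the filter condition directly, so it does not depend on how `.loc` interprets mixed label and slice selectors.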
Then we recompute the z-scores:
In [172]: ((data_filtered - data_filtered.mean(level='lmark')).abs()) / data_filtered.std(level='lmark')
Out[172]:
                         x         y
frame face lmark
1     NaN  1.0         NaN       NaN
197   0.0  1.0    1.154565  1.154672
           2.0    1.154260  1.154665
           3.0    1.153946  1.154654
300   0.0  1.0    0.561956  0.570343
           2.0    0.549523  0.569472
           3.0    0.540829  0.568384
301   0.0  1.0    0.592609  0.584329
           2.0    0.604738  0.585193
           3.0    0.613117  0.586270
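Why did the value of the lmark index for the first record change from NaN to 1.0?
This looks like a bug to me.
The solution is to use `MultiIndex.remove_unused_levels` (the code and its output follow the comments below):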
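Thanks for the quick reply. While preparing a bug report, I cloned and installed the master branch, then repeated the steps to reproduce the bug. When I execute `data_filtered = data_raw.loc[(slice(None), slice(None), [np.NaN, slice(3)]), :]` I now get the error `TypeError: unhashable type: 'slice'`. Any idea what this means?
@user2309803 - Hard question. For me, the selection does not work in pandas 0.23.1 either; it returns the same values. There seems to be a problem with selecting by missing values. I tested your solution with `data_filtered = data_raw.groupby(level=0).head(3)`.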
Issue created
data_filtered.index = data_filtered.index.remove_unused_levels()
a = ((data_filtered - data_filtered.mean(level='lmark')).abs()) / data_filtered.std(level='lmark')
print (a)
                         x         y
frame face lmark
1     NaN  NaN         NaN       NaN
197   0.0  1.0    1.154565  1.154672
           2.0    1.154260  1.154665
           3.0    1.153946  1.154654
300   0.0  1.0    0.561956  0.570343
           2.0    0.549523  0.569472
           3.0    0.540829  0.568384
301   0.0  1.0    0.592609  0.584329
           2.0    0.604738  0.585193
           3.0    0.613117  0.586270
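What `remove_unused_levels` changes can be seen on a small example. A MultiIndex stores the distinct values of each level separately from the per-row codes, and filtering rows keeps the full list of level values even when some no longer occur. Whether this is the exact mechanism behind the NaN to 1.0 change is an assumption; the sketch below only shows the stale levels and their removal, on a made-up two-level index without NaN:

```python
import pandas as pd

# Hypothetical two-level index: 2 frames x 5 landmarks.
index = pd.MultiIndex.from_product(
    [[197, 300], [1, 2, 3, 4, 5]], names=["frame", "lmark"]
)
frame = pd.DataFrame({"x": range(10)}, index=index)

# Filtering rows does not shrink the stored level values.
filtered = frame[frame.index.get_level_values("lmark") <= 3]
stale = filtered.index.levels[1].tolist()

# remove_unused_levels() rebuilds the levels from the values actually present.
cleaned = filtered.index.remove_unused_levels()
fresh = cleaned.levels[1].tolist()

print(stale)  # [1, 2, 3, 4, 5]
print(fresh)  # [1, 2, 3]
```

The cleaned index compares equal to the original one row for row; only its internal level storage differs.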