Python Groupby Diff - Pandas

python pandas numpy

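I want to compute the differences between columns in a MultiIndex. I have three dimensions: Family, Date, and Client. The goal is to create new columns from the rows, by Client, Date, and Family, in the MultiIndex.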

    import pandas as pd
    import numpy as np

    data = {
        'Family':{
            0: 'Hugo',
            1: 'Hugo', 
            2: 'Hugo', 
            3: 'Hugo'},
        'Date': {
            0: '2021-04-15',
            1: '2021-04-16',
            2: '2021-04-15',
            3: '2021-04-16'},
        'Client': {
            0: 1,
            1: 1,
            2: 2,
            3: 2},
        'Code_Client': {
            0: 605478.0,
            1: 605478.0,
            2: 605478.0,
            3: 605478.0},
        'Price': {
            0: 2.23354416539888,
            1: 2.0872536032616744,
            2: 1.8426286431701764,
            3: 0.3225935619590472}
        }

    df = pd.DataFrame(data)
    pd.pivot_table(df, values='Price', index=['Code_Client'],
                   columns=['Family', 'Date', 'Client'])

Any ideas?

Thanks,

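I assume you are looking for the price differences grouped by Family, Date, and Client. Your question is somewhat unclear, and you did not post the expected output. I changed your dataframe slightly by adding a second family, to make the solution easier to see: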

    data = {
        'Family':{
            0: 'Hugo',
            1: 'Hugo', 
            2: 'Victor', 
            3: 'Victor'},
        'Date': {
            0: '2021-04-15',
            1: '2021-04-16',
            2: '2021-04-15',
            3: '2021-04-16'},
        'Client': {
            0: 1,
            1: 1,
            2: 2,
            3: 2},
        'Code_Client': {
            0: 605478.0,
            1: 605478.0,
            2: 605478.0,
            3: 605478.0},
        'Price': {
            0: 2.23354416539888,
            1: 2.0872536032616744,
            2: 1.8426286431701764,
            3: 0.3225935619590472}
        }

    df = pd.DataFrame(data)
    pd.pivot_table(df, values='Price', index=['Code_Client'],
                   columns=['Family', 'Date', 'Client'])

As you can see, I added the Victor family. Your dataframe therefore looks like this:

       Family        Date  Client  Code_Client     Price
    0    Hugo  2021-04-15       1     605478.0  2.233544
    1    Hugo  2021-04-16       1     605478.0  2.087254
    2  Victor  2021-04-15       2     605478.0  1.842629
    3  Victor  2021-04-16       2     605478.0  0.322594

To add a difference column by group, I suggest the following:

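    # Index by the grouping variables and keep only the Price column
    df = df.set_index(['Family', 'Date', 'Client']).sort_index()[['Price']]
    df['diff'] = np.nan  # placeholder column, filled per family below
    idx = pd.IndexSlice

    # For each family (index level 0), compute the row-to-row price difference
    for ix in df.index.levels[0]:
        df.loc[idx[ix, :], 'diff'] = df.loc[idx[ix, :], 'Price'].diff()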

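The first step indexes the dataframe by your variables (the ones you want to group by) and creates an empty (NaN-filled) difference column. The second step fills that column with the row-to-row price differences within each group.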

This returns:

                                     Price      diff
    Family Date       Client
    Hugo   2021-04-15 1       2.233544       NaN
           2021-04-16 1       2.087254 -0.146291
    Victor 2021-04-15 2       1.842629       NaN
           2021-04-16 2       0.322594 -1.520035
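
The per-family loop can also be written as a grouped diff. The following is a minimal sketch, assuming the modified data from this answer (Code_Client is left out for brevity) and grouping on the Family index level. The name indexed is introduced here only for illustration.

```python
import pandas as pd

# Same rows as the modified example data (Code_Client omitted for brevity)
df = pd.DataFrame({
    "Family": ["Hugo", "Hugo", "Victor", "Victor"],
    "Date": ["2021-04-15", "2021-04-16", "2021-04-15", "2021-04-16"],
    "Client": [1, 1, 2, 2],
    "Price": [2.23354416539888, 2.0872536032616744,
              1.8426286431701764, 0.3225935619590472],
})

# Index by the grouping variables and sort so rows are ordered by date within a family
indexed = df.set_index(["Family", "Date", "Client"]).sort_index()[["Price"]]

# Row-to-row difference within each Family group;
# the first row of every group has no predecessor and stays NaN
indexed["diff"] = indexed.groupby(level="Family")["Price"].diff()

print(indexed)
```

This avoids pre-creating the diff column and the explicit loop. Grouping on more than one level, for example level=["Family", "Client"], would be the assumption to change if the differences should restart for each client.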

If you are not happy with the NaN values, do the following instead:

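    # Start again from the original df = pd.DataFrame(data)
    df = df.set_index(['Family', 'Date', 'Client']).sort_index()[['Price']]
    df['diff'] = np.nan
    idx = pd.IndexSlice

    # Same loop as before, but the first row of each family gets 0 instead of NaN
    for ix in df.index.levels[0]:
        df.loc[idx[ix, :], 'diff'] = df.loc[idx[ix, :], 'Price'].diff().fillna(0)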

I added .fillna(0) to the diff() statement. It returns:

                                     Price      diff
    Family Date       Client
    Hugo   2021-04-15 1       2.233544  0.000000
           2021-04-16 1       2.087254 -0.146291
    Victor 2021-04-15 2       1.842629  0.000000
           2021-04-16 2       0.322594 -1.520035
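
Since the question starts from a pivoted layout, another possible reading is that the difference should be taken between the date columns rather than between rows. The sketch below follows that reading. It is an assumption about the intent, not something the question confirms, and the name wide is hypothetical.

```python
import pandas as pd

# Same rows as the modified example data (Code_Client omitted for brevity)
df = pd.DataFrame({
    "Family": ["Hugo", "Hugo", "Victor", "Victor"],
    "Date": ["2021-04-15", "2021-04-16", "2021-04-15", "2021-04-16"],
    "Client": [1, 1, 2, 2],
    "Price": [2.23354416539888, 2.0872536032616744,
              1.8426286431701764, 0.3225935619590472],
})

# One row per (Family, Client), one column per date
wide = df.pivot_table(values="Price", index=["Family", "Client"], columns="Date")

# Difference between the two date columns, stored as a new column
wide["diff"] = wide["2021-04-16"] - wide["2021-04-15"]

print(wide)
```

With only two dates this is a single subtraction. With more dates, calling diff(axis=1) on the pivoted frame would give every consecutive difference at once.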

Could you edit your question and add the expected result?

Hi Sergey, thanks for your answer!