Python: How to reindex a grouped dataset using a custom order
I have grouped my data the way I want, but the months are out of order:
sign_off=df1.groupby(['Sign off','LOB']).sum()
print(sign_off)
This produces:
                      Test Cases
Sign off      LOB
April2019     Sales          135
April2020     Systems         36
              Others          49
August2019    Systems         13
              Sales          414
              DevOps          47
February2019  Systems         42
February2020  Systems         76
              Sales          151
January2019   ECS            251
              Systems        157
              Sales          116
July2019      Systems         45
              Sales            9
June2019      Systems        164
March2019     ECS             37
              Systems        181
March2020     Systems         13
May2019       Systems          7
May2020       Systems        249
              Others          60
November2019  Systems         49
October2019   Systems        479
              Sales          130
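This is the layout I want, but the months are sorted alphabetically (and I want to keep them as strings in this format). So I need to rearrange the sign-off months, and I tried the following: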
order = ['January2019','February2019','March2019','April2019','May2019','June2019','July2019','August2019','October2019','November2019','February2020','March2020','April2020','May2020']
sign_off.reindex(order)
This raises an error: TypeError: Expected tuple, got str
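The error most likely arises because the grouped result is indexed by a two-level MultiIndex whose labels are (Sign off, LOB) tuples, so a list of plain strings cannot be matched against it. A minimal sketch of one way around this, assuming a small made-up `df1` (the values below are illustrative, not the real data), is to pass `level=` to `reindex` so the custom order is applied to the month level only:

```python
import pandas as pd

# Hypothetical miniature of df1; the values are made up for illustration.
df1 = pd.DataFrame({
    "Sign off": ["April2019", "January2019", "January2019", "February2020", "April2019"],
    "LOB": ["Sales", "ECS", "Systems", "Sales", "Sales"],
    "Test Cases": [100, 251, 157, 151, 35],
})

# Grouping on two columns produces a two-level MultiIndex (Sign off, LOB).
sign_off = df1.groupby(["Sign off", "LOB"]).sum()

# Custom month order; the labels stay plain strings.
order = ["January2019", "April2019", "February2020"]

# level= matches the labels against the 'Sign off' level only,
# so each month keeps all of its LOB rows.
reordered = sign_off.reindex(order, level="Sign off")
print(reordered)
```

Months that are missing from `order` are dropped from the result, so the list should cover every month present in the data.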
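I need the dataset reordered by the month column: the months should follow the order I specify, each with its correct line of business (LOB) and test cases.

You can try this: pass as_index=False to groupby so the grouping columns stay out of the index, then do the rest: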
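import pandas as pd

sign_off = df1.groupby(['Sign off', 'LOB'], as_index=False).sum()
# Turn 'January2019'-style labels into a sortable 'MMYYYY' key
sign_off['Sign off'] = pd.to_datetime(sign_off['Sign off'], format='%B%Y', errors='coerce').dt.strftime('%m%Y')
# '%m%Y' orders by month first, then year; a '%Y%m' key would sort chronologically
sign_off.sort_values(by=['Sign off'], inplace=True)
# Convert the key back to the original 'MonthYear' string format
sign_off['Sign off'] = pd.to_datetime(sign_off['Sign off'], format='%m%Y').dt.strftime('%B%Y')
print(sign_off)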
Output:
    Sign off     LOB      Test Cases
9   January2019  ECS           251.0
15  March2019    ECS            37.0
17  March2020    Systems        13.0
0   April2019    Sales         135.0
1   April2020    Systems        36.0
18  May2019      Systems         7.0
19  May2020      Systems       249.0
14  June2019     Systems       164.0
12  July2019     Systems        45.0
3   August2019   Systems        13.0
22  October2019  Systems       479.0
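If the goal is to follow an explicit list such as `order` from the question rather than a parsed date, an ordered categorical is another option. The sketch below assumes a small made-up `df1` (illustrative values only) and sorts the flat, `as_index=False` result by each label's position in `order`:

```python
import pandas as pd

# Hypothetical miniature of df1; the values are made up for illustration.
df1 = pd.DataFrame({
    "Sign off": ["April2019", "January2019", "January2019", "February2020", "April2019"],
    "LOB": ["Sales", "ECS", "Systems", "Sales", "Sales"],
    "Test Cases": [100, 251, 157, 151, 35],
})
order = ["January2019", "April2019", "February2020"]

sign_off = df1.groupby(["Sign off", "LOB"], as_index=False).sum()

# An ordered categorical sorts by position in `order`, not alphabetically.
sign_off["Sign off"] = pd.Categorical(sign_off["Sign off"], categories=order, ordered=True)
sign_off = sign_off.sort_values(["Sign off", "LOB"]).reset_index(drop=True)

# Convert back to plain strings once the rows are in the desired order.
sign_off["Sign off"] = sign_off["Sign off"].astype(str)
print(sign_off)
```

Any label that does not appear in `order` becomes a missing value when the column is converted to a categorical, so the list needs to include every month in the data.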