Python 2.7 pandas: pivot table fails when preserving order
I have a DataFrame (columns week, domain and count) in which week is not the ISO week but the fiscal week (1 is the first week of July, 52 is the last week of June). I want to pivot this table while preserving the week order, to get a new DataFrame like the one below, where the values are count and the columns are domain:
> new_df
week A B C
43 5 1 NaN
44 NaN 1 NaN
45 1 4 NaN
50 1 11 NaN
51 4 NaN 6
1 3 NaN 14
2 NaN 3 NaN
3 12 12 NaN
5 NaN NaN 1
I tried using groupby and unstack, but got the following error:
> df = df.groupby(['week'], sort=False)['count'].unstack('domain')
AttributeError: Cannot access callable attribute 'unstack' of 'SeriesGroupBy' objects, try using the 'apply' method
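The error arises because `groupby(['week'])['count']` is a `SeriesGroupBy` that has not been aggregated yet, and `unstack` is only available on the aggregated result. Below is a minimal sketch of the aggregate-then-unstack pattern. The input frame is not shown in the question, so the sample data is an assumption reconstructed from the desired output. With default sorting the rows come back in plain numeric order, not fiscal order, which is the remaining problem the answers address.

```python
import pandas as pd

# Sample data is an assumption, reconstructed from the desired output above.
df = pd.DataFrame({
    "week":   [43, 43, 44, 45, 45, 50, 50, 51, 51, 1, 1, 2, 3, 3, 5],
    "domain": ["A", "B", "B", "A", "B", "A", "B", "A", "C",
               "A", "C", "B", "A", "B", "C"],
    "count":  [5, 1, 1, 1, 4, 1, 11, 4, 6, 3, 14, 3, 12, 12, 1],
})

# Group on both keys and aggregate first; only then is there a Series to unstack.
wide = df.groupby(["week", "domain"])["count"].sum().unstack("domain")
print(wide)
```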
You need an ordered categorical for the custom order of weeks, so define the custom order as the categories and omit sort=False:
cats = list(range(26, 52)) + list(range(26))
print (cats)
[26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
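# astype('category', categories=...) was removed in later pandas releases;
# pd.Categorical builds the same ordered categorical on old and new versions
df['week'] = pd.Categorical(df['week'], categories=cats, ordered=True)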
df = df.groupby(['week','domain'])['count'].sum().unstack()
print (df)
domain A B C
week
43 5.0 1.0 NaN
44 NaN 1.0 NaN
45 1.0 4.0 NaN
50 1.0 11.0 NaN
51 4.0 NaN 6.0
1 3.0 NaN 14.0
2 NaN 3.0 NaN
3 12.0 12.0 NaN
5 NaN NaN 1.0
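On newer pandas releases, grouping by a categorical key can also emit rows for categories that never occur in the data. The sketch below passes `observed=True` to keep only the weeks that are present; that argument is an addition here, not part of the answer above, and the sample data is an assumption reconstructed from the desired output.

```python
import pandas as pd

# Sample data is an assumption, reconstructed from the desired output.
df = pd.DataFrame({
    "week":   [43, 43, 44, 45, 45, 50, 50, 51, 51, 1, 1, 2, 3, 3, 5],
    "domain": ["A", "B", "B", "A", "B", "A", "B", "A", "C",
               "A", "C", "B", "A", "B", "C"],
    "count":  [5, 1, 1, 1, 4, 1, 11, 4, 6, 3, 14, 3, 12, 12, 1],
})

# Custom week order taken from the answer: 26..51 first, then 0..25.
cats = list(range(26, 52)) + list(range(26))
df["week"] = pd.Categorical(df["week"], categories=cats, ordered=True)

# observed=True restricts the groups to weeks that actually appear in df.
wide = df.groupby(["week", "domain"], observed=True)["count"].sum().unstack()
print(wide.index.tolist())
```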
Option 1] You can use a custom weeks index helper and .loc
In [4810]: weeks = pd.Index(list(range(26, 52)) + list(range(26)))
In [4819]: dfp = df.groupby(['week','domain'])['count'].sum().unstack()
In [4820]: dfp.loc[weeks & dfp.index]
Out[4820]:
domain A B C
43 5.0 1.0 NaN
44 NaN 1.0 NaN
45 1.0 4.0 NaN
50 1.0 11.0 NaN
51 4.0 NaN 6.0
1 3.0 NaN 14.0
2 NaN 3.0 NaN
3 12.0 12.0 NaN
5 NaN NaN 1.0
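The `weeks & dfp.index` expression relies on `&` acting as a set intersection between two Index objects, which holds for the pandas versions available on Python 2.7 but not for pandas 2.0 and later, where `&` is an element-wise logical operation. A sketch of a version-independent way to get the same ordered selection, using `Index.isin`; the sample frame is made up for illustration.

```python
import pandas as pd

# Small made-up sample; column names follow the question.
df = pd.DataFrame({
    "week":   [43, 44, 51, 51, 1, 2],
    "domain": ["A", "B", "A", "C", "A", "B"],
    "count":  [5, 1, 4, 6, 3, 3],
})

weeks = pd.Index(list(range(26, 52)) + list(range(26)))
dfp = df.groupby(["week", "domain"])["count"].sum().unstack()

# Keep only the weeks that occur in dfp, in the order given by `weeks`.
present = weeks[weeks.isin(dfp.index)]
ordered = dfp.loc[present]
print(ordered.index.tolist())
```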
Option 2] Or, using pivot
In [4821]: dfp = df.pivot(index='week', columns='domain', values='count')
In [4822]: dfp.loc[weeks & dfp.index]
Out[4822]:
domain A B C
43 5.0 1.0 NaN
44 NaN 1.0 NaN
45 1.0 4.0 NaN
50 1.0 11.0 NaN
51 4.0 NaN 6.0
1 3.0 NaN 14.0
2 NaN 3.0 NaN
3 12.0 12.0 NaN
5 NaN NaN 1.0
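Note that `pivot` only reshapes and does not aggregate, so it raises a `ValueError` when a (week, domain) pair occurs more than once. Whether the asker's data contains such duplicates is not stated. A minimal sketch with made-up rows, falling back to `pivot_table` with `aggfunc="sum"` for that case:

```python
import pandas as pd

# Made-up rows with a duplicated (week, domain) pair: (43, "A") appears twice.
dup = pd.DataFrame({
    "week":   [43, 43, 1],
    "domain": ["A", "A", "B"],
    "count":  [2, 3, 4],
})

try:
    dup.pivot(index="week", columns="domain", values="count")
    pivot_failed = False
except ValueError:
    # pivot cannot decide which of the duplicated values to keep.
    pivot_failed = True

# pivot_table aggregates the duplicates instead of failing.
summed = dup.pivot_table(index="week", columns="domain",
                         values="count", aggfunc="sum")
print(pivot_failed)
print(summed)
```

The result of `pivot_table` is again sorted numerically by week, so the `.loc` or `reindex` reordering step is still needed afterwards.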
Option 3] Or, reindex instead of .loc
In [4830]: dfp.reindex(weeks & dfp.index)
Out[4830]:
domain A B C
43 5.0 1.0 NaN
44 NaN 1.0 NaN
45 1.0 4.0 NaN
50 1.0 11.0 NaN
51 4.0 NaN 6.0
1 3.0 NaN 14.0
2 NaN 3.0 NaN
3 12.0 12.0 NaN
5 NaN NaN 1.0
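One practical difference from `.loc`: `reindex` accepts labels that are absent from the index and fills those rows with NaN, so the intersection step can be skipped when empty weeks should stay visible. A sketch under that assumption, with made-up sample data:

```python
import pandas as pd

# Small made-up sample; column names follow the question.
df = pd.DataFrame({
    "week":   [43, 44, 51, 51, 1, 2],
    "domain": ["A", "B", "A", "C", "A", "B"],
    "count":  [5, 1, 4, 6, 3, 3],
})

weeks = pd.Index(list(range(26, 52)) + list(range(26)))
dfp = df.groupby(["week", "domain"])["count"].sum().unstack()

# One row per entry in `weeks`; weeks without data become all-NaN rows.
full = dfp.reindex(weeks)

# Dropping the all-NaN rows gives the same rows as the intersection approach.
compact = full.dropna(how="all")
print(compact.index.tolist())
```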
Details
In [4826]: weeks
Out[4826]:
Int64Index([26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25],
dtype='int64')
In [4827]: weeks & dfp.index
Out[4827]: Int64Index([43, 44, 45, 50, 51, 1, 2, 3, 5], dtype='int64')
The problem is that week 44 and week 2 are misplaced. Week 44 should be between 43 and 45, and week 2 should be between 1 and 3.
Hmm, so the order is [26,27…,51,0,1,…,25]?
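The list used in the answers runs 26..51 and then 0..25, whereas the question numbers weeks from 1 to 52. Below is a sketch of a hypothetical helper, `wrap_order` (not from the answers), that builds a wrap-around order for 1-based weeks. The start week of 26 is an assumption carried over from the answers; the right value depends on where the data actually begins.

```python
def wrap_order(start, first=1, last=52):
    """Return weeks start..last followed by first..start-1 (hypothetical helper)."""
    if not first <= start <= last:
        raise ValueError("start must lie within [first, last]")
    return list(range(start, last + 1)) + list(range(first, start))


order = wrap_order(26)

# Map each week to its position in the custom order and sort by that position.
rank = {week: pos for pos, week in enumerate(order)}
seen = [1, 2, 3, 5, 43, 44, 45, 50, 51]
print(sorted(seen, key=rank.get))
```

The resulting list can be passed as the categories of the ordered categorical, or wrapped in `pd.Index` for the `.loc` and `reindex` variants.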