Python 2.7 pandas: pivot table fails when preserving order


I have the following dataframe, where week is not an ISO week but a fiscal week (1 is the first week of July, 52 is the last week of June):

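I want to pivot this table while keeping the order of the weeks, to get a new dataframe like the one below, where the values are count and the columns are domain: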

> new_df
week   A      B     C
43      5     1   NaN
44    NaN     1   NaN
45      1     4   NaN      
50      1    11   NaN
51      4   NaN     6
1       3   NaN    14
2     NaN     3   NaN
3      12    12   NaN
5     NaN   NaN     1
I tried using groupby and unstack, but got the following error:

> df = df.groupby(['week'], sort=False)['count'].unstack('domain')
AttributeError: Cannot access callable attribute 'unstack' of 'SeriesGroupBy' objects, try using the 'apply' method
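The error comes from calling unstack on a SeriesGroupBy: groupby(...)['count'] only describes the groups, so an aggregation such as sum() has to run first to produce a Series indexed by (week, domain). The sketch below assumes a long-format frame with columns week, domain and count; its rows are hypothetical, rebuilt from the desired table because the original frame is not shown. It also shows the ordering problem: by default groupby sorts the weeks numerically, and reindexing by the order of first appearance (pd.unique) is one way to get the frame's original order back.

```python
import pandas as pd

# Hypothetical long-format input rebuilt from the desired table above;
# rows are already in fiscal order (43 ... 51, then 1 ... 5).
df = pd.DataFrame({
    'week':   [43, 43, 44, 45, 45, 50, 50, 51, 51, 1, 1, 2, 3, 3, 5],
    'domain': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'C', 'A', 'C', 'B', 'A', 'B', 'C'],
    'count':  [5, 1, 1, 1, 4, 1, 11, 4, 6, 3, 14, 3, 12, 12, 1],
})

# Aggregate first: this yields a Series with a (week, domain) MultiIndex,
# which does support unstack().
wide = df.groupby(['week', 'domain'])['count'].sum().unstack('domain')

# groupby sorts its keys by default, so week 1 now comes before week 43.
print(wide.index.tolist())  # [1, 2, 3, 5, 43, 44, 45, 50, 51]

# Restore the order in which the weeks first appear in df.
wide = wide.reindex(pd.unique(df['week']))
print(wide.index.tolist())  # [43, 44, 45, 50, 51, 1, 2, 3, 5]
```

This relies on the rows already being in fiscal order; when they are not, an explicit fiscal ordering as in the answers is more robust.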

You need a custom order for the weeks, so define that order with an ordered categorical and omit sort=False:

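import pandas as pd

# Fiscal order used here: weeks 26..51 first, then 0..25
cats = list(range(26, 52)) + list(range(26))
print (cats)
[26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
 47, 48, 49, 50, 51, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

# Ordered categorical: groupby sorts by this category order, not numerically
df['week'] = df['week'].astype(pd.CategoricalDtype(categories=cats, ordered=True))

# observed=True keeps only the (week, domain) pairs that actually occur
df = df.groupby(['week','domain'], observed=True)['count'].sum().unstack()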
print (df)
domain     A     B     C
week                    
43       5.0   1.0   NaN
44       NaN   1.0   NaN
45       1.0   4.0   NaN
50       1.0  11.0   NaN
51       4.0   NaN   6.0
1        3.0   NaN  14.0
2        NaN   3.0   NaN
3       12.0  12.0   NaN
5        NaN   NaN   1.0
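The same ordered categorical also works with pivot_table, which is closer to the pivot the question asked for. A minimal sketch, assuming a recent pandas where pd.CategoricalDtype and the observed argument of pivot_table are available, and using hypothetical rows rebuilt from the desired table:

```python
import pandas as pd

# Hypothetical long-format input rebuilt from the desired table in the question.
df = pd.DataFrame({
    'week':   [43, 43, 44, 45, 45, 50, 50, 51, 51, 1, 1, 2, 3, 3, 5],
    'domain': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'C', 'A', 'C', 'B', 'A', 'B', 'C'],
    'count':  [5, 1, 1, 1, 4, 1, 11, 4, 6, 3, 14, 3, 12, 12, 1],
})

# Same fiscal order as in the answer: 26..51, then 0..25.
cats = list(range(26, 52)) + list(range(26))
df['week'] = df['week'].astype(pd.CategoricalDtype(categories=cats, ordered=True))

# Rows follow the category order; observed=True leaves out weeks without data.
out = df.pivot_table(index='week', columns='domain', values='count',
                     aggfunc='sum', observed=True)
print(out)
```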

Option 1: Use a custom weeks index as a helper together with .loc

In [4810]: weeks = pd.Index(list(range(26, 52)) + list(range(26)))

In [4819]: dfp = df.groupby(['week','domain'])['count'].sum().unstack()

In [4820]: dfp.loc[weeks.intersection(dfp.index)]
Out[4820]:
domain     A     B     C
43       5.0   1.0   NaN
44       NaN   1.0   NaN
45       1.0   4.0   NaN
50       1.0  11.0   NaN
51       4.0   NaN   6.0
1        3.0   NaN  14.0
2        NaN   3.0   NaN
3       12.0  12.0   NaN
5        NaN   NaN   1.0
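A self-contained version of Option 1 for newer pandas, where weeks & dfp.index is no longer treated as a set intersection between two Index objects. Index.intersection is used instead; it keeps the order of the calling index, here weeks. The sample rows are hypothetical, rebuilt from the desired table:

```python
import pandas as pd

# Hypothetical long-format input rebuilt from the desired table in the question.
df = pd.DataFrame({
    'week':   [43, 43, 44, 45, 45, 50, 50, 51, 51, 1, 1, 2, 3, 3, 5],
    'domain': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'C', 'A', 'C', 'B', 'A', 'B', 'C'],
    'count':  [5, 1, 1, 1, 4, 1, 11, 4, 6, 3, 14, 3, 12, 12, 1],
})

# Fiscal week order from the answer: 26..51, then 0..25.
weeks = pd.Index(list(range(26, 52)) + list(range(26)))

# Pivot; the rows come out sorted numerically (1, 2, 3, 5, 43, ...).
dfp = df.groupby(['week', 'domain'])['count'].sum().unstack()

# Keep only the weeks that have data, in the order they appear in `weeks`.
result = dfp.loc[weeks.intersection(dfp.index)]
print(result)
```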
Option 2: Alternatively, use pivot
In [4821]: dfp = df.pivot(index='week', columns='domain', values='count')

In [4822]: dfp.loc[weeks.intersection(dfp.index)]
Out[4822]:
domain     A     B     C
43       5.0   1.0   NaN
44       NaN   1.0   NaN
45       1.0   4.0   NaN
50       1.0  11.0   NaN
51       4.0   NaN   6.0
1        3.0   NaN  14.0
2        NaN   3.0   NaN
3       12.0  12.0   NaN
5        NaN   NaN   1.0
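Two caveats about Option 2, sketched on the same hypothetical sample rows: recent pandas (2.0 and later) accepts index, columns and values for pivot only as keyword arguments, and pivot raises a ValueError when a (week, domain) pair occurs more than once. pivot_table with aggfunc='sum' aggregates such duplicates instead. The duplicated row below is hypothetical and only there to trigger the error:

```python
import pandas as pd

# Hypothetical long-format input rebuilt from the desired table in the question.
df = pd.DataFrame({
    'week':   [43, 43, 44, 45, 45, 50, 50, 51, 51, 1, 1, 2, 3, 3, 5],
    'domain': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'C', 'A', 'C', 'B', 'A', 'B', 'C'],
    'count':  [5, 1, 1, 1, 4, 1, 11, 4, 6, 3, 14, 3, 12, 12, 1],
})

# Keyword arguments work across pandas versions.
dfp = df.pivot(index='week', columns='domain', values='count')

# Hypothetical duplicate of the (43, 'A') row makes pivot fail ...
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index='week', columns='domain', values='count')
    pivot_failed = False
except ValueError:  # duplicate entries cannot be reshaped
    pivot_failed = True

# ... while pivot_table sums the two rows for week 43 / domain A.
dup_sum = dup.pivot_table(index='week', columns='domain', values='count', aggfunc='sum')
print(pivot_failed, dup_sum.loc[43, 'A'])
```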
Option 3: Or use reindex instead of .loc

In [4810]: weeks = pd.Index(list(range(26, 52)) + list(range(26)))

In [4819]: dfp = df.groupby(['week','domain'])['count'].sum().unstack()

In [4830]: dfp.reindex(weeks.intersection(dfp.index))
Out[4830]:
domain     A     B     C
43       5.0   1.0   NaN
44       NaN   1.0   NaN
45       1.0   4.0   NaN
50       1.0  11.0   NaN
51       4.0   NaN   6.0
1        3.0   NaN  14.0
2        NaN   3.0   NaN
3       12.0  12.0   NaN
5        NaN   NaN   1.0
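A variant of Option 3 that skips the intersection: reindex by the full weeks index, then drop the rows that are entirely NaN, i.e. the fiscal weeks with no data. A sketch on hypothetical sample rows; it assumes no observed week has NaN in every column, since such a row would be dropped as well:

```python
import pandas as pd

# Hypothetical long-format input rebuilt from the desired table in the question.
df = pd.DataFrame({
    'week':   [43, 43, 44, 45, 45, 50, 50, 51, 51, 1, 1, 2, 3, 3, 5],
    'domain': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'C', 'A', 'C', 'B', 'A', 'B', 'C'],
    'count':  [5, 1, 1, 1, 4, 1, 11, 4, 6, 3, 14, 3, 12, 12, 1],
})
weeks = pd.Index(list(range(26, 52)) + list(range(26)))
dfp = df.groupby(['week', 'domain'])['count'].sum().unstack()

# One row per fiscal week (52 rows), all-NaN where a week has no data.
full = dfp.reindex(weeks)

# Drop the empty weeks; the remaining rows keep the fiscal order.
result = full.dropna(how='all')
print(result)
```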

Details

In [4826]: weeks
Out[4826]:
Int64Index([26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
            43, 44, 45, 46, 47, 48, 49, 50, 51,  0,  1,  2,  3,  4,  5,  6,  7,
             8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
            25],
           dtype='int64')

In [4827]: weeks.intersection(dfp.index)
Out[4827]: Int64Index([43, 44, 45, 50, 51, 1, 2, 3, 5], dtype='int64')
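The output above depends on the intersection keeping the order of the left-hand index. A small check of that behaviour, assuming a recent pandas where Index.intersection defaults to sort=False; the observed index is hypothetical and mirrors the numerically sorted index that groupby produces:

```python
import pandas as pd

weeks = pd.Index(list(range(26, 52)) + list(range(26)))
# Hypothetical index of the pivoted frame, sorted numerically as groupby returns it.
observed = pd.Index([1, 2, 3, 5, 43, 44, 45, 50, 51])

# The result follows the order of the index the method is called on.
fiscal = weeks.intersection(observed)
numeric = observed.intersection(weeks)
print(fiscal.tolist())   # [43, 44, 45, 50, 51, 1, 2, 3, 5]
print(numeric.tolist())  # [1, 2, 3, 5, 43, 44, 45, 50, 51]
```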

The problem is that weeks 44 and 2 are in the wrong place: week 44 should sit between 43 and 45, and week 2 between 1 and 3. Hmm, so the order is [26,27…,51,0,1,…,25]?
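The order in the answer's list is a rotation of weeks 0 to 51 that starts at week 26, so it can also be expressed as a sort key instead of an explicit list. A sketch assuming pandas 1.1 or later (for the key argument of sort_index); START and N_WEEKS are hypothetical names, and the sample rows are rebuilt from the desired table:

```python
import pandas as pd

# Hypothetical long-format input rebuilt from the desired table in the question.
df = pd.DataFrame({
    'week':   [43, 43, 44, 45, 45, 50, 50, 51, 51, 1, 1, 2, 3, 3, 5],
    'domain': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'C', 'A', 'C', 'B', 'A', 'B', 'C'],
    'count':  [5, 1, 1, 1, 4, 1, 11, 4, 6, 3, 14, 3, 12, 12, 1],
})
dfp = df.groupby(['week', 'domain'])['count'].sum().unstack()

START = 26    # hypothetical: first week of the ordering, as in the answer's list
N_WEEKS = 52  # hypothetical: length of the cycle (weeks 0..51)

# (week - START) % N_WEEKS maps 26 -> 0, ..., 51 -> 25, 0 -> 26, ..., 25 -> 51,
# i.e. the order [26, 27, ..., 51, 0, 1, ..., 25].
result = dfp.sort_index(key=lambda idx: (idx - START) % N_WEEKS)
print(result)
```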