Python 2.7 pandas: data manipulation as per a pivot table

python-2.7 pandas

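I am new to pandas and need to automate the generation of a report that was previously produced with Excel and a pivot table. I do not know pivot tables well either.

Since I know Python, I thought I could use it. I have worked with the csv reader, the csv writer and openpyxl. I think pandas is well suited to data analysis and would let me do this job with minimal code, instead of using openpyxl or the csv reader/writer.

However, since I am new to it, I am struggling with how to do this data manipulation in pandas.

Below is a sample data frame.

Edit: text format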

serial_number   item_name   status  date
1   foo1        done           2015-01-11
2   foo2        done           2016-01-11
3   foo3        not_done       2015-02-12
4   foo4        not_done       2016-01-12
5   foo5        on_hold        2015-03-13
6   foo6        not_done       2016-02-13
7   foo7        done           2016-03-14
8   foo7        done           2016-02-15
9   foo8        not_done       2016-03-16
10  foo8        on_hold        2016-04-17
11  foo9        on_hold        2016-04-18

From the Excel manipulation, I get the following report:

                          Status
Year         Start_Month  done  not_done  on_hold  total
2015         January         1         0        0      1
             February        0         1        0      1
             March           0         0        1      1
2016         January         1         1        0      2
             February        1         1        0      2
             March           1         1        0      2
             April           0         0        2      2
Grand_Total                  4         4        3     11

Here is my attempt at writing a pivot_table for the above operation:

table = pd.pivot_table(df, values=["done", "not_done", "on_hold"],
                       index=["date"], columns=["status"])

This is the error message I receive:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
table = pd.pivot_table(df, values=["Implementation - Successful","Closed Incomplete","Backed Out"], index=["chg_year","chg_month"], columns=["chg_state"]
  File "C:\Python27\lib\site-packages\pandas\tools\pivot.py", line 121, in pivot_table
    agged = grouped.agg(aggfunc)
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 3597, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 3114, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\core\base.py", line 428, in _aggregate
    return getattr(self, arg)(*args, **kwargs), None
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 964, in mean
    return self._cython_agg_general('mean')
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 3048, in _cython_agg_general
    how, numeric_only=numeric_only)
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 3094, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
DataError: No numeric types to aggregate

Next time, please provide your sample data in text format or, better still, as the code that generates it:

import pandas as pd

df = pd.DataFrame({'serial_number': range(1, 12),
                   'item_name': list(map(lambda x: 'foo' + str(x),
                                         [1, 2, 3, 4, 5, 6, 7, 7, 8, 8, 9])),
                   'status': ['done', 'done', 'not_done', 'not_done', 'on_hold',
                              'not_done', 'done', 'done', 'not_done', 'on_hold',
                              'on_hold'],
                   'date': ['2015-01-01', '2016-01-01', '2015-02-12', '2016-01-12',
                            '2015-03-13', '2016-02-13', '2016-03-14', '2016-02-15',
                            '2016-03-16', '2016-04-17', '2016-04-18']})
df['date'] = pd.to_datetime(df['date'])

Use pd.crosstab (instead of pd.pivot_table) and resample by month:

output = pd.crosstab(df['date'], df['status']).resample('M').sum().dropna()

Compute the total for each row:

output['total'] = output.sum(axis=1)

Rebuild the index to get nicely formatted years and months, as in the desired output:

dates = output.index.to_series()
output.index = pd.MultiIndex.from_arrays(
    [dates.dt.year, dates.dt.strftime('%B')],
    names=['Year', 'Start_Month'])
print(output)

# status            done  not_done  on_hold  total
# Year Start_Month                                
# 2015 January       1.0       0.0      0.0    1.0
#      February      0.0       1.0      0.0    1.0
#      March         0.0       0.0      1.0    1.0
# 2016 January       1.0       1.0      0.0    2.0
#      February      1.0       1.0      0.0    2.0
#      March         1.0       1.0      0.0    2.0
#      April         0.0       0.0      2.0    2.0
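A note for readers on newer pandas releases: there, summing an empty resample bin returns 0 rather than NaN, so the dropna() step above would no longer remove the months without data. The sketch below is one possible alternative, not part of the original answer. It cross-tabulates directly on year and month number, so only months that occur in the data show up. The names year, month, counts and month_number are made up for illustration, and the dates are the ones from the question's text table.

```python
import calendar

import pandas as pd

# Sample data copied from the question's text table.
df = pd.DataFrame({
    'status': ['done', 'done', 'not_done', 'not_done', 'on_hold', 'not_done',
               'done', 'done', 'not_done', 'on_hold', 'on_hold'],
    'date': pd.to_datetime(
        ['2015-01-11', '2016-01-11', '2015-02-12', '2016-01-12', '2015-03-13',
         '2016-02-13', '2016-03-14', '2016-02-15', '2016-03-16', '2016-04-17',
         '2016-04-18']),
})

# Hypothetical helper series: calendar year and month number of each row.
year = df['date'].dt.year.rename('Year')
month = df['date'].dt.month.rename('month_number')

# Count rows per (year, month) and status. Sorting happens on the month
# number, so the rows come out in calendar order.
counts = pd.crosstab([year, month], df['status'])
counts['total'] = counts.sum(axis=1)

# Swap the month numbers for month names, for display only.
counts.index = pd.MultiIndex.from_arrays(
    [counts.index.get_level_values('Year'),
     [calendar.month_name[m]
      for m in counts.index.get_level_values('month_number')]],
    names=['Year', 'Start_Month'])
print(counts)
```

Because nothing is resampled, the counts stay integers and no empty months need to be dropped afterwards.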

The totals by column have no place in the same data frame:

grand_total = output.sum()
print(grand_total)

# status
# done         4.0
# not_done     4.0
# on_hold      3.0
# total       11.0
# dtype: float64
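If the report nevertheless has to mirror the Excel layout, with Grand_Total as the last row, the column sums can be appended as an extra row. This is only a sketch under assumptions: the monthly counts are typed in by hand so that the snippet runs on its own, and the name report and the empty month label for the total row are my own choices.

```python
import pandas as pd

# Monthly counts from the report, typed in by hand so the sketch is
# self-contained.
index = pd.MultiIndex.from_tuples(
    [(2015, 'January'), (2015, 'February'), (2015, 'March'),
     (2016, 'January'), (2016, 'February'), (2016, 'March'),
     (2016, 'April')],
    names=['Year', 'Start_Month'])
output = pd.DataFrame(
    {'done': [1, 0, 0, 1, 1, 1, 0],
     'not_done': [0, 1, 0, 1, 1, 1, 0],
     'on_hold': [0, 0, 1, 0, 0, 0, 2]},
    index=index)
output['total'] = output.sum(axis=1)

# One-row frame holding the column sums, labelled like the Excel report.
grand_total = output.sum().to_frame().T
grand_total.index = pd.MultiIndex.from_tuples(
    [('Grand_Total', '')], names=output.index.names)

# Stack the monthly rows and the total row into a single table.
report = pd.concat([output, grand_total])
print(report)
```

The price is that the Year level now mixes integers with a string, which is fine for display or export but awkward for further computation.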

Try this:

# Break out month and year
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
# Aggregate, with the 'status' values as column headers
df.pivot_table(index=['month', 'year'], columns=['status'], values='item_name', aggfunc='count')

Yields:

status      done  not_done
month year                
1     2016   1.0       NaN
2     2016   NaN       1.0
3     2016   1.0       NaN
4     2016   NaN       1.0
5     2016   1.0       NaN
6     2016   NaN       1.0
7     2016   1.0       NaN
8     2016   1.0       NaN
9     2016   1.0       NaN
10    2016   1.0       NaN
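A short note on why this works where the attempt in the question failed: pivot_table aggregates with 'mean' by default, and the sample data has no numeric column to average, hence "No numeric types to aggregate". Counting a column avoids that. The sketch below runs the same idea on the question's sample data and adds fill_value and margins so that empty cells show 0 and a Grand_Total row and column appear. The variable name table and the year-first index order are my own choices.

```python
import pandas as pd

# Sample data copied from the question's text table.
df = pd.DataFrame({
    'item_name': ['foo1', 'foo2', 'foo3', 'foo4', 'foo5', 'foo6',
                  'foo7', 'foo7', 'foo8', 'foo8', 'foo9'],
    'status': ['done', 'done', 'not_done', 'not_done', 'on_hold', 'not_done',
               'done', 'done', 'not_done', 'on_hold', 'on_hold'],
    'date': pd.to_datetime(
        ['2015-01-11', '2016-01-11', '2015-02-12', '2016-01-12', '2015-03-13',
         '2016-02-13', '2016-03-14', '2016-02-15', '2016-03-16', '2016-04-17',
         '2016-04-18']),
})
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

# Count item names per (year, month) and status instead of averaging.
# fill_value=0 replaces the NaN of empty cells; margins=True adds the totals.
table = df.pivot_table(index=['year', 'month'], columns='status',
                       values='item_name', aggfunc='count',
                       fill_value=0, margins=True,
                       margins_name='Grand_Total')
print(table)
```

Here the months stay as numbers; they can be mapped to names afterwards if the report needs them.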

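Hi Alberto, I have updated the question with the sample data in text format. A bit late, though. Your answer comes closest to what I need. One question: is it possible to store the item_name for each category? For example, for the January / done category, how can I store foo1 as the item name? My next step is to port this to HTML, so that I can provide a hyperlink for each non-zero category that refers back to the original items. Could they perhaps be stored in a list for later access?

@Anil_M: That is a different question. May I suggest you post it as a separate question?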
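For what the first comment asks, one possible direction (a sketch only, not something given in the thread) is to group by year, month and status and collect the item names of each group into a list. The name names is made up for illustration, and the data is again the question's sample.

```python
import pandas as pd

# Sample data copied from the question's text table.
df = pd.DataFrame({
    'item_name': ['foo1', 'foo2', 'foo3', 'foo4', 'foo5', 'foo6',
                  'foo7', 'foo7', 'foo8', 'foo8', 'foo9'],
    'status': ['done', 'done', 'not_done', 'not_done', 'on_hold', 'not_done',
               'done', 'done', 'not_done', 'on_hold', 'on_hold'],
    'date': pd.to_datetime(
        ['2015-01-11', '2016-01-11', '2015-02-12', '2016-01-12', '2015-03-13',
         '2016-02-13', '2016-03-14', '2016-02-15', '2016-03-16', '2016-04-17',
         '2016-04-18']),
})

# One list of item names per (year, month, status) combination that occurs.
names = (df.assign(year=df['date'].dt.year, month=df['date'].dt.month)
           .groupby(['year', 'month', 'status'])['item_name']
           .apply(list))

# Look up the items behind a single cell of the report.
print(names.loc[(2015, 1, 'done')])
print(names.loc[(2016, 4, 'on_hold')])
```

Only combinations that actually occur get an entry, so the keys of names correspond to the non-zero cells of the report.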