Grouping by groups of columns on a DataFrame in Python

python pandas dictionary dataframe group-by

I have a DataFrame with one column for every month from 2000 through 2016.

    df.columns

outputs

    Index(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06',
           '2000-07', '2000-08', '2000-09', '2000-10',
           ...
           '2015-11', '2015-12', '2016-01', '2016-02', '2016-03', '2016-04',
           '2016-05', '2016-06', '2016-07', '2016-08'],
          dtype='object', length=200)

I want to group these columns by quarter. I built a dictionary, believing that groupby followed by aggregate with mean would be the best approach:

    m2q = {'2000q1': ['2000-01', '2000-02', '2000-03'],
           '2000q2': ['2000-04', '2000-05', '2000-06'],
           '2000q3': ['2000-07', '2000-08', '2000-09'],
                ...
           '2016q2': ['2016-04', '2016-05', '2016-06'],
           '2016q3': ['2016-07', '2016-08']}

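However, my groupby call does not give me the output I want. In fact, it gives me an empty grouping. Any suggestions for making this grouping work?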

Or is there perhaps a more Pythonic way to take the mean of the specified columns for each quarter?

You can convert the index to a DatetimeIndex (Example 1) or a PeriodIndex (Example 2).

Also, see the related topic for more details.

    import numpy as np
    import pandas as pd


    idx = ['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06',
           '2000-07', '2000-08', '2000-09', '2000-10', '2000-11', '2000-12']

    df = pd.DataFrame(np.arange(12), index=idx, columns=['SAMPLE_DATA'])
    print(df)

             SAMPLE_DATA
    2000-01            0
    2000-02            1
    2000-03            2
    2000-04            3
    2000-05            4
    2000-06            5
    2000-07            6
    2000-08            7
    2000-09            8
    2000-10            9
    2000-11           10
    2000-12           11

    # Handle your timeseries data with pandas timeseries / date functionality
    df.index = pd.to_datetime(df.index)

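Example 1

    # Resample the DatetimeIndex to quarter ends and sum each quarter.
    # On pandas 2.2 and later the 'Q' alias is deprecated here; use 'QE' instead.
    print(df.resample('Q').sum())

                SAMPLE_DATA
    2000-03-31            3
    2000-06-30           12
    2000-09-30           21
    2000-12-31           30

Example 2

    # Convert the index to quarterly periods, then group by the index level.
    print(df.to_period('Q').groupby(level=0).sum())

            SAMPLE_DATA
    2000Q1            3
    2000Q2           12
    2000Q3           21
    2000Q4           30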
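
The examples above keep the months in the index, whereas the question has them as columns and asks for a mean. A minimal sketch of one way to adapt the PeriodIndex idea to that layout follows. The frame `wide`, its row labels `a` and `b`, and its values are made up for illustration, and transposing before grouping is just one option, not necessarily what the answer's author had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: one column per month,
# two made-up rows ('a' and 'b').
months = [f'2000-{m:02d}' for m in range(1, 13)]
wide = pd.DataFrame(np.arange(24).reshape(2, 12), index=['a', 'b'], columns=months)

# Turn each month label into the quarter it belongs to (a PeriodIndex).
quarters = pd.to_datetime(wide.columns).to_period('Q')

# Transpose so the months become rows, group those rows by quarter,
# take the mean, and transpose back so quarters are columns again.
quarterly_mean = wide.T.groupby(quarters).mean().T
print(quarterly_mean)
```

The same pattern works with `.sum()` in place of `.mean()` if totals per quarter are wanted.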

You need to group by specific columns, so you should do something like df.groupby(m2q.get('2000q1')) to get the first quarter.
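
If the goal is to keep the dictionary from the question, one likely reason for the empty grouping is the direction of the mapping: when `groupby` is given a dict, it looks up each axis label in the dict and uses the value as the group, while `m2q` maps quarter to a list of months. The sketch below inverts the dictionary and groups the transposed frame. The frame `wide`, its values, and the name `month_to_quarter` are assumptions made for illustration, and only the first two quarters of `m2q` are reproduced.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two made-up rows, six monthly columns.
cols = ['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06']
wide = pd.DataFrame(np.arange(12).reshape(2, 6), index=['a', 'b'], columns=cols)

# First two entries of the question's dictionary (quarter -> months).
m2q = {'2000q1': ['2000-01', '2000-02', '2000-03'],
       '2000q2': ['2000-04', '2000-05', '2000-06']}

# Invert it to month -> quarter, the direction groupby expects for a dict.
month_to_quarter = {month: quarter
                    for quarter, month_list in m2q.items()
                    for month in month_list}

# Transpose so the months become row labels, group them through the mapping,
# average each group, and transpose back.
quarterly_mean = wide.T.groupby(month_to_quarter).mean().T
print(quarterly_mean)
```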