Python: group by or resample a DataFrame, excluding columns

python pandas dataframe

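I want to flatten my DataFrame so that all columns after name are grouped by the hour in dateTimeGmt, and then by id/name.

I tried

df2 = df.groupby([df.dateTimeGmt.dt.date, df.dateTimeGmt.dt.hour, df.id, df.name]).sum()

This seems to work, but it moves all of my grouping columns into the index.

df3 = df.groupby([df.dateTimeGmt.dt.date, df.dateTimeGmt.dt.hour, df.id, df.name], as_index=False).sum()

keeps id and name, but the dateTimeGmt data is lost.

How can I group the data without losing the grouped columns?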
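
To make the symptom concrete, here is a small sketch of what the first attempt produces. The two sample rows are invented for illustration, and the value column a is selected explicitly (an assumption, so that the datetime column itself is not summed). Both derived keys inherit the name dateTimeGmt, which is why a plain reset_index() cannot turn them back into columns.

```python
import pandas as pd

# Hypothetical sample data; the real frame is not shown in the question.
df = pd.DataFrame({
    "dateTimeGmt": pd.to_datetime(["2020-01-01 06:05:00", "2020-01-01 07:10:00"]),
    "id": [4, 5],
    "name": ["four", "five"],
    "a": [1.0, 2.0],
})

# Same grouping keys as in the question, restricted to the value column.
df2 = df.groupby(
    [df.dateTimeGmt.dt.date, df.dateTimeGmt.dt.hour, df.id, df.name]
)[["a"]].sum()

# The date and hour levels are both named after the source column.
index_names = list(df2.index.names)
print(index_names)

# Two levels share one name, so they cannot both become columns.
try:
    df2.reset_index()
    reset_ok = True
except ValueError as exc:
    reset_ok = False
    print("reset_index failed:", exc)
```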

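In your solution, you need to rename the date Series and give the hour Series its own name to avoid duplicate column names, and finish with reset_index(). Alternatively, you can group at an hourly frequency with pd.Grouper. The rename approach: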
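# Rename the derived date/hour Series so the index levels get unique names;
# otherwise both would be called 'dateTimeGmt' and reset_index() would fail.
# numeric_only=True leaves the datetime column out of the sum
# (recent pandas versions may raise on it otherwise).
df2 = (df.groupby([df.dateTimeGmt.dt.date.rename('date'),
                   df.dateTimeGmt.dt.hour.rename('h'), 'id', 'name'])
         .sum(numeric_only=True)
         .reset_index())
print(df2)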
         date   h  id  name    a    b    c
0  2020-01-01   6   4  four  1.0  3.0  0.0
1  2020-01-01   6   6   six  0.0  3.0  0.0
2  2020-01-01   7   4  four  0.0  0.0  0.0
3  2020-01-01   7   5  five  0.0  0.0  2.0
4  2020-01-01  10   5  five  0.0  0.0  0.0
5  2020-01-01  10   6   six  5.0  0.0  0.0
6  2020-01-01  11   6   six  0.0  0.0  0.0
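
For anyone who wants to run the rename approach end to end, here is a minimal self-contained sketch. The sample rows are made up (they are not the asker's data), and the value columns a and b are selected explicitly, which is my assumption to keep the datetime column out of the sum.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "dateTimeGmt": pd.to_datetime([
        "2020-01-01 06:05:00", "2020-01-01 06:40:00",
        "2020-01-01 07:10:00", "2020-01-01 07:30:00",
    ]),
    "id": [4, 4, 5, 5],
    "name": ["four", "four", "five", "five"],
    "a": [1.0, 2.0, 0.0, 0.0],
    "b": [3.0, 0.0, 0.0, 1.0],
})

# Derived keys get distinct names so they can become separate columns.
keys = [
    df["dateTimeGmt"].dt.date.rename("date"),
    df["dateTimeGmt"].dt.hour.rename("h"),
    "id",
    "name",
]

# Sum the value columns per (date, hour, id, name) and flatten the index.
flat = df.groupby(keys)[["a", "b"]].sum().reset_index()
print(flat)
```

The result has one row per date, hour, id and name combination, with every grouping key available as an ordinary column.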

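In this case we need to use as_index=True together with .reset_index(). As I understand it, with as_index=False you can only keep columns that exist in the original DataFrame.
@ansev - I think the rename is what really matters here, to avoid duplicate column names; otherwise the reset fails.
One more thing: when all rows in a group are NaN, can I keep NaN instead of 0?
pd.Grouper is perfect. One more thing: can I keep NaN instead of 0?
@Olivia - Yes, use min_count=1, like df.groupby([pd.Grouper(freq='H', key='dateTimeGmt'), 'id', 'name']).sum(min_count=1).reset_index()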
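
The min_count=1 suggestion can be checked with a small sketch. The data below is invented, and the lowercase "h" alias is my choice because newer pandas releases deprecate "H"; with older versions the uppercase alias from the comment should behave the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 06:00 group for id 4 contains only NaN values.
df = pd.DataFrame({
    "dateTimeGmt": pd.to_datetime([
        "2020-01-01 06:05:00", "2020-01-01 06:40:00", "2020-01-01 07:10:00",
    ]),
    "id": [4, 4, 5],
    "name": ["four", "four", "five"],
    "a": [np.nan, np.nan, 2.0],
})

grouped = df.groupby([pd.Grouper(key="dateTimeGmt", freq="h"), "id", "name"])["a"]

# Default: an all-NaN group sums to 0.0.
default_sum = grouped.sum().reset_index()

# min_count=1: at least one non-NaN value is required, else the result is NaN.
nan_sum = grouped.sum(min_count=1).reset_index()

print(default_sum)
print(nan_sum)
```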
The pd.Grouper variant, grouping at an hourly frequency:
# 'H' is the hourly alias used in the original answer; pandas 2.2+ deprecates it in favor of 'h'.
df2 = df.groupby([pd.Grouper(freq='H', key='dateTimeGmt'), 'id', 'name']).sum().reset_index()
print(df2)
          dateTimeGmt  id  name    a    b    c
0 2020-01-01 06:00:00   4  four  1.0  3.0  0.0
1 2020-01-01 06:00:00   6   six  0.0  3.0  0.0
2 2020-01-01 07:00:00   4  four  0.0  0.0  0.0
3 2020-01-01 07:00:00   5  five  0.0  0.0  2.0
4 2020-01-01 10:00:00   5  five  0.0  0.0  0.0
5 2020-01-01 10:00:00   6   six  5.0  0.0  0.0
6 2020-01-01 11:00:00   6   six  0.0  0.0  0.0
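
As a runnable sketch of the pd.Grouper variant, the snippet below uses invented sample rows and the lowercase "h" alias (an assumption on my part, since newer pandas deprecates "H"). It also shows a comparable approach that floors each timestamp to its hour with dt.floor; that second variant is not from the answer, only an illustration of the same idea.

```python
import pandas as pd

# Hypothetical sample data, not the asker's frame.
df = pd.DataFrame({
    "dateTimeGmt": pd.to_datetime([
        "2020-01-01 06:05:00", "2020-01-01 06:40:00", "2020-01-01 07:10:00",
    ]),
    "id": [4, 4, 5],
    "name": ["four", "four", "five"],
    "a": [1.0, 2.0, 5.0],
})

# Grouper bins dateTimeGmt into hourly buckets and keeps it as a single key.
by_grouper = (df.groupby([pd.Grouper(key="dateTimeGmt", freq="h"), "id", "name"])
                .sum()
                .reset_index())

# Same idea without Grouper: floor each timestamp to its hour, then group.
by_floor = (df.groupby([df["dateTimeGmt"].dt.floor("h"), "id", "name"])[["a"]]
              .sum()
              .reset_index())

print(by_grouper)
print(by_floor)
```

Either way the hour bucket stays in one timestamp column, so no renaming is needed before reset_index().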