Python: how to group pandas data by multiple conditions?
Here is my table:
timestamp date month day hour price
0 2017-01-01 00:00 01/01/2017 Jan Sun 00:00 60.23
1 2017-01-01 01:00 01/01/2017 Jan Sun 01:00 60.73
2 2017-01-01 02:00 01/01/2017 Jan Sun 02:00 75.99
3 2017-01-01 03:00 01/01/2017 Jan Sun 03:00 60.76
4 2017-01-01 04:00 01/01/2017 Jan Sun 04:00 49.01
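I have hourly data (all 24 hours of every day) for every month of the year.
For example, I want to group each season's data into weekdays and weekends:
Weekend\Winter = all Saturday and Sunday data in Nov, Dec, Jan and Feb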
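I'm a newbie, so any help would be much appreciated.
If you need to filter the data by conditions, use boolean indexing with a mask built from comparisons, using isin to check membership in the list L: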
# timestamp values changed only to get a better sample
print (df)
timestamp date month day hour price
0 2017-01-01 00:00:00 01/01/2017 Jan Sun 00:00 60.23
1 2017-01-03 00:00:00 01/01/2017 Jan Sun 00:00 60.23
2 2017-02-01 01:00:00 01/01/2017 Jan Sun 01:00 60.73
3 2017-02-05 01:00:00 01/01/2017 Jan Sun 01:00 60.73
4 2017-03-01 02:00:00 01/01/2017 Jan Sun 02:00 75.99
5 2017-04-01 03:00:00 01/01/2017 Jan Sun 03:00 60.76
6 2017-11-01 04:00:00 01/01/2017 Jan Sun 04:00 49.01
L = ['Nov','Dec','Jan','Feb']
# dayofweek: Monday=0 ... Sunday=6, so > 4 keeps Saturday and Sunday
mask = (df['timestamp'].dt.dayofweek > 4) & (df['month'].isin(L))
df1 = df[mask]
print (df1)
timestamp date month day hour price
0 2017-01-01 00:00:00 01/01/2017 Jan Sun 00:00 60.23
3 2017-02-05 01:00:00 01/01/2017 Jan Sun 01:00 60.73
5 2017-04-01 03:00:00 01/01/2017 Jan Sun 03:00 60.76
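A self-contained version of the filter above, under a few assumptions: the sample rows are hypothetical, and the month is taken from `timestamp` via `.dt.month` rather than from the text `month` column, so the filter stays correct even when the text columns disagree with the timestamp (as in the sample above, where every row says Jan/Sun). The `.dt` accessor needs a datetime column, hence the `pd.to_datetime` call.

```python
import pandas as pd

# Hypothetical sample: a few hourly rows spanning different months and weekdays
df = pd.DataFrame({
    'timestamp': ['2017-01-01 00:00', '2017-01-03 00:00', '2017-02-05 01:00',
                  '2017-04-01 03:00', '2017-11-04 04:00', '2017-11-06 05:00'],
    'price': [60.23, 60.23, 60.73, 60.76, 49.01, 55.00],
})

# The .dt accessor only works on a datetime column, so parse the strings first
df['timestamp'] = pd.to_datetime(df['timestamp'])

winter_months = [11, 12, 1, 2]                 # Nov, Dec, Jan, Feb as in the question
is_weekend = df['timestamp'].dt.dayofweek > 4  # Saturday (5) or Sunday (6)
is_winter = df['timestamp'].dt.month.isin(winter_months)

winter_weekend = df[is_weekend & is_winter]
print(winter_weekend)
```

Only the rows for 2017-01-01 (Sun), 2017-02-05 (Sun) and 2017-11-04 (Sat) survive; 2017-04-01 is a Saturday but not in a winter month.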
If you need new columns for the season and the day type:
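import numpy as np
# season: 1 = Dec-Feb, 2 = Mar-May, 3 = Jun-Aug, 4 = Sep-Nov
df['season'] = (df['timestamp'].dt.month%12 + 3) // 3
# Saturday/Sunday -> 'weekend', everything else -> 'weekdays'
df['state'] = np.where(df['timestamp'].dt.dayofweek > 4, 'weekend','weekdays')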
print (df)
timestamp date month day hour price season state
0 2017-01-01 00:00:00 01/01/2017 Jan Sun 00:00 60.23 1 weekend
1 2017-01-03 00:00:00 01/01/2017 Jan Sun 00:00 60.23 1 weekdays
2 2017-02-01 01:00:00 01/01/2017 Jan Sun 01:00 60.73 1 weekdays
3 2017-02-05 01:00:00 01/01/2017 Jan Sun 01:00 60.73 1 weekend
4 2017-03-01 02:00:00 01/01/2017 Jan Sun 02:00 75.99 2 weekdays
5 2017-04-01 03:00:00 01/01/2017 Jan Sun 03:00 60.76 2 weekend
6 2017-11-01 04:00:00 01/01/2017 Jan Sun 04:00 49.01 4 weekdays
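Note that with `(month % 12 + 3) // 3` November falls in season 4, while the question counts November as winter. If you want the question's definition, one option is to map the month number through a dict. Only the winter months come from the question; the split of the remaining months into spring, summer and autumn below is an assumption, as are the sample rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2017-01-01 00:00', '2017-03-01 02:00',
                                 '2017-07-02 12:00', '2017-11-01 04:00']),
})

# Winter follows the question (Nov-Feb); the other seasons are an ASSUMED
# split of the remaining months, adjust as needed
season_by_month = {11: 'winter', 12: 'winter', 1: 'winter', 2: 'winter',
                   3: 'spring', 4: 'spring', 5: 'spring',
                   6: 'summer', 7: 'summer', 8: 'summer',
                   9: 'autumn', 10: 'autumn'}

df['season'] = df['timestamp'].dt.month.map(season_by_month)
df['state'] = np.where(df['timestamp'].dt.dayofweek > 4, 'weekend', 'weekdays')
print(df)
```

Using labels instead of numbers also makes the later `groupby` output easier to read.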
These columns can then be used in a groupby with an aggregation, for example sum:
df2 = df.groupby(['season','state'], as_index=False)['price'].sum()
print (df2)
season state price
0 1 weekdays 120.96
1 1 weekend 120.96
2 2 weekdays 75.99
3 2 weekend 60.76
4 4 weekdays 49.01
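If by "groups" you mean separate DataFrames rather than aggregated numbers, one way is to iterate over the `groupby` object and keep each group keyed by its `(season, state)` tuple; `pivot_table` gives the same sums as above in a season-by-state layout. A sketch with a few hypothetical rows, reusing the numeric season formula from this answer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2017-01-01 00:00', '2017-01-03 00:00',
                                 '2017-02-05 01:00', '2017-04-01 03:00']),
    'price': [60.23, 60.23, 60.73, 60.76],
})
df['season'] = (df['timestamp'].dt.month % 12 + 3) // 3
df['state'] = np.where(df['timestamp'].dt.dayofweek > 4, 'weekend', 'weekdays')

# One DataFrame per (season, state) pair, keyed by that tuple
groups = {key: g for key, g in df.groupby(['season', 'state'])}

winter_weekend = groups[(1, 'weekend')]
print(winter_weekend)

# Same sums as the groupby above, reshaped: seasons as rows, states as columns
summary = df.pivot_table(index='season', columns='state',
                         values='price', aggfunc='sum')
print(summary)
```

Combinations with no rows (here season 2 on weekdays) show up as NaN in the pivot table and are simply absent from the dict.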
The solution below differs slightly from @jezrael's in that the seasons and the weekday/weekend split are defined explicitly:
import pandas as pd
df = pd.DataFrame([['2017-01-01 00:00', '01/01/2017', 'Jan', 'Mon', '00:00', 60.23],
                   ['2017-01-01 01:00', '01/01/2017', 'Jan', 'Sat', '01:00', 60.73],
                   ['2017-01-01 02:00', '01/01/2017', 'May', 'Tue', '02:00', 75.99],
                   ['2017-01-01 03:00', '01/01/2017', 'Jan', 'Sun', '03:00', 60.76],
                   ['2017-01-01 04:00', '01/01/2017', 'Sep', 'Sat', '04:00', 49.01]],
                  columns=['timestamp', 'date', 'month', 'day', 'hour', 'price'])
def InvertKeyListDictionary(input_dict):
    # turn {'Winter': ['Dec', 'Jan', 'Feb'], ...} into {'Dec': 'Winter', ...}
    return {w: k for k, v in input_dict.items() for w in v}
season_map = {'Spring': ['Mar', 'Apr', 'May'],
              'Summer': ['Jun', 'Jul', 'Aug'],
              'Autumn': ['Sep', 'Oct', 'Nov'],
              'Winter': ['Dec', 'Jan', 'Feb']}
weekend_map = {'Weekday': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
               'Weekend': ['Sat', 'Sun']}
month_map = InvertKeyListDictionary(season_map)
day_map = InvertKeyListDictionary(weekend_map)
df['season'] = df['month'].map(month_map)
df['daytype'] = df['day'].map(day_map)
df_groups = df.groupby(['season', 'daytype'])
df_groups.get_group(('Winter', 'Weekend'))
# output
# timestamp date month day hour price season daytype
# 2017-01-01 01:00 01/01/2017 Jan Sat 01:00 60.73 Winter Weekend
# 2017-01-01 03:00 01/01/2017 Jan Sun 03:00 60.76 Winter Weekend
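In the sample above the `month` and `day` text columns don't match the timestamps (every row is 2017-01-01, which was a Sunday). A sketch, assuming the timestamps are the reliable source: derive the abbreviations with `dt.month_name()` and `dt.day_name()` (English names by default) cut to three letters, then apply the same maps. The helper is renamed to snake_case here only for style, and the rows are hypothetical.

```python
import pandas as pd

def invert_key_list_dictionary(input_dict):
    # {'Winter': ['Dec', 'Jan', 'Feb'], ...} -> {'Dec': 'Winter', ...}
    return {w: k for k, v in input_dict.items() for w in v}

season_map = {'Spring': ['Mar', 'Apr', 'May'],
              'Summer': ['Jun', 'Jul', 'Aug'],
              'Autumn': ['Sep', 'Oct', 'Nov'],
              'Winter': ['Dec', 'Jan', 'Feb']}
weekend_map = {'Weekday': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
               'Weekend': ['Sat', 'Sun']}

# Hypothetical rows: a Sunday and a Wednesday in January, a Saturday in September
df = pd.DataFrame({'timestamp': pd.to_datetime(['2017-01-01 01:00',
                                                '2017-01-04 02:00',
                                                '2017-09-02 04:00'])})

# 'January' -> 'Jan', 'Sunday' -> 'Sun', matching the keys in the maps
df['month'] = df['timestamp'].dt.month_name().str[:3]
df['day'] = df['timestamp'].dt.day_name().str[:3]

df['season'] = df['month'].map(invert_key_list_dictionary(season_map))
df['daytype'] = df['day'].map(invert_key_list_dictionary(weekend_map))
print(df.groupby(['season', 'daytype']).size())
```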
@user3256363 What do you mean by groups? Splitting them into separate DataFrames? Do you mean groupby? Or assigning a new column with the name of the group?