Python Pandas: filter rows and divide values from specific rows and columns
I have the following dataframe:
traffic_type date region total_views
desktop 01/04/2018 aug 50
mobileweb 01/04/2018 aug 60
total 01/04/2018 aug 200
desktop 01/04/2018 world 20
mobileweb 01/04/2018 world 30
total 01/04/2018 world 40
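To reproduce the snippets on this page, here is a minimal sketch that builds the sample frame with pandas. It assumes the aug `total` row is 200, since the expected output (0.25 = 50/200) and both answers use that value. It also keeps `date` as a plain string rather than parsing it.

```python
import pandas as pd

# Sample data from the question; the aug 'total' row is assumed to be 200,
# the value the expected output and the answers are based on.
df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,          # kept as a string, not parsed
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 200, 20, 30, 40],
})
print(df)
```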
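I need to group by traffic_type, date and region, filter the rows where traffic_type is total, and in those same rows create a desktop_share column equal to the total_views of the traffic_type == desktop row divided by the total_views of the traffic_type == total row. The remaining rows of this column should be empty.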
traffic_type date region total_views desktop_share
desktop 01/04/2018 aug 50
mobileweb 01/04/2018 aug 60
total 01/04/2018 aug 200 0.25
desktop 01/04/2018 world 20
mobileweb 01/04/2018 world 30
total 01/04/2018 world 40 0.5
I have a long-winded approach, but I am looking for a more concise one based on numpy or just pandas.
My solution:
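# df2 is the original DataFrame shown above
# desktop views per (date, region)
df1 = df2.loc[df2.traffic_type == 'desktop', ['date', 'region', 'total_views']]
# attach them to every row of the same (date, region) as total_views_desktop
df1 = df2.merge(df1, how='left', on=['region', 'date'], suffixes=('', '_desktop'))
# keep only the 'total' rows; .copy() avoids SettingWithCopyWarning on the next line
df1 = df1.loc[df1.traffic_type == 'total'].copy()
df1['desktop_share'] = df1['total_views_desktop'] / df1['total_views']
df1 = df1[['date', 'region', 'desktop_share', 'traffic_type']]
# left merge back onto the original frame: non-'total' rows get NaN
dfinal = df2.merge(df1, how='left', on=['region', 'date', 'traffic_type'])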
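As a more compact alternative to the two-merge solution above, here is a minimal sketch using a different technique from the answers that follow. `groupby(...).transform` broadcasts each (date, region) group's desktop views to all of its rows, and `numpy.where` keeps the ratio only on the `total` rows. It assumes the aug total is 200 (matching the expected output) and at most one desktop row per (date, region).

```python
import numpy as np
import pandas as pd

# Sample frame; the aug 'total' is assumed to be 200 as in the expected output
df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 200, 20, 30, 40],
})

# Broadcast each (date, region) group's desktop views to all of its rows:
# non-desktop rows are masked to NaN, and 'first' picks the first non-NaN value
desktop_views = (
    df['total_views']
    .where(df['traffic_type'].eq('desktop'))
    .groupby([df['date'], df['region']])
    .transform('first')
)

# Keep the ratio only on the 'total' rows; all other rows stay NaN
df['desktop_share'] = np.where(
    df['traffic_type'].eq('total'),
    desktop_views / df['total_views'],
    np.nan,
)
print(df)
```

A (date, region) group with no desktop row also ends up NaN, because `first` finds no non-NaN value to broadcast.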
One idea using pivoting:
df1 = df.pivot_table(index=['date','region'],
columns='traffic_type',
values='total_views',
aggfunc='sum')
print (df1)
traffic_type desktop mobileweb total
date region
01/04/2018 aug 50 60 200
world 20 30 40
df2 = df1['desktop'].div(df1['total']).reset_index(name='desktop_share').assign(traffic_type='total')
df = df.merge(df2, how='left')
print (df)
traffic_type date region total_views desktop_share
0 desktop 01/04/2018 aug 50 NaN
1 mobileweb 01/04/2018 aug 60 NaN
2 total 01/04/2018 aug 200 0.25
3 desktop 01/04/2018 world 20 NaN
4 mobileweb 01/04/2018 world 30 NaN
5 total 01/04/2018 world 40 0.50
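The pivoted frame also gives the share of every traffic type in one division, in case the mobileweb share is wanted too (the question only asks for desktop). Here is a minimal sketch, assuming the same sample data with the aug total of 200:

```python
import pandas as pd

df = pd.DataFrame({
    'traffic_type': ['desktop', 'mobileweb', 'total'] * 2,
    'date': ['01/04/2018'] * 6,
    'region': ['aug'] * 3 + ['world'] * 3,
    'total_views': [50, 60, 200, 20, 30, 40],
})

wide = df.pivot_table(index=['date', 'region'], columns='traffic_type',
                      values='total_views', aggfunc='sum')

# Divide each traffic type by the 'total' column row by row (axis=0),
# then drop the trivial total/total column and rename the rest
shares = (
    wide.div(wide['total'], axis=0)
    .drop(columns='total')
    .add_suffix('_share')
)
print(shares)
```

Merging these back works like `df2` above: `reset_index()`, then `assign(traffic_type='total')` before the left merge.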
Another idea using a MultiIndex:
df1 = df.set_index(['traffic_type','date','region'])
a = df1.xs('desktop', drop_level=False).rename({'desktop':'total'})
b = df1.xs('total', drop_level=False)
df = df1.assign(desktop_share=a['total_views'].div(b['total_views'])).reset_index()
print (df)
traffic_type date region total_views desktop_share
0 desktop 01/04/2018 aug 50 NaN
1 mobileweb 01/04/2018 aug 60 NaN
2 total 01/04/2018 aug 200 0.25
3 desktop 01/04/2018 world 20 NaN
4 mobileweb 01/04/2018 world 30 NaN
5 total 01/04/2018 world 40 0.50