Pandas: create new columns containing counts of distinct entries


I'm learning pandas and have downloaded a dataset of all Olympic medal results up to 2008. It looks like this:

In[138]: medals.head()
Out[138]: 
      City  Edition     Sport Discipline                      Athlete  NOC  \
9792  Rome     1960  Aquatics     Diving           PHELPS, Brian Eric  GBR   
9793  Rome     1960  Aquatics     Diving        WEBSTER, Robert David  USA   
9794  Rome     1960  Aquatics     Diving         TOBIAN, Gary Milburn  USA   
9795  Rome     1960  Aquatics     Diving               KRUTOVA, Ninel  URS   
9796  Rome     1960  Aquatics     Diving  KRÄMER-ENGEL-GULBIN, Ingrid  EUA   

     Gender         Event Event_gender   Medal  
9792    Men  10m platform            M  Bronze  
9793    Men  10m platform            M    Gold  
9794    Men  10m platform            M  Silver  
9795  Women  10m platform            W  Bronze  
9796  Women  10m platform            W    Gold  
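
What I initially want to do is turn this into a DataFrame with the columns Edition, NOC, Bronze, Silver and Gold, where Bronze, Silver and Gold are the total number of medals of each kind that the NOC won at that edition of the Games.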

So far I have:

"""
Analyze historical Olympic performance
"""

import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
matplotlib.style.use('ggplot')

isocodes = pd.read_csv('countrycodes.csv')
for k in ['official_name_en', 'official_name_fr', 'name',
          'ITU', 'MARC', 'WMO', 'DS', 'Dial', 'FIFA', 
          'FIPS', 'GAUL', 'IOC', 'ISO4217-currency_alphabetic_code',
          'ISO4217-currency_country_name', 'ISO4217-currency_minor_unit',
          'ISO4217-currency_name', 'ISO4217-currency_numeric_code',
          'is_independent', 'Capital', 'TLD', 'Languages',
          'geonameid', 'EDGAR']:
    del isocodes[k]

allmedals = pd.read_excel('medals.xlsx', sheet_name='Medals')
ioccodes = pd.read_excel('medals.xlsx', sheet_name='Codes')
del ioccodes['Country.1']
codes=pd.merge(ioccodes, isocodes, left_on='ISO code', 
               right_on='ISO3166-1-Alpha-2')

# Convert the year of the games to int from str and
# then filter out all records before 1960

allmedals['Edition'] = pd.to_numeric(allmedals['Edition'])
medals = allmedals[(allmedals['Edition'] >= 1960)]

# Filter out any duplicates - i.e. for events like the relay
# where each team member is awarded a medal

medals = medals.drop_duplicates(['City', 'Edition', 'Sport', 
                        'Discipline', 'NOC', 'Gender',
                        'Event', 'Event_gender', 'Medal'])

# Now get the medal counts for each Olympics

grouped = medals.groupby(["Edition", "NOC", "Medal"])["Medal"].\
                        count().reset_index(name="count")
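To make the deduplication and counting steps concrete, here is a minimal, self-contained sketch. The rows are hypothetical (a two-member relay team is invented to show how team medals collapse), and only the columns these steps need are kept. It uses groupby(...).size(), which for a column without missing values gives the same counts as ["Medal"].count().

```python
import pandas as pd

# Hypothetical rows: two USA athletes share a single relay gold
medals = pd.DataFrame({
    "Edition": [1960, 1960, 1960, 1960],
    "NOC": ["USA", "USA", "USA", "GBR"],
    "Event": ["4x100m relay", "4x100m relay", "10m platform", "10m platform"],
    "Athlete": ["Runner A", "Runner B", "Diver C", "Diver D"],
    "Medal": ["Gold", "Gold", "Silver", "Bronze"],
})

# Athlete is left out of the subset, so team members collapse into one medal
deduped = medals.drop_duplicates(["Edition", "NOC", "Event", "Medal"])

# Long format: one row per (Edition, NOC, Medal) with a "count" column
grouped = (deduped.groupby(["Edition", "NOC", "Medal"])
                  .size()
                  .reset_index(name="count"))
print(grouped)
```

The result is in the same long format as the grouped.head() output shown next: one row per medal level rather than one column per level.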
I know this must be a fairly standard pandas operation, and I'm almost there:

In[139]: grouped.head()
Out[139]: 
   Edition  NOC   Medal  count
0     1960  ARG  Bronze      1
1     1960  ARG  Silver      1
2     1960  AUS  Bronze      6
3     1960  AUS    Gold      8
4     1960  AUS  Silver      8
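But I can't work out how to group/aggregate the grouped DataFrame further. I'd be grateful for any hints, and for any other suggestions (for example, whether using del, drop_duplicates() etc. is considered good practice).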
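On the side question about del: the loop works, but a common alternative is to remove all unwanted columns in one call with DataFrame.drop(columns=...), or to select only the columns you need. A minimal sketch on a hypothetical stand-in for countrycodes.csv (the values are made up):

```python
import pandas as pd

# Hypothetical stand-in for countrycodes.csv: one column to keep, two to discard
isocodes = pd.DataFrame({
    "ISO3166-1-Alpha-2": ["GB", "US"],
    "ITU": ["x1", "x2"],
    "MARC": ["y1", "y2"],
})

unwanted = ["ITU", "MARC"]

# Option 1: drop the unwanted columns in one call; returns a new DataFrame
trimmed = isocodes.drop(columns=unwanted)

# Option 2: keep only the wanted columns (shorter when most columns go)
selected = isocodes[["ISO3166-1-Alpha-2"]]

print(trimmed.columns.tolist())
```

When reading the file, pd.read_csv also accepts usecols=[...], so unwanted columns are never loaded. drop_duplicates with an explicit column list, as in the question, is the usual way to collapse repeated rows.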

Unstack the Medal column:

res = grouped.set_index(['Edition', 'NOC', 'Medal']).unstack('Medal', fill_value=0)
res.columns = res.columns.droplevel(0)
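Output (for the grouped.head() quoted in the question):

Medal        Bronze  Gold  Silver
Edition NOC                      
1960    ARG       1     0       1
        AUS       6     8       8

Can you show what the final DataFrame should look like? If you remove .reset_index(name="count") from the line that builds grouped, this can also be done with a short addition.

res has an index made of (Edition, NOC) tuples. How can I modify res so that it has five columns, Edition, NOC, Bronze, Silver and Gold, instead of the tuple index? (Again, I'm sure this must be basic.)

Add a line at the end:

res = res.reset_index()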
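A self-contained run of the unstack answer together with the reset_index() step, using the five rows of grouped.head() from the question. The final rename_axis and column reordering are optional extras added here to match the Edition, NOC, Bronze, Silver, Gold layout the question asks for.

```python
import pandas as pd

# The five rows of grouped.head() from the question
grouped = pd.DataFrame({
    "Edition": [1960, 1960, 1960, 1960, 1960],
    "NOC": ["ARG", "ARG", "AUS", "AUS", "AUS"],
    "Medal": ["Bronze", "Silver", "Bronze", "Gold", "Silver"],
    "count": [1, 1, 6, 8, 8],
})

# Move Medal from the row index into the columns; missing combinations become 0
res = grouped.set_index(["Edition", "NOC", "Medal"]).unstack("Medal", fill_value=0)
# Columns are now pairs like ("count", "Bronze"); drop the outer "count" level
res.columns = res.columns.droplevel(0)
# Turn the (Edition, NOC) index levels back into ordinary columns
res = res.reset_index()
# Optional: clear the leftover "Medal" axis name and put the medals in order
res = res.rename_axis(columns=None)[["Edition", "NOC", "Bronze", "Silver", "Gold"]]
print(res)
```

ARG has no gold in these rows, so fill_value=0 is what turns the missing combination into a 0 instead of NaN.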
Sample df:

import numpy as np
import pandas as pd

ioccodes = ['ABC', 'BCD', 'CDE', 'DEF', 'EFG', 'FGH', 'GHI']
# One randomly chosen NOC per (Edition, Medal) pair
idx = pd.MultiIndex.from_product([np.arange(1960, 2016, 4), ['Gold', 'Silver', 'Bronze']], names=['Edition', 'Medal'])
df = pd.DataFrame({'NOC': np.random.choice(ioccodes, len(idx))}, idx).reset_index()

Solution:

df.groupby(['Edition', 'Medal']).NOC.value_counts() \
    .unstack(1, fill_value=0).reset_index().rename_axis(None, axis=1)
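The comment about dropping .reset_index(name="count") does not spell out the short addition it has in mind. One plausible reading, sketched here under that assumption, is to keep the grouped counts as a Series and unstack it directly. The rows are hypothetical, already-deduplicated medals.

```python
import pandas as pd

# Hypothetical, already-deduplicated medal rows (one row per medal won)
medals = pd.DataFrame({
    "Edition": [1960, 1960, 1960, 1960, 1960],
    "NOC": ["ARG", "ARG", "AUS", "AUS", "AUS"],
    "Medal": ["Bronze", "Silver", "Bronze", "Gold", "Gold"],
})

counts = (
    medals.groupby(["Edition", "NOC", "Medal"])
          .size()                          # Series indexed by (Edition, NOC, Medal)
          .unstack("Medal", fill_value=0)  # Medal values become columns
          .reset_index()                   # Edition and NOC become ordinary columns
          .rename_axis(columns=None)       # drop the leftover "Medal" axis name
)
print(counts)
```

pd.crosstab([medals["Edition"], medals["NOC"]], medals["Medal"]) produces the same counts table in one call, before the reset_index step.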