How to apply a function to a date-indexed DataFrame
I'm running into a lot of problems working with a DataFrame that has a date index:
from pandas import DataFrame, date_range
# Create a dataframe with dates as your index
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
idx = date_range('1/1/2012', periods=10, freq='MS')
df = DataFrame(data, index=idx, columns=['Revenue'])
df['State'] = ['NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'GA', 'GA', 'FL', 'FL']
In [6]: df
Out[6]:
            Revenue State
2012-01-01        1    NY
2012-02-01        2    NY
2012-03-01        3    NY
2012-04-01        4    NY
2012-05-01        5    FL
2012-06-01        6    FL
2012-07-01        7    GA
2012-08-01        8    GA
2012-09-01        9    FL
2012-10-01       10    FL
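I'm trying to add a column called 'Mean' that contains the group means:
I tried this, but it doesn't work:
What I want to get is:
How can I get this DataFrame?

You were almost there! First create the groupby object: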
means = df.groupby('State').mean()
In [5]: means
Out[5]:
       Revenue
State
FL         7.5
GA         7.5
NY         2.5
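Then apply it to the State value of each row in the DataFrame:
# .ix was removed in pandas 1.0; .loc performs the same label lookup
df['mean'] = df['State'].apply(lambda x: means.loc[x, 'Revenue'])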
In [7]: df
Out[7]:
            Revenue State  mean
2012-01-01        1    NY   2.5
2012-02-01        2    NY   2.5
2012-03-01        3    NY   2.5
2012-04-01        4    NY   2.5
2012-05-01        5    FL   7.5
2012-06-01        6    FL   7.5
2012-07-01        7    GA   7.5
2012-08-01        8    GA   7.5
2012-09-01        9    FL   7.5
2012-10-01       10    FL   7.5
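The apply/lambda step above looks up each row's state one call at a time. A minimal sketch of two vectorized alternatives, assuming a reasonably recent pandas: Series.map with the per-state mean Series as the lookup table, and GroupBy.transform('mean'), which broadcasts each group's mean back onto that group's rows. Neither is the answer's original method; they are offered as equivalents that should produce the same column.

```python
from pandas import DataFrame, date_range

# Same example frame as in the question
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
idx = date_range('1/1/2012', periods=10, freq='MS')
df = DataFrame(data, index=idx, columns=['Revenue'])
df['State'] = ['NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'GA', 'GA', 'FL', 'FL']

# Per-state mean as a Series indexed by State label
means = df.groupby('State')['Revenue'].mean()

# Option 1: map each row's State label to the matching entry of `means`
df['mean'] = df['State'].map(means)

# Option 2: transform returns a Series aligned to df's date index,
# holding each row's group mean
df['Mean'] = df.groupby('State')['Revenue'].transform('mean')

print(df)
```

transform avoids the intermediate means table entirely, which is handy when the only goal is the extra column.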
Using join or merge also works:
In [68]: revs = df.groupby('State').Revenue.mean()
In [69]: revs.name = 'Mean Revenue'
In [70]: df.join(revs, on='State')
Out[70]:
            Revenue State  Mean Revenue
2012-01-01        1    NY           2.5
2012-02-01        2    NY           2.5
2012-03-01        3    NY           2.5
2012-04-01        4    NY           2.5
2012-05-01        5    FL           7.5
2012-06-01        6    FL           7.5
2012-07-01        7    GA           7.5
2012-08-01        8    GA           7.5
2012-09-01        9    FL           7.5
2012-10-01       10    FL           7.5
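Only the join variant is shown above. A minimal sketch of the merge variant, assuming the same frame: the per-state mean Series is turned into a one-column DataFrame (indexed by State), and df's State column is matched against that index with left_on='State' and right_index=True. how='left' is a choice made here so every row of df is kept in its original order.

```python
from pandas import DataFrame, date_range

# Same example frame as in the question
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
idx = date_range('1/1/2012', periods=10, freq='MS')
df = DataFrame(data, index=idx, columns=['Revenue'])
df['State'] = ['NY', 'NY', 'NY', 'NY', 'FL', 'FL', 'GA', 'GA', 'FL', 'FL']

# Per-state mean, renamed so the merged column gets a readable name
revs = df.groupby('State')['Revenue'].mean().rename('Mean Revenue')

# Match df's State column against the State index of the means frame;
# how='left' keeps all rows of df in their original order
out = df.merge(revs.to_frame(), left_on='State', right_index=True, how='left')
print(out)
```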