Pandas: groupby('date_x')['outcome'].mean()


What exactly does this code mean: groupby('date_x')['outcome'].mean()? I could not find it in the sklearn docs.

date_x['Class probability'] = df_train.groupby('date_x')['outcome'].mean()
date_x['Frequency'] = df_train.groupby('date_x')['outcome'].size()
date_x.plot(secondary_y='Frequency', figsize=(22, 10))
Thanks.

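I think it is better to aggregate once, using size for the number of rows in each group and mean for the average of each group, where the groups are formed by the column date_x: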

d = {'mean':'Class probability','size':'Frequency'}
df = df_train.groupby('date_x')['outcome'].agg(['mean','size']).rename(columns=d)

df.plot(secondary_y='Frequency', figsize=(22, 10))
For more information, see the pandas documentation on groupby aggregation.

Sample:

import pandas as pd

# Five rows: three dated 2015-01-01 and two dated 2015-01-02
d = {'date_x': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01',
                               '2015-01-02', '2015-01-02']),
     'outcome': [20, 30, 40, 50, 60]}
df_train = pd.DataFrame(d)
print(df_train)
      date_x  outcome
0 2015-01-01       20 ->1.group
1 2015-01-01       30 ->1.group
2 2015-01-01       40 ->1.group
3 2015-01-02       50 ->2.group
4 2015-01-02       60 ->2.group

d = {'mean':'Class probability','size':'Frequency'}
df = df_train.groupby('date_x')['outcome'].agg(['mean','size']).rename(columns=d)
print(df)
            Class probability  Frequency
date_x                                  
2015-01-01                 30          3
2015-01-02                 55          2
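
As a quick check that the single agg call gives the same table as the two separate calls from the question, here is a minimal sketch on the same sample data. The names grouped, separate and combined are only illustrative, and building the first frame from a dict is an assumption about how the question's date_x frame was created.

```python
import pandas as pd

df_train = pd.DataFrame({
    'date_x': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01',
                              '2015-01-02', '2015-01-02']),
    'outcome': [20, 30, 40, 50, 60],
})

# Select the 'outcome' column within each 'date_x' group
grouped = df_train.groupby('date_x')['outcome']

# Two separate aggregations, as in the question
# (assumption: the question's date_x frame is assembled roughly like this)
separate = pd.DataFrame({
    'Class probability': grouped.mean(),
    'Frequency': grouped.size(),
})

# One combined aggregation, as in the answer
d = {'mean': 'Class probability', 'size': 'Frequency'}
combined = grouped.agg(['mean', 'size']).rename(columns=d)

# Both frames hold the same values, indexed by date
print(separate.equals(combined))
```

Both versions group only once per call, so the choice is mostly about readability: agg keeps the two statistics in one expression.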

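You can find it in the pandas documentation, not in sklearn. The tutorial on grouping may also help.

One last question: why are the class probabilities 30 and 55? Shouldn't they be 40 and 60?

No, because there are two groups. The first three rows share the date 2015-01-01, so their mean is (20+30+40)/3 = 30. The last two rows share the date 2015-01-02, so their mean is (50+60)/2 = 55.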
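
To make the arithmetic above concrete, the following sketch walks over the groups by hand and computes the same mean and size for each date. It is only an illustration of what groupby('date_x')['outcome'] splits the data into, not how pandas computes the result internally. The name manual is hypothetical.

```python
import pandas as pd

df_train = pd.DataFrame({
    'date_x': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01',
                              '2015-01-02', '2015-01-02']),
    'outcome': [20, 30, 40, 50, 60],
})

# Iterating over the grouped column yields one (date, Series) pair per group
manual = {}
for date, outcomes in df_train.groupby('date_x')['outcome']:
    group_size = len(outcomes)                        # what .size() reports
    group_mean = float(outcomes.sum()) / group_size   # what .mean() reports
    manual[date] = (group_mean, group_size)

for date, (group_mean, group_size) in manual.items():
    print(date.date(), group_mean, group_size)
```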