Python: sum the values of a column in pandas and append or merge the total into the DataFrame?

I have this function:

def source_revenue(self):
    items = self.data.items()
    df = pandas.DataFrame(
        {'SOURCE OF BUSINESS': [i[0] for i in items], 'INCOME': [i[1] for i in items]})
    pivoting = pd.pivot_table(df, index=['SOURCE OF BUSINESS'], values=['INCOME'])
    suming = pivoting.sum(index=(0), columns=(1))
This function produces the following result:

INCOME    216424.9
dtype: float64
Without the sum, it returns the full DataFrame, as shown below:

                               INCOME
SOURCE OF BUSINESS                    
BYD - Other                      500.0
BYD - Retail                    1584.0
BYD - Transport                42498.0
BYD Beverage - A La Carte      39401.5
BYD Food - A La Carte 瓦厂食品-零点  68365.0
BYD Food - Catering Banquet    53796.0
BYD Rooms 瓦厂房间                  5148.0
GS - Retail                      386.0
GS Food - A La Carte              48.0
Orchard Retail                   130.0
SCH - Food - A La Carte           96.0
SCH - Retail                     375.4
SCH - Transport                  888.0
SCH Beverage - A La Carte        119.0
Spa                             3052.0
XLM Beverage - A La Carte         38.0
The reason I am doing this is that I want the total of all returned rows: sum the values and append that total to the DataFrame.

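Initially I tried margins=True (which, as I read here, is supposed to sum the values and append the total to the DataFrame), but that is not what I got.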


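So what I would like to know is whether there is a way to return the DataFrame but also sum the values and append the total to the end of the DataFrame, just as margins=True is supposed to do.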

I think you can use groupby with sum instead, because groupby is faster here.

You can use pivot_table, but its default aggfunc is np.mean, which is easy to forget:

pivoting = pd.pivot_table(df, 
                          index=['SOURCE OF BUSINESS'], 
                          values=['INCOME'], 
                          aggfunc=np.mean)
I think you need aggfunc=np.sum:

print(df)
     A    B      C  D
0  zoo  one  small  1
1  zoo  one  large  2
2  zoo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7

print(pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum))
A
bar    22
foo     6
zoo     5
Name: D, dtype: int64

df1 = df.groupby('A')['D'].sum()
print(df1)
A
bar    22
foo     6
zoo     5
Name: D, dtype: int64
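For reference, pivot_table can also produce the total row itself once aggfunc is a sum. This is only a minimal sketch on columns A and D of the sample data above, assuming a reasonably recent pandas. The label 'Total' passed to margins_name is my own choice (the default label is 'All'), and the string 'sum' is used in place of np.sum so that no numpy import is needed:

```python
import pandas as pd

# Columns A and D of the sample data shown above.
df = pd.DataFrame({
    'A': ['zoo', 'zoo', 'zoo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
    'D': [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# margins=True appends a grand-total row computed with the same aggfunc.
# margins_name only renames that row; 'Total' is an illustrative label.
with_total = pd.pivot_table(df, values='D', index=['A'], aggfunc='sum',
                            margins=True, margins_name='Total')
print(with_total)
```

With the default aggfunc (the mean) the appended row would be an overall mean rather than a total, which may be why margins=True alone did not look right.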
If you need to add Total to the Series, use loc and sum:
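A minimal sketch of adding a Total entry to the grouped Series with loc and sum, again on columns A and D of the sample data. The variable name totals is hypothetical, and this is one possible way to do it rather than the only one:

```python
import pandas as pd

# Columns A and D of the sample data shown above.
df = pd.DataFrame({
    'A': ['zoo', 'zoo', 'zoo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
    'D': [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

totals = df.groupby('A')['D'].sum()

# The right-hand side is evaluated first, so the sum does not include itself.
# Assigning to a label that does not exist yet enlarges the Series.
totals.loc['Total'] = totals.sum()
print(totals)
```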

Timings

In [111]: %timeit df.groupby('A')['D'].sum()
1000 loops, best of 3: 581 µs per loop

In [112]: %timeit pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum)
100 loops, best of 3: 2.28 ms per loop
To add Total to your df, use loc and sum: df.loc['Total', 'INCOME'] = df['INCOME'].sum()

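df.ix[len(df)] = … adds a row to the end of the DataFrame (.ix was removed in pandas 1.0; on a default integer index, df.loc[len(df)] = … does the same). The values you assign must match the number of columns. Also, I would not recommend adding this row to your data, because it makes any subsequent analysis invalid. It is probably better to create a new Series and concat it only when you need to display the result.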

# .loc replaces .ix, which no longer exists in current pandas
df.loc[len(df)] = ['Total', df.INCOME.sum()]

>>> df
                 SOURCE OF BUSINESS   INCOME
0                       BYD - Other      500
1                      BYD - Retail     1584
2                   BYD - Transport    42498
3         BYD Beverage - A La Carte  39401.5
4   BYD Food - A La Carte _______      68365
5       BYD Food - Catering Banquet    53796
6                    BYD Rooms ____     5148
7                       GS - Retail      386
8              GS Food - A La Carte       48
9                    Orchard Retail      130
10          SCH - Food - A La Carte       96
11                     SCH - Retail    375.4
12                  SCH - Transport      888
13        SCH Beverage - A La Carte      119
14                              Spa     3052
15        XLM Beverage - A La Carte       38
16                            Total   216425
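The alternative mentioned above, keeping the total out of the data and concatenating it only for display, could be sketched as follows. The three rows are a subset of the question's data chosen for illustration, the names total_row and display_df are hypothetical, and a one-row DataFrame is used here instead of a Series so that the column labels line up:

```python
import pandas as pd

# A small subset of the question's data, for illustration only.
df = pd.DataFrame({
    'SOURCE OF BUSINESS': ['BYD - Other', 'BYD - Retail', 'Spa'],
    'INCOME': [500.0, 1584.0, 3052.0],
})

# Build the total as a separate one-row frame so df itself stays untouched.
total_row = pd.DataFrame({'SOURCE OF BUSINESS': ['Total'],
                          'INCOME': [df['INCOME'].sum()]})

# Concatenate only at display time; df can still be analysed safely.
display_df = pd.concat([df, total_row], ignore_index=True)
print(display_df)
```

Because df never contains the Total row, later calls such as df['INCOME'].mean() are not skewed by it.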

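Thanks for the thorough answer and the performance test. I get NameError: name 'np' is not defined when trying to use np.sum ... perhaps a missing import?

OK, I had to import numpy, and the actual attribute is numpy.sum.

If you use import numpy as np, you can use np.sum.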
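To illustrate the import point from these comments, here is a small sketch. The four-row df is made up for the example, and the string form 'sum' is shown as an option that avoids the numpy import; depending on the pandas version, passing np.sum may emit a FutureWarning:

```python
import numpy as np  # without this line, np.sum raises NameError
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({'A': ['zoo', 'foo', 'foo', 'bar'], 'D': [1, 3, 3, 4]})

# Passing the numpy function requires the import above.
by_function = pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum)

# Passing the aggregation by name needs no numpy import at all.
by_name = pd.pivot_table(df, values='D', index=['A'], aggfunc='sum')

print(by_function)
print(by_name)
```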
With loc, the Total row is added to the pivoted df like this:

print(df)
                              INCOME
SOURCE OF BUSINESS                  
BYD - Other                    500.0
BYD - Retail                  1584.0
BYD - Transport              42498.0
BYD Beverage - A La Carte    39401.5
BYD Food - A La Carte        68365.0
BYD Food - Catering Banquet  53796.0
BYD Rooms                     5148.0
GS - Retail                    386.0
GS Food - A La Carte            48.0
Orchard Retail                 130.0
SCH - Food - A La Carte         96.0
SCH - Retail                   375.4
SCH - Transport                888.0
SCH Beverage - A La Carte      119.0
Spa                           3052.0
XLM Beverage - A La Carte       38.0
df.loc['Total', 'INCOME'] = df['INCOME'].sum()
print(df)
                               INCOME
SOURCE OF BUSINESS                   
BYD - Other                     500.0
BYD - Retail                   1584.0
BYD - Transport               42498.0
BYD Beverage - A La Carte     39401.5
BYD Food - A La Carte         68365.0
BYD Food - Catering Banquet   53796.0
BYD Rooms                      5148.0
GS - Retail                     386.0
GS Food - A La Carte             48.0
Orchard Retail                  130.0
SCH - Food - A La Carte          96.0
SCH - Retail                    375.4
SCH - Transport                 888.0
SCH Beverage - A La Carte       119.0
Spa                            3052.0
XLM Beverage - A La Carte        38.0
Total                        216424.9