Python: sum the values of a pandas column and append or merge the total into the DataFrame?
I have this function:
def source_revenue(self):
    items = self.data.items()
    df = pandas.DataFrame(
        {'SOURCE OF BUSINESS': [i[0] for i in items], 'INCOME': [i[1] for i in items]})
    pivoting = pd.pivot_table(df, index=['SOURCE OF BUSINESS'], values=['INCOME'])
    suming = pivoting.sum(index=(0), columns=(1))
This function produces the following result:
INCOME 216424.9
dtype: float64
Without the sum, it returns the full DataFrame, like this:
INCOME
SOURCE OF BUSINESS
BYD - Other 500.0
BYD - Retail 1584.0
BYD - Transport 42498.0
BYD Beverage - A La Carte 39401.5
BYD Food - A La Carte 瓦厂食品-零点 68365.0
BYD Food - Catering Banquet 53796.0
BYD Rooms 瓦厂房间 5148.0
GS - Retail 386.0
GS Food - A La Carte 48.0
Orchard Retail 130.0
SCH - Food - A La Carte 96.0
SCH - Retail 375.4
SCH - Transport 888.0
SCH Beverage - A La Carte 119.0
Spa 3052.0
XLM Beverage - A La Carte 38.0
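The reason I am doing this is that I want the total of all the returned rows: sum them and append that total to the DataFrame.
Initially I tried margins=True (which, from what I read here, sums the values and appends the total to the DataFrame), but it did not give me the total row.
So what I would like to know is whether there is a way to return the DataFrame but also sum the values and append the total to the end of the DataFrame, the way margins=True is supposed to.

I think you can use groupby with sum, because groupby is faster here.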
You can use pivot_table, but the default aggfunc is np.mean, which is easy to forget. Your call is equivalent to:
pivoting = pd.pivot_table(df,
                          index=['SOURCE OF BUSINESS'],
                          values=['INCOME'],
                          aggfunc=np.mean)
I think you need aggfunc=np.sum:
print(df)
     A    B      C  D
0  zoo  one  small  1
1  zoo  one  large  2
2  zoo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7
print(pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum))
A
bar    22
foo     6
zoo     5
Name: D, dtype: int64
df1 = df.groupby('A')['D'].sum()
print(df1)
A
bar    22
foo     6
zoo     5
Name: D, dtype: int64
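Since the question asks specifically about margins=True, here is a minimal sketch of how it behaves once the aggregation is switched to a sum. The sample rows are made up for illustration (they are not the asker's data), and margins_name is used only to relabel the total row, which pandas calls "All" by default.

```python
import pandas as pd

# Hypothetical sample data that mirrors the layout used in the question.
df = pd.DataFrame({
    "SOURCE OF BUSINESS": ["BYD - Other", "BYD - Retail", "Spa", "Spa"],
    "INCOME": [500.0, 1584.0, 3000.0, 52.0],
})

# aggfunc="sum" replaces the default mean. margins=True appends a grand-total
# row, and margins_name sets the label of that row.
pivoting = pd.pivot_table(
    df,
    index=["SOURCE OF BUSINESS"],
    values=["INCOME"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(pivoting)
```

With the default aggfunc, the margin row would hold a mean rather than a sum, which may be why margins=True did not look like a total in the original attempt.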
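If you need to add a Total to the result, use loc together with sum (for example, df.loc['Total', 'INCOME'] = df['INCOME'].sum()).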
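For the Series returned by groupby, the same idea can be sketched as follows. The frame reuses the A and D columns from the example above; the variable name totals is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["zoo", "zoo", "zoo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

totals = df.groupby("A")["D"].sum()

# Assigning to a label that does not exist yet enlarges the Series by one entry.
# The right-hand side is evaluated first, so the sum covers only the groups.
totals.loc["Total"] = totals.sum()
print(totals)
```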
Timings:
In [111]: %timeit df.groupby('A')['D'].sum()
1000 loops, best of 3: 581 µs per loop
In [112]: %timeit pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum)
100 loops, best of 3: 2.28 ms per loop
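Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The helper names and the repetition count below are arbitrary choices, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    "A": ["zoo", "zoo", "zoo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})


def with_groupby():
    return df.groupby("A")["D"].sum()


def with_pivot_table():
    # Recent pandas versions return a DataFrame here, so select column D.
    return pd.pivot_table(df, values="D", index=["A"], aggfunc="sum")["D"]


# Total wall-clock seconds for 200 calls of each variant.
groupby_seconds = timeit.timeit(with_groupby, number=200)
pivot_seconds = timeit.timeit(with_pivot_table, number=200)
print(f"groupby:     {groupby_seconds:.4f} s")
print(f"pivot_table: {pivot_seconds:.4f} s")
```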
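Add a Total row to your df like this:
df.loc[len(df)] = …
This adds a row to the end of the DataFrame, provided it has the default integer index. (The answer originally used df.ix, which was removed in pandas 1.0; df.loc does the same job here.) The values you assign must match the number of columns. Also, I would not recommend adding the total to your data, because it would invalidate any later analysis. It is probably better to create a new Series and concat it in only when you need it for display.
df.loc[len(df)] = ['Total', df.INCOME.sum()]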
>>> df
SOURCE OF BUSINESS INCOME
0 BYD - Other 500
1 BYD - Retail 1584
2 BYD - Transport 42498
3 BYD Beverage - A La Carte 39401.5
4 BYD Food - A La Carte _______ 68365
5 BYD Food - Catering Banquet 53796
6 BYD Rooms ____ 5148
7 GS - Retail 386
8 GS Food - A La Carte 48
9 Orchard Retail 130
10 SCH - Food - A La Carte 96
11 SCH - Retail 375.4
12 SCH - Transport 888
13 SCH Beverage - A La Carte 119
14 Spa 3052
15 XLM Beverage - A La Carte 38
16 Total 216425
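The advice above about keeping the total out of the data can be sketched with pd.concat. This version builds a one-row DataFrame rather than a Series so the column names line up; the sample rows and the names total_row and display_df are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "SOURCE OF BUSINESS": ["BYD - Other", "BYD - Retail", "Spa"],
    "INCOME": [500.0, 1584.0, 3052.0],
})

# One-row frame holding the total, built separately so df itself is unchanged.
total_row = pd.DataFrame(
    {"SOURCE OF BUSINESS": ["Total"], "INCOME": [df["INCOME"].sum()]}
)

# ignore_index=True renumbers the rows of the combined frame from 0.
display_df = pd.concat([df, total_row], ignore_index=True)
print(display_df)
```

Further calculations can keep using df, while display_df is used only for output.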
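Thanks for the thorough answer and the performance tests. I get NameError: name 'np' is not defined when trying to use np.sum ... a missing import, perhaps?
OK, I had to import numpy, and the actual attribute is numpy.sum.
If you use import numpy as np, you can use np.sum.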
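To illustrate the point made in these comments: np is just the conventional alias created by the import statement, so np.sum and numpy.sum are the same function. The income values below are made up.

```python
import numpy
import numpy as np  # the alias that makes the name np available
import pandas as pd

incomes = pd.Series([500.0, 1584.0, 3052.0])

# Both names refer to the same function object.
same_function = np.sum is numpy.sum

total = np.sum(incomes)
print(same_function, total)
```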
Adding a Total row with loc and sum:
print(df)
INCOME
SOURCE OF BUSINESS
BYD - Other 500.0
BYD - Retail 1584.0
BYD - Transport 42498.0
BYD Beverage - A La Carte 39401.5
BYD Food - A La Carte 68365.0
BYD Food - Catering Banquet 53796.0
BYD Rooms 5148.0
GS - Retail 386.0
GS Food - A La Carte 48.0
Orchard Retail 130.0
SCH - Food - A La Carte 96.0
SCH - Retail 375.4
SCH - Transport 888.0
SCH Beverage - A La Carte 119.0
Spa 3052.0
XLM Beverage - A La Carte 38.0
df.loc['Total', 'INCOME'] = df['INCOME'].sum()
print(df)
INCOME
SOURCE OF BUSINESS
BYD - Other 500.0
BYD - Retail 1584.0
BYD - Transport 42498.0
BYD Beverage - A La Carte 39401.5
BYD Food - A La Carte 68365.0
BYD Food - Catering Banquet 53796.0
BYD Rooms 5148.0
GS - Retail 386.0
GS Food - A La Carte 48.0
Orchard Retail 130.0
SCH - Food - A La Carte 96.0
SCH - Retail 375.4
SCH - Transport 888.0
SCH Beverage - A La Carte 119.0
Spa 3052.0
XLM Beverage - A La Carte 38.0
Total 216424.9
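Putting the pieces together, the function from the question could be reworked roughly as below. This is only a sketch: the name source_revenue_with_total and the plain dict argument are assumptions standing in for the asker's method and self.data.

```python
import pandas as pd


def source_revenue_with_total(data):
    """Hypothetical helper: data maps a source of business to its income."""
    df = pd.DataFrame(
        {"SOURCE OF BUSINESS": list(data.keys()), "INCOME": list(data.values())}
    )
    # The double brackets keep the result a DataFrame with one INCOME column.
    summary = df.groupby("SOURCE OF BUSINESS")[["INCOME"]].sum()
    # Append the grand total as an extra row labelled "Total".
    summary.loc["Total", "INCOME"] = summary["INCOME"].sum()
    return summary


result = source_revenue_with_total(
    {"Spa": 3052.0, "Orchard Retail": 130.0, "GS - Retail": 386.0}
)
print(result)
```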