Python if/then aggregation

python pandas numpy

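I have been searching but have not found an answer yet. Hopefully someone can help this Python newbie with my problem.

I am trying to figure out how to write an if/then statement in Python and perform an aggregation based on it. My end goal: if the date is 1/7/2017, use the value in the 'fake' column. For every other date, average the two columns.

Here is what I have so far: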

import pandas as pd
import numpy as np
import datetime

np.random.seed(42)
dte=pd.date_range(start=datetime.date(2017,1,1), end= datetime.date(2017,1,15))
fake=np.random.randint(15,100, size=15)
fake2=np.random.randint(300,1000,size=15)

so_df=pd.DataFrame({'date':dte,
             'fake':fake,
             'fake2':fake2})

so_df['avg']= so_df[['fake','fake2']].mean(axis=1)
so_df.head()

Let's use np.where:

so_df['avg'] = np.where(so_df['date'] == pd.to_datetime('2017-01-07'), 
                        so_df['fake'], so_df[['fake',
                        'fake2']].mean(1))

Output:

         date  fake  fake2    avg
0  2017-01-01    66    685  375.5
1  2017-01-02    29    491  260.0
2  2017-01-03    86    576  331.0
3  2017-01-04    75    460  267.5
4  2017-01-05    35    759  397.0
5  2017-01-06    97    613  355.0
6  2017-01-07    89    321   89.0
7  2017-01-08    89    552  320.5
8  2017-01-09    38    860  449.0
9  2017-01-10    17    774  395.5
10 2017-01-11    36    358  197.0
11 2017-01-12    67    810  438.5
12 2017-01-13    16    981  498.5
13 2017-01-14    44    775  409.5
14 2017-01-15    52    999  525.5

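In pandas, one way is to use np.where, which takes three arguments: the condition, the value to use where the condition is true (the "if"), and the value to use everywhere else (the "else").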

so_df['avg']= np.where(so_df['date'] == '2017-01-07',so_df['fake'],so_df[['fake','fake2']].mean(axis=1))

         date  fake  fake2    avg
0  2017-01-01    66    685  375.5
1  2017-01-02    29    491  260.0
2  2017-01-03    86    576  331.0
3  2017-01-04    75    460  267.5
4  2017-01-05    35    759  397.0
5  2017-01-06    97    613  355.0
6  2017-01-07    89    321   89.0
7  2017-01-08    89    552  320.5
8  2017-01-09    38    860  449.0
9  2017-01-10    17    774  395.5
10 2017-01-11    36    358  197.0
11 2017-01-12    67    810  438.5
12 2017-01-13    16    981  498.5
13 2017-01-14    44    775  409.5
14 2017-01-15    52    999  525.5
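To make the three arguments concrete, here is a minimal sketch on a tiny hand-made frame. The frame `demo`, its values, and the name `is_special` are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; the numbers are chosen so the result is easy to check by hand.
demo = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-06", "2017-01-07", "2017-01-08"]),
    "fake": [10, 20, 30],
    "fake2": [100, 200, 300],
})

# Argument 1, the condition: a boolean Series that is True only on the special date.
is_special = demo["date"] == pd.Timestamp("2017-01-07")

# Argument 2 is used where the condition is True, argument 3 everywhere else.
demo["avg"] = np.where(
    is_special,
    demo["fake"],
    demo[["fake", "fake2"]].mean(axis=1),
)

print(demo["avg"].tolist())  # [55.0, 20.0, 165.0]
```

Only the middle row keeps its 'fake' value; the other two rows get the row-wise mean of both columns.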

Assuming you have already calculated the avg column:

so_df['fake'].where(so_df['date']=='20170107', so_df['avg'])
Out: 
0     375.5
1     260.0
2     331.0
3     267.5
4     397.0
5     355.0
6      89.0
7     320.5
8     449.0
9     395.5
10    197.0
11    438.5
12    498.5
13    409.5
14    525.5
Name: fake, dtype: float64

If not, you can replace the column reference with the same calculation:

so_df['fake'].where(so_df['date']=='20170107', so_df[['fake','fake2']].mean(axis=1))

To check multiple dates, you need to use the element-wise version of the or operator (the pipe, |). Otherwise an error is raised.

so_df['fake'].where((so_df['date']=='20170107') | (so_df['date']=='20170109'), so_df['avg'])
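As a small sketch of why the pipe is needed: Python's `or` asks each Series for a single truth value, which pandas refuses to give, while `|` combines the two boolean Series row by row. The names `dates`, `a`, `b`, `or_error`, and `combined` are made up for this example.

```python
import pandas as pd

# Hypothetical three-day Series, just enough to show both behaviours.
dates = pd.Series(pd.to_datetime(["2017-01-07", "2017-01-08", "2017-01-09"]))

a = dates == pd.Timestamp("2017-01-07")
b = dates == pd.Timestamp("2017-01-09")

# `a or b` needs bool(a), and pandas raises ValueError for a multi-element Series.
try:
    a or b
    or_error = None
except ValueError as exc:
    or_error = exc

# `|` is overloaded to work element by element.
combined = a | b

print(type(or_error).__name__)  # ValueError
print(combined.tolist())        # [True, False, True]
```

When the comparisons are written inline, each one needs its own parentheses, because `|` binds more tightly than `==` in Python.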

The above checks two dates. With three or more, you may want to use isin with a list:

so_df['fake'].where(so_df['date'].isin(['20170107', '20170109', '20170112']), so_df['avg'])
Out[42]: 
0     375.5
1     260.0
2     331.0
3     267.5
4     397.0
5     355.0
6      89.0
7     320.5
8      38.0
9     395.5
10    197.0
11     67.0
12    498.5
13    409.5
14    525.5
Name: fake, dtype: float64
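A variation you might consider, sketched here on made-up data: convert the date strings with `pd.to_datetime` before passing them to `isin`, so the match does not rely on pandas interpreting strings against a datetime column (that implicit behaviour may differ between pandas versions). The names `demo`, `special_dates`, `row_mean`, and `result` are illustrative only.

```python
import pandas as pd

# Hypothetical four-row frame covering 2017-01-06 through 2017-01-09.
demo = pd.DataFrame({
    "date": pd.date_range("2017-01-06", periods=4),
    "fake": [10, 20, 30, 40],
    "fake2": [100, 200, 300, 400],
})

# Explicit datetime values to match against the datetime column.
special_dates = pd.to_datetime(["2017-01-07", "2017-01-09"])

row_mean = demo[["fake", "fake2"]].mean(axis=1)

# Keep 'fake' on the special dates, fall back to the row mean elsewhere.
result = demo["fake"].where(demo["date"].isin(special_dates), row_mean)

print(result.tolist())  # [55.0, 20.0, 165.0, 40.0]
```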

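Thanks, these are all very helpful. If I want to do more than one date, such as 1/7, 1/9, and 1/11, can I simply write it as

so_df['fake'].where((so_df['date']=='20170107') or (so_df['date']=='20170105') or (so_df['date']=='20170111'), so_df[['fake','fake2']].mean(axis=1))

@P.Cummings Unfortunately, you can't use or with pandas data structures. You need to use the element-wise overloaded version of bitwise or (|). I have added a few examples to the post.

Thanks. That was very helpful!

We can also use the following approach:
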
In [141]: so_df['avg'] = so_df['fake'] \
     ...:                   .where(so_df['date'].isin(['2017-01-07','2017-01-09'])) \
     ...:                   .fillna(so_df[['fake','fake2']].mean(1))
     ...:

In [142]: so_df
Out[142]:
         date  fake  fake2    avg
0  2017-01-01    66    685  375.5
1  2017-01-02    29    491  260.0
2  2017-01-03    86    576  331.0
3  2017-01-04    75    460  267.5
4  2017-01-05    35    759  397.0
5  2017-01-06    97    613  355.0
6  2017-01-07    89    321   89.0
7  2017-01-08    89    552  320.5
8  2017-01-09    38    860   38.0
9  2017-01-10    17    774  395.5
10 2017-01-11    36    358  197.0
11 2017-01-12    67    810  438.5
12 2017-01-13    16    981  498.5
13 2017-01-14    44    775  409.5
14 2017-01-15    52    999  525.5
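As a rough check that the where/fillna chain and np.where give the same numbers, here is a sketch on a small made-up frame. The names `demo`, `mask`, `row_mean`, `via_fillna`, and `via_np_where` are assumptions for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame covering 2017-01-06 through 2017-01-09.
demo = pd.DataFrame({
    "date": pd.date_range("2017-01-06", periods=4),
    "fake": [10, 20, 30, 40],
    "fake2": [100, 200, 300, 400],
})

mask = demo["date"].isin(pd.to_datetime(["2017-01-07", "2017-01-09"]))
row_mean = demo[["fake", "fake2"]].mean(axis=1)

# Approach 1: where() leaves NaN on non-matching rows, fillna() then fills them with the mean.
via_fillna = demo["fake"].where(mask).fillna(row_mean)

# Approach 2: np.where picks between the two candidates in a single step.
via_np_where = np.where(mask, demo["fake"], row_mean)

print(via_fillna.tolist())    # [55.0, 20.0, 165.0, 40.0]
print(via_np_where.tolist())  # [55.0, 20.0, 165.0, 40.0]
```

One difference to keep in mind: if 'fake' itself contained a missing value on a selected date, the fillna step would fill that row too, whereas np.where would leave the NaN in place.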