Pandas, Python: how to get the maximum across columns while also satisfying a second condition


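I am trying to select the maximum value from a set of columns, subject to a second condition. The maximum here is taken over each column's pct_change relative to the previous row. The second condition concerns each column's value as a percentage of its row total.

Essentially, I want the row-wise maximum across columns, but only among the columns that satisfy the second condition. I created an example with the code below.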

import pandas as pd
import numpy as np

# Creating series to initialize df
series_1_units = pd.Series(np.array([1,20,25,1,9]))
series_2_units = pd.Series(np.array([1,1,30,25,1]))
series_3_units = pd.Series(np.array([1,1,1,25,30]))

df = pd.DataFrame({'Type1':series_1_units, 'Type2':series_2_units, 'Type3':series_3_units})
# Calculate the % contribution of each type to total units summed across row
df_contrib_to_total = df.div(df.sum(axis=1), axis=0)*100.0

# Calculate % difference to previous row
df_pct_diff = df.pct_change()

# Join the different df to compare
df_all_cols = df.join(df_pct_diff, rsuffix='_Pct_Change')
df_all_cols = df_all_cols.join(df_contrib_to_total,rsuffix='_Contrib')

# A final requirement is setting a threshold that decides whether a given column is to be included or excluded 
# This is based on number of units relative to total for each row
# If value below threshold then do not include in max calculation for each week
contribution_threshold = 25.0
contribution_mask = df_contrib_to_total >= contribution_threshold
df_all_cols = df_all_cols.join(contribution_mask, rsuffix='_Contrib_Mask')

# Get the column with the highest Pct_change for each row - get the actual pct_change value as well as the column name responsible for it
df_all_cols['Highest_Pct_Diff'] = df_all_cols.iloc[:,3:6].max(axis=1)
df_all_cols['Type_With_Highest_Pct_Diff'] = df_all_cols.iloc[:,3:6].idxmax(axis=1)

# The df above has an incorrect result in the row corresponding to index = 4
# The highest pct_diff column has a False for its contribution mask
# Desired result is as below:

# The highest pct_change for any column that has a True in contrib mask is Type3
df_all_cols_desired_result = df_all_cols.copy(deep=True)
df_all_cols_desired_result.iloc[4,12] = 0.2
df_all_cols_desired_result.iloc[4,13] = 'Type3_Pct_Change'

How can I apply multiple conditions to achieve the above?

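If the maximum may only be taken from certain entries, first filter the data on the second condition, then apply the max function to the filtered DataFrame: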

df_contrib_to_total = df.div(df.sum(axis=1), axis=0)*100.0

contribution_threshold = 25.0
contribution_mask = df_contrib_to_total >= contribution_threshold

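# Compute the change on the full data first, then blank out cells that fail the mask,
# so the result does not depend on how pct_change() treats NaN in a given pandas version
df_pct_diff = df.pct_change().where(contribution_mask)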
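This leaves NaN wherever the mask excludes a value, so those cells are ignored when the maximum is computed (max and idxmax skip NaN by default). With df_all_cols rebuilt from this df_pct_diff, the results are: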

>>> df_pct_diff
   Type1      Type2  Type3
0    NaN        NaN    NaN
1  19.00        NaN    NaN
2   0.25  29.000000    NaN
3    NaN  -0.166667   24.0
4    NaN        NaN    0.2
>>> df_all_cols.iloc[:, 3:6].max(axis=1)
0     NaN
1    19.0
2    29.0
3    24.0
4     0.2
dtype: float64
>>> df_all_cols.iloc[:, 3:6].idxmax(axis=1)
0                 NaN
1    Type1_Pct_Change
2    Type2_Pct_Change
3    Type3_Pct_Change
4    Type3_Pct_Change
dtype: object
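The steps above can be put together in one self-contained sketch. It assumes the question's sample data and 25% threshold; the names `masked_pct`, `has_candidate` and `result` are illustrative and do not appear in the original post. The mask is applied after `pct_change()`, so each change is still measured against the actual previous row, and `idxmax` is only called on rows that have at least one non-NaN candidate, since recent pandas versions deprecate calling it on all-NaN rows.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Type1": [1, 20, 25, 1, 9],
    "Type2": [1, 1, 30, 25, 1],
    "Type3": [1, 1, 1, 25, 30],
})

# Second condition: a column's share of its row total must reach the threshold.
contribution_threshold = 25.0
contrib = df.div(df.sum(axis=1), axis=0) * 100.0
contribution_mask = contrib >= contribution_threshold

# Change relative to the previous row, computed on the unmasked data,
# then set to NaN wherever the mask is False.
masked_pct = df.pct_change().where(contribution_mask).add_suffix("_Pct_Change")

# max() skips NaN by default, so excluded cells cannot win.
result = pd.DataFrame({"Highest_Pct_Diff": masked_pct.max(axis=1)})

# Only look up the winning column for rows with at least one candidate;
# the remaining rows (here, row 0) are left as NaN by index alignment.
has_candidate = masked_pct.notna().any(axis=1)
result["Type_With_Highest_Pct_Diff"] = masked_pct[has_candidate].idxmax(axis=1)

print(result)
```

For the row at index 4, this picks Type3_Pct_Change with a change of about 0.2, matching the desired result stated in the question.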

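So, select on the second condition first, then compute the maximum on the resulting view of the DataFrame.

Great, that worked! I knew I was overthinking this and that the solution was probably much simpler.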