Python: create a new dataframe with multiple variable conditions and extract the reason for failure


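I have the code below.

I would like to create an output like the one shown below, taking the following limits into account:

A > 5, B > 4, C > 3

If the criteria are not met, I would like to read the row below in the dataframe, store the data, and create a new column called "Fail Reason" that lists which of A, B or C failed.

I would then like the script to also report the corresponding "X", "Y" and "Z" values for the dataframe row that passed.

After that, the script should group by "Group" and show the maximum "Hs" for each group.

I'm really struggling to do this with multiple variables in my dataframe... any help would be greatly appreciated.

Desired output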

   Group   Hs Fail Reason    X    Y     Z
0      1  1.0      [A, B]  0.9  1.9  0.54
1      2  0.5   [A, B, C]  0.8  2.7  0.43
Main code (my current attempt)

import pandas as pd

data = [[1,0.5,8,8,8,0.85,1.64,0.5],
        [1,1,8,8,8,0.9,1.9,0.54],
        [1,1.5,0,0,10,1.1,2.0,0.74],
        [2,0.5,6,5,4,0.8,2.7,0.43],
        [2,1,1,1,1,0.9,2.9,0.45],
        [2,1.5,1,2,1,1.1,3.1,0.47]]

columns = ['Group', 'Hs', 'A', 'B', 'C', 'X', 'Y', 'Z']

df = pd.DataFrame(data=data, columns=columns)

Limit_A = 5
Limit_B = 4
Limit_C = 3

# Opens an empty dataframe for appending
df_new = pd.DataFrame(columns=['Group', 'Hs'])

groups = df['Group'].unique()

# for g in groups
for g in groups:
    # Create new temp dataframe
    df_1 = df[df['Group'] == g]
    # Pass condition: every column must exceed its limit. Produces a boolean Series.
    pass_criteria = (df_1['A'] > Limit_A) & (df_1['B'] > Limit_B) & (df_1['C'] > Limit_C)

    # PASSES DATAFRAME. Locates rows where pass_criteria is SATISFIED and creates another temp dataframe.
    df_passes = df_1.loc[pass_criteria]

    # Find the max value in the dataframe e.g. the greatest operational wave height
    max_num = df_passes['Hs'].max()

    # Does the opposite of mask_1
    fail_criteria = (df_1['A'] < Limit_A) & (df_1['B'] < Limit_B) &(df_1['C'] < Limit_C)

    # FAILED DATAFRAME. Locates rows where fail_criteria is SATISFIED and creates another temp dataframe.
    df_fails = df_1.loc[fail_criteria]

    # Uses the FAIL dataframe and turns the value_vars columns into rows of the melted dataframe
    melted = pd.melt(df_fails, value_vars=['A', 'B', 'C'])

    # Pulls out the reason for fails, i.e. when the condition of the df_fail is not met. Set creates a list of unique values.
    fails = list(set(melted[melted['value'] > Limit_A]['variable']))

    # Input columns of desired outputs.
    df_e = pd.DataFrame(columns=['Group', 'Hs', 'Fail Reason'])

    # Inputs the lists as defined above.
    df_e.loc[0] = [g, max_num, fails]

    # Appends to the dataframe in a loop
    df_new = df_new.append(df_e)

print(df_new)
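
A side note for anyone running the attempt above on a recent pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the accumulation step at the end of the loop will raise an AttributeError there. One possible workaround, sketched here with placeholder values copied from the desired output rather than computed, is to collect plain records in a list and build the dataframe once after the loop:

```python
import pandas as pd

# Placeholder per-group results (taken from the desired output, not computed here);
# in the real loop these would be g, max_num and fails.
per_group = [(1, 1.0, ["A", "B"]), (2, 0.5, ["A", "B", "C"])]

rows = []
for g, max_num, fails in per_group:
    # Collect one record per group instead of calling DataFrame.append
    rows.append({"Group": g, "Hs": max_num, "Fail Reason": fails})

# Build the result once from the collected records
df_new = pd.DataFrame(rows, columns=["Group", "Hs", "Fail Reason"])
print(df_new)
```

This only changes how the rows are accumulated; it does not address the pass/fail logic itself.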

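IIUC, first compare columns A, B and C against your limits, then use agg to collect the names of the failing columns, and finally map the result back onto the dataframe: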

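# True where a value exceeds its limit (A > 5, B > 4, C > 3)
res = df[["A", "B", "C"]] > [5, 4, 3]

# For each failing row, list the columns that failed, then keep the first failing row per group
s = (pd.concat([df, (~res[~res.all(axis=1)]).agg(lambda x: res.columns[x].tolist(),
                                                   axis=1).rename("Fail reason")], axis=1)
       .dropna().drop_duplicates("Group").set_index("Group")["Fail reason"])

# Attach the reason to every row, keep only passing rows, and take the row with the largest Hs per group
print(df.assign(failed_reason=df["Group"].map(s))
        .loc[res.all(axis=1)].sort_values(["Group", "Hs"])
        .drop_duplicates("Group", keep="last"))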

   Group   Hs  A  B  C    X    Y     Z failed_reason
1      1  1.0  8  8  8  0.9  1.9  0.54        [A, B]
3      2  0.5  6  5  4  0.8  2.7  0.43     [A, B, C]
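
If the chained version is hard to follow, the same idea can be written step by step. The following is only a sketch under two assumptions: that the "Fail Reason" for a group should come from the first failing row of that group in the original row order, and that the output should contain only the columns listed in the desired output. The names limits, ok, passed, reasons, first_fail, best and out are illustrative and do not appear in the question.

```python
import pandas as pd

data = [[1, 0.5, 8, 8, 8, 0.85, 1.64, 0.5],
        [1, 1, 8, 8, 8, 0.9, 1.9, 0.54],
        [1, 1.5, 0, 0, 10, 1.1, 2.0, 0.74],
        [2, 0.5, 6, 5, 4, 0.8, 2.7, 0.43],
        [2, 1, 1, 1, 1, 0.9, 2.9, 0.45],
        [2, 1.5, 1, 2, 1, 1.1, 3.1, 0.47]]
df = pd.DataFrame(data, columns=["Group", "Hs", "A", "B", "C", "X", "Y", "Z"])

limits = {"A": 5, "B": 4, "C": 3}  # limits stated in the question

# True where a value exceeds its limit; the Series aligns on column names
ok = df[list(limits)].gt(pd.Series(limits), axis="columns")
passed = ok.all(axis=1)

# Per row, the names of the columns that did not exceed their limit
reasons = pd.Series(
    [[col for col, good in zip(ok.columns, row) if not good] for row in ok.to_numpy()],
    index=df.index,
)

# Assumption: take the reason from the first failing row of each group
fails = df.loc[~passed, ["Group"]].assign(**{"Fail Reason": reasons[~passed]})
first_fail = fails.drop_duplicates("Group").set_index("Group")["Fail Reason"]

# Passing row with the largest Hs in each group
best = (df[passed].sort_values("Hs")
          .drop_duplicates("Group", keep="last")
          .sort_values("Group"))

out = best.assign(**{"Fail Reason": best["Group"].map(first_fail)})
out = out[["Group", "Hs", "Fail Reason", "X", "Y", "Z"]].reset_index(drop=True)
print(out)
```

On the sample data this yields the two rows from the desired output. A group with no failing row would get NaN in "Fail Reason", and a group with no passing row would be absent from the result, so either case may need extra handling.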