Python Pandas - Filtering a DataFrame using logic

I have a pandas DataFrame like this:

Employee ID     ActionCode     ActionReason      ConcatenatedOutput
1                  TER              DEA                TER_DEA                
1                  RET              ABC                RET_ABC
1                  RET              DEF                RET_DEF
2                  TER              DEA                TER_DEA
2                  ABC              ABC                ABC_ABC
2                  DEF              DEF                DEF_DEF
3                  RET              FGH                RET_FGH
3                  RET              EFG                RET_EFG
4                  PLA              ABC                PLA_ABC
4                  TER              DEA                TER_DEA

I want to filter it using the logic below and turn it into this:

Employee ID          ConcatenatedOutput       Context
1                     RET_ABC                 RET or TER Found
2                     TER_DEA                 RET or TER Found
3                     RET_FGH                 RET or TER Found
4                     PLA_ABC                 RET or TER Not Found
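Logic:

1. If an employee's first record is TER_DEA, we look at that employee's other records. If the employee has a RET record, we pick the first available RET record; otherwise we stick with the TER_DEA record.

2. If an employee's first record is anything other than TER_DEA, we stick with that record.

3. Context is conditional: if the record has RET or TER, we say "RET or TER Found"; otherwise we say "RET or TER Not Found".

Note: the final output has only one record per Employee ID.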

The data is below:

import pandas as pd

employee_id = [1,1,1,2,2,2,3,3,4,4]
action_code = ['TER','RET','RET','TER','ABC','DEF','RET','RET','PLA','TER']
action_reason = ['DEA','ABC','DEF','DEA','ABC','DEF','FGH','EFG','ABC','DEA']
concatenated_output = ['TER_DEA', 'RET_ABC', 'RET_DEF', 'TER_DEA', 'ABC_ABC', 'DEF_DEF', 'RET_FGH', 'RET_EFG', 'PLA_ABC', 'TER_DEA']

df = pd.DataFrame({
    'Employee ID': employee_id,
    'ActionCode': action_code,
    'ActionReason': action_reason,
    'ConcatenatedOutput': concatenated_output,
})

I would suggest working with a boolean in that field (the context column). To get test data, I used the following:

import pandas as pd

employee_id = [1,1,1,2,2,2,3,3,4,4]
action_code = ['TER','RET','RET','TER','ABC','DEF','RET','RET','PLA','TER']
action_reason = ['DEA','ABC','DEF','DEA','ABC','DEF','FGH','EFG','ABC','DEA']
concatenated_output = ['TER_DEA', 'RET_ABC', 'RET_DEF', 'TER_DEA', 'ABC_ABC', 'DEF_DEF', 'RET_FGH', 'RET_EFG', 'PLA_ABC', 'TER_DEA']

df = pd.DataFrame({
    'Employee ID': employee_id,
    'ActionCode': action_code,
    'ActionReason': action_reason,
    'ConcatenatedOutput': concatenated_output,
})

Then you can group by Employee ID and apply a function to each group that carries out the specific logic:

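def myfunc(data):
    # Rows of this employee whose action code is RET
    ret_rows = data.loc[data['ActionCode'] == 'RET']
    if data.iloc[0]['ConcatenatedOutput'] == 'TER_DEA' and len(ret_rows) > 0:
        # First record is TER_DEA and a RET record exists: take the first RET
        located_record = ret_rows.iloc[[0]].copy()
    else:
        # Otherwise keep the employee's first record
        located_record = data.iloc[[0]].copy()
    # Boolean flag: does the selected record's action code contain RET or TER?
    # (.copy() above avoids pandas' SettingWithCopyWarning on this assignment)
    located_record['RET or TER Context'] = located_record['ActionCode'].str.contains('RET|TER')
    return located_record

df.groupby(['Employee ID']).apply(myfunc)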
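
If the exact table from the question is needed (one row per employee, with the Context text instead of a boolean), the same rules can also be written without `apply`. The following is only a sketch under some assumptions: `select_records` is a hypothetical helper name, "first record" is taken to mean the row order of the frame, and the context is judged on the selected record only, which is inferred from employee 4 in the expected output.

```python
import pandas as pd


def select_records(df):
    """Pick one record per employee (hypothetical helper, not from the answer)."""
    # First record of every employee, in the frame's row order.
    first = df.drop_duplicates('Employee ID').set_index('Employee ID')
    # First RET record of every employee that has one.
    first_ret = (
        df[df['ActionCode'] == 'RET']
        .drop_duplicates('Employee ID')
        .set_index('Employee ID')
    )
    # Rule 1: the first record is TER_DEA and a RET record exists.
    use_ret = (first['ConcatenatedOutput'] == 'TER_DEA') & first.index.isin(first_ret.index)
    swap_ids = use_ret[use_ret].index
    # Swap in the first RET record for those employees; keep the rest (rule 2).
    result = pd.concat([first.drop(swap_ids), first_ret.loc[swap_ids]]).sort_index()
    # Rule 3 (assumption): the context is judged on the selected record only.
    found = result['ActionCode'].isin(['RET', 'TER'])
    result['Context'] = found.map({True: 'RET or TER Found', False: 'RET or TER Not Found'})
    return result.reset_index()[['Employee ID', 'ConcatenatedOutput', 'Context']]


df = pd.DataFrame({
    'Employee ID': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'ActionCode': ['TER', 'RET', 'RET', 'TER', 'ABC', 'DEF', 'RET', 'RET', 'PLA', 'TER'],
    'ActionReason': ['DEA', 'ABC', 'DEF', 'DEA', 'ABC', 'DEF', 'FGH', 'EFG', 'ABC', 'DEA'],
    'ConcatenatedOutput': ['TER_DEA', 'RET_ABC', 'RET_DEF', 'TER_DEA', 'ABC_ABC',
                           'DEF_DEF', 'RET_FGH', 'RET_EFG', 'PLA_ABC', 'TER_DEA'],
})

print(select_records(df))
```

The result is sorted by Employee ID. If the context should instead reflect every record of the employee, the `found` line would have to be computed per group rather than on the selected row.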

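Again, the problem here is not the conditional column but eliminating duplicate records based on the logic.

Thanks, but can you help me with the function? I'm not sure how to change the function to look for TER_DEA and then look for the next RET record.

On it, one second.

Checking. Give me 10 minutes.

Thanks for your help :)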