Python: What's the best way to apply/loop in this situation?
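I'm transforming some applicant transaction data, and I need to create a new flag column (labeled "DESIRED FLAG" in my example). However, I can't work out the right loop/apply approach, because the logic below can have many variations.

In a perfect world, a consecutive applicant process history would look like one of the following, with every "Event Status" set to "Completed":
- On-Site Interview Kick Off --> Schedule Interviews --> Decision; or
- Phone Interview Kick Off --> Schedule Interviews --> Decision

Of course, an applicant can go through many phone interviews and on-site interviews during the application process.

As the example below shows, a "Schedule Interviews" step is sometimes canceled. In those cases I need to remove that step and the subsequent steps tied to it. These include "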
import pandas as pd
data = {'Employee ID': ["100","100", "100", "100","100","100","100","100","100","100","200", "200", "200","200","200","200","200","300","300", "300", "300","300","300","300"],
'Completed On Date': ["2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01","2016-01-01","2017-01-01","2018-01-01","2010-01-01","2011-06-05","2012-07-01","2012-08-15","2013-01-01","2014-01-01","2015-01-01","2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01"],
'Event': ["Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","Job Apply","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision"],
'Event Status': ["Completed","Completed","CANCELED","Completed","Completed","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Manually Skipped","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Completed","Completed","Completed","Completed"],
'DESIRED FLAG': ["Keep","Keep","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Keep","Keep"]}
df = pd.DataFrame(data, columns=['Employee ID','Completed On Date','Event','Event Status','DESIRED FLAG'])
df = df.sort_values(by=(['Employee ID','Completed On Date']))
df
I think the code below solves your problem.
import pandas as pd
data = {'Employee ID': ["100","100", "100", "100","100","100","100","100","100","100","200", "200", "200","200","200","200","200","300","300", "300", "300","300","300","300"],
'Completed On Date': ["2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01","2016-01-01","2017-01-01","2018-01-01","2010-01-01","2011-06-05","2012-07-01","2012-08-15","2013-01-01","2014-01-01","2015-01-01","2009-01-01","2010-01-01","2011-06-05","2012-07-01","2013-01-01","2014-01-01","2015-01-01"],
'Event': ["Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision","Decision","Phone Interview Kick Off","Schedule Interviews","Decision","Job Apply","Phone Interview Kick Off","Schedule Interviews","Decision","On-Site Interview Kick Off","Schedule Interviews","Decision"],
'Event Status': ["Completed","Completed","CANCELED","Completed","Completed","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Manually Skipped","Completed","Completed","Completed","Completed","Completed","Completed","CANCELED","Completed","Completed","Completed","Completed"],
'DESIRED FLAG': ["Keep","Keep","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Remove","Keep","Keep","Keep","Keep","Remove","Remove","Remove","Keep","Keep"]}
df = pd.DataFrame(data, columns=['Employee ID','Completed On Date','Event','Event Status','DESIRED FLAG'])
df = df.sort_values(by=(['Employee ID','Completed On Date']))
index_list_delete = []
start_deleting = False
for i in range(len(df)):
    if not start_deleting:
        # whenever I see a "CANCELED", I know some following rows need to be deleted
        if df.iloc[i]['Event Status'] == 'CANCELED':
            index_list_delete.append(i)
            start_deleting = True
    else:
        # whenever I see a "Schedule Interviews", I need to stop deleting;
        # otherwise keep track of the rows that need to be deleted
        if df.iloc[i]['Event'] == 'Schedule Interviews':
            start_deleting = False
        else:
            index_list_delete.append(i)
# deleting rows (positions collected above are mapped back to index labels)
df = df.drop(df.index[index_list_delete])
# resetting index
df = df.reset_index(drop=True)
You will get the following result:
    Employee ID  Completed On Date  Event                       Event Status  DESIRED FLAG
 0  100          2009-01-01         Decision                    Completed     Keep
 1  100          2010-01-01         On-Site Interview Kick Off  Completed     Keep
 2  100          2014-01-01         Schedule Interviews         Completed     Keep
 3  100          2015-01-01         Decision                    Completed     Keep
 4  100          2016-01-01         Phone Interview Kick Off    Completed     Keep
 5  100          2017-01-01         Schedule Interviews         Completed     Keep
 6  100          2018-01-01         Decision                    Completed     Keep
 7  200          2010-01-01         On-Site Interview Kick Off  Completed     Keep
 8  200          2014-01-01         Schedule Interviews         Completed     Keep
 9  200          2015-01-01         Decision                    Completed     Keep
10  300          2009-01-01         Job Apply                   Completed     Keep
11  300          2010-01-01         Phone Interview Kick Off    Completed     Keep
12  300          2014-01-01         Schedule Interviews         Completed     Keep
13  300          2015-01-01         Decision                    Completed     Keep
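Since the question asks for a flag column rather than for dropped rows, here is a minimal sketch of the same state machine writing "Keep"/"Remove" labels instead of collecting rows to delete. The helper name `flag_rows` and the `FLAG` column are my own, not from the post, and the sample is just the Employee ID 200 rows from the question, so treat it as an illustration and not a tested solution for the real data.

```python
import pandas as pd

def flag_rows(events, statuses):
    """Label each row 'Keep' or 'Remove' (hypothetical helper, same rules as the loop above)."""
    flags = []
    removing = False
    for event, status in zip(events, statuses):
        if not removing:
            # A CANCELED row starts a run of rows to remove.
            if status == "CANCELED":
                removing = True
                flags.append("Remove")
            else:
                flags.append("Keep")
        elif event == "Schedule Interviews":
            # The next "Schedule Interviews" ends the run and is kept.
            removing = False
            flags.append("Keep")
        else:
            flags.append("Remove")
    return flags

# Employee ID 200 rows copied from the question's sample data.
df = pd.DataFrame({
    "Employee ID": ["200"] * 7,
    "Event": ["On-Site Interview Kick Off", "Schedule Interviews", "Decision",
              "Decision", "Phone Interview Kick Off", "Schedule Interviews",
              "Decision"],
    "Event Status": ["Completed", "CANCELED", "Manually Skipped", "Completed",
                     "Completed", "Completed", "Completed"],
    "DESIRED FLAG": ["Keep", "Remove", "Remove", "Remove", "Remove", "Keep", "Keep"],
})

df["FLAG"] = flag_rows(df["Event"], df["Event Status"])
print(df[["Event", "Event Status", "DESIRED FLAG", "FLAG"]])
```

Keeping the rows and comparing `FLAG` against `DESIRED FLAG` makes it easier to spot where the rules disagree with the expected output before anything is actually dropped.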
It would be very helpful if you could post what the desired output looks like.

See the "DESIRED FLAG" column. That is what the output should look like. Thanks!

Got it. Seeing it as a DataFrame would have helped, but maybe that's just me.

Np. I never figured out how to output a DF in this forum! :O

I did some additional testing with real data, and this logic isn't restricted to the Employee ID. It should only apply your solution within each respective Employee ID set. Below is an inelegant partial solution. In a later step I still have to filter out the cases where the last step is the interview-scheduling step: if (df.iloc[i]['Event Status'] == 'CANCELED') and (df.iloc[i]['Employee ID'] == df.iloc[i+1]['Employee ID']):
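One way to keep the logic inside each Employee ID, sketched here under my own assumptions: run the state machine once per group so a cancellation at the end of one applicant's history cannot spill into the next applicant's rows. The helper `flag_group`, the `FLAG` column, and the two-applicant sample are hypothetical. Unlike the answer's loop, this variant also treats a canceled "Schedule Interviews" inside a removal run as still canceled, which is my reading of the rules and not something the post confirms. It also avoids the `df.iloc[i+1]` comparison, which can index past the final row when the last row is canceled.

```python
import pandas as pd

def flag_group(group):
    """Return 'Keep'/'Remove' labels for one applicant (hypothetical helper)."""
    flags = []
    removing = False  # state starts fresh for every Employee ID
    for event, status in zip(group["Event"], group["Event Status"]):
        if status == "CANCELED":
            removing = True
        elif removing and event == "Schedule Interviews":
            removing = False
        flags.append("Remove" if removing else "Keep")
    return flags

# Hypothetical sample: applicant "A" ends on a canceled step, "B" is clean.
df = pd.DataFrame({
    "Employee ID": ["A", "A", "A", "B", "B", "B", "B"],
    "Completed On Date": ["2010-01-01", "2011-01-01", "2012-01-01",
                          "2010-01-01", "2011-01-01", "2012-01-01", "2013-01-01"],
    "Event": ["On-Site Interview Kick Off", "Schedule Interviews", "Decision",
              "Job Apply", "Phone Interview Kick Off", "Schedule Interviews",
              "Decision"],
    "Event Status": ["Completed", "CANCELED", "Completed",
                     "Completed", "Completed", "Completed", "Completed"],
})
df = df.sort_values(["Employee ID", "Completed On Date"]).reset_index(drop=True)

df["FLAG"] = "Keep"
for _, group in df.groupby("Employee ID", sort=False):
    # Assign by index label so each group's flags land on its own rows.
    df.loc[group.index, "FLAG"] = flag_group(group)

print(df[["Employee ID", "Event", "Event Status", "FLAG"]])
```

With a single loop over the whole frame, applicant "B"'s first two rows would be removed because the run started by "A" is still open; resetting the state per group keeps them.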