Python: grouping based on rolling conditions
I am trying to group a dataframe based on some conditions.
Dataframe:
Start Date End Date value
1971-07-01 1971-07-31 0.0
1971-08-01 1971-08-31 0.25
1971-09-01 1971-09-30 -0.62
1971-10-01 1971-10-31 0.0
1971-11-01 1971-11-30 -0.63
1971-12-01 1971-12-31 -1.0
1972-01-01 1972-01-31 0.0
1972-02-01 1972-02-29 0.0
1972-03-01 1972-03-31 2.0
1972-04-01 1972-04-30 0.0
.
.
1973-07-01 1973-07-31 2.0
1973-08-01 1973-08-31 0.5
1973-09-01 1973-09-30 -2.0
1973-10-01 1973-10-31 0.0
1973-11-01 1973-11-30 0.0
1973-12-01 1973-12-31 0.0
1974-01-01 1974-01-31 0.0
1974-02-01 1974-02-28 0.0
.
.
.
1974-11-01 1974-11-30 0.0
1974-12-01 1974-12-31 -1.25
1975-01-01 1975-01-31 -1.0
1975-02-01 1975-02-28 -1.0
1975-03-01 1975-03-31 -0.5
1975-04-01 1975-04-30 -0.25
1975-05-01 1975-05-31 0.0
1975-06-01 1975-06-30 1.25
1975-07-01 1975-07-31 0.0
1975-08-01 1975-08-31 0.0
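Grouping criteria:
A group must always start with a negative value.
The group continues as long as we keep seeing negative values; one or two zeros in a row do not end it (see Example 1).
The group ends when we reach a positive value or three consecutive zeros.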
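One way to read the criteria above is as a small state machine that walks over the values in order. The sketch below is only an illustration under some assumptions: the function name `label_groups` is hypothetical, a positive value is treated as lying outside the group it closes, and the third consecutive zero is treated as the last row of the group (which is how the examples in this question read).

```python
def label_groups(values):
    """Return one label per value: a group number (0, 1, ...) or None."""
    labels = []
    current = None   # number of the open group, None when no group is open
    next_group = 0   # number to give to the next group that opens
    zeros = 0        # consecutive zeros seen inside the open group

    for v in values:
        if current is None:
            if v < 0:
                # a group opens on a negative value
                current = next_group
                next_group += 1
                zeros = 0
                labels.append(current)
            else:
                labels.append(None)
        elif v > 0:
            # a positive value closes the group and is not part of it
            current = None
            labels.append(None)
        else:
            # negative values and zeros stay in the group
            zeros = zeros + 1 if v == 0 else 0
            labels.append(current)
            if zeros == 3:
                # the third consecutive zero is the last row of the group
                current = None
    return labels


# the first ten values of the dataframe in the question
sample = [0.0, 0.25, -0.62, 0.0, -0.63, -1.0, 0.0, 0.0, 2.0, 0.0]
print(label_groups(sample))
# [None, None, 0, 0, 0, 0, 0, 0, None, None]
```

Note that in this sketch a group that is still open when the data ends keeps its label; whether such a trailing group should count is not stated in the question.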
Example 1 from the dataframe above:
1971-09-01 1971-09-30 -0.62
1971-10-01 1971-10-31 0.0
1971-11-01 1971-11-30 -0.63
1971-12-01 1971-12-31 -1.0
1972-01-01 1972-01-31 0.0
1972-02-01 1972-02-29 0.0
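Example 2 (in this case we reach three consecutive zeros):
1973-09-01 1973-09-30 -2.0
1973-10-01 1973-10-31 0.0
1973-11-01 1973-11-30 0.0
1973-12-01 1973-12-31 0.0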
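Example 3 (in this case we reach a positive value):
1974-12-01 1974-12-31 -1.25
1975-01-01 1975-01-31 -1.0
1975-02-01 1975-02-28 -1.0
1975-03-01 1975-03-31 -0.5
1975-04-01 1975-04-30 -0.25
1975-05-01 1975-05-31 0.0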
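I don't have any working code yet, because I am still trying to figure out how to put the conditions into a groupby or some other efficient method.
I tried looping, but I am getting nowhere:
for i in df.index:
    no = 0
    if df['Value'][i] < 0:
        df['groupno'] = no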
After grouping, I want to get the Start Date of the first row of each group and the End Date of the last row of each group.
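Once every row carries a group label, taking the first Start Date and the last End Date per group is a plain pandas aggregation. The snippet below is a minimal sketch, assuming a label column named `groupno` (the name used in the attempt above) that has already been filled in somehow; here the labels are written by hand for the rows of Examples 2 and 3.

```python
import pandas as pd

# rows of Example 2 (group 0) and Example 3 (group 1); groupno is hand-written
df = pd.DataFrame({
    "Start Date": ["1973-09-01", "1973-10-01", "1973-11-01", "1973-12-01",
                   "1974-12-01", "1975-01-01", "1975-02-01", "1975-03-01",
                   "1975-04-01", "1975-05-01"],
    "End Date": ["1973-09-30", "1973-10-31", "1973-11-30", "1973-12-31",
                 "1974-12-31", "1975-01-31", "1975-02-28", "1975-03-31",
                 "1975-04-30", "1975-05-31"],
    "groupno": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# first Start Date and last End Date within each group, in row order
summary = df.groupby("groupno").agg({"Start Date": "first", "End Date": "last"})
print(summary)
```

This relies on the rows already being in chronological order, since `first` and `last` follow row order rather than comparing dates.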
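Expected result (from the examples):
Start Date End Date
1971-09-01 1972-02-29
1973-09-01 1973-12-31
1974-12-01 1975-05-31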
Thanks for reading.
I don't think this is a Pythonic approach, but it works, and I think it can help you:
import pandas as pd

# df is the dataframe from the question, with columns 'Start Date', 'End Date' and 'value'
groups = []
start = None   # start date of the open group (None when no group is open)
end = None     # end date of the previous row
zeros = 0      # count of consecutive zeros

for _, row in df.iterrows():
    # the first negative value opens a group
    if row['value'] < 0 and start is None:
        start = row['Start Date']
        zeros = 0
    # count consecutive zeros; any non-zero value resets the count
    if row['value'] == 0:
        zeros += 1
    else:
        zeros = 0
    # a positive value or a third consecutive zero closes the open group
    if (row['value'] > 0 or zeros == 3) and start is not None:
        # on the third zero the group ends at this row, not at the previous one
        if zeros == 3:
            end = row['End Date']
        groups.append((start, end))
        start = None
        zeros = 0
    # three zeros outside of any group: just reset the counter
    if zeros == 3:
        zeros = 0
    # remember this row's end date for the next iteration
    end = row['End Date']

# note: a group that is still open when the data ends is not appended
result = pd.DataFrame(groups, columns=['Start Date', 'End Date'])
print(result)
Thank you for your answer.
Output:
   Start Date    End Date
0  1971-09-01  1972-02-29
1  1973-09-01  1973-12-31
2  1974-12-01  1975-05-31