Python grouping based on rolling conditions


I am trying to group a DataFrame based on some conditions.

DataFrame:

Start Date  End Date    value
1971-07-01  1971-07-31  0.0
1971-08-01  1971-08-31  0.25
1971-09-01  1971-09-30  -0.62
1971-10-01  1971-10-31  0.0
1971-11-01  1971-11-30  -0.63
1971-12-01  1971-12-31  -1.0
1972-01-01  1972-01-31  0.0
1972-02-01  1972-02-29  0.0
1972-03-01  1972-03-31  2.0
1972-04-01  1972-04-30  0.0
.
.
1973-07-01  1973-07-31  2.0
1973-08-01  1973-08-31  0.5
1973-09-01  1973-09-30  -2.0
1973-10-01  1973-10-31  0.0
1973-11-01  1973-11-30  0.0
1973-12-01  1973-12-31  0.0
1974-01-01  1974-01-31  0.0
1974-02-01  1974-02-28  0.0
.
.
.
1974-11-01  1974-11-30  0.0
1974-12-01  1974-12-31  -1.25
1975-01-01  1975-01-31  -1.0
1975-02-01  1975-02-28  -1.0
1975-03-01  1975-03-31  -0.5
1975-04-01  1975-04-30  -0.25
1975-05-01  1975-05-31  0.0
1975-06-01  1975-06-30  1.25
1975-07-01  1975-07-31  0.0
1975-08-01  1975-08-31  0.0
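Grouping criteria:

A group always starts with a negative value.

The group continues as long as the values are negative (zeros in between do not end it).

The group ends when we reach either a positive value or three consecutive zeros.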
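The three rules amount to a small state machine over the value column. Below is a minimal pure-Python sketch, assuming the values are available as a plain list; label_groups is a hypothetical helper name, not a pandas function. It returns one label per row: a group number, or None for rows outside any group.

```python
def label_groups(values):
    """Label each value with a group number, or None if it is outside any group.

    Hypothetical helper. Rules: a group starts at a negative value, continues
    through negatives and zeros, and ends before a positive value or at the
    third consecutive zero (which is included in the group).
    """
    labels = []
    group = -1        # id of the most recently opened group
    in_group = False
    zeros = 0         # consecutive zeros seen inside the current group
    for v in values:
        if not in_group and v < 0:
            # a negative value opens a new group
            in_group = True
            group += 1
            zeros = 0
        if not in_group:
            labels.append(None)
            continue
        if v > 0:
            # a positive value closes the group and is not part of it
            in_group = False
            labels.append(None)
            continue
        labels.append(group)
        zeros = zeros + 1 if v == 0 else 0
        if zeros == 3:
            # the third consecutive zero is the last row of the group
            in_group = False
    return labels


# Values of the first ten rows of the DataFrame above
print(label_groups([0.0, 0.25, -0.62, 0.0, -0.63, -1.0, 0.0, 0.0, 2.0, 0.0]))
# → [None, None, 0, 0, 0, 0, 0, 0, None, None]
```

The 2.0 row that closes the group is labelled None, so the group ends on the preceding row.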

Example 1 from the DataFrame above:

1971-09-01  1971-09-30  -0.62
1971-10-01  1971-10-31  0.0
1971-11-01  1971-11-30  -0.63
1971-12-01  1971-12-31  -1.0
1972-01-01  1972-01-31  0.0
1972-02-01  1972-02-29  0.0
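Example 2 (here the group ends because we reach 3 consecutive zeros):

1973-09-01  1973-09-30  -2.0
1973-10-01  1973-10-31  0.0
1973-11-01  1973-11-30  0.0
1973-12-01  1973-12-31  0.0

Example 3 (here the group ends because we reach a positive value):

1974-12-01  1974-12-31  -1.25
1975-01-01  1975-01-31  -1.0
1975-02-01  1975-02-28  -1.0
1975-03-01  1975-03-31  -0.5
1975-04-01  1975-04-30  -0.25
1975-05-01  1975-05-31  0.0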

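I don't have any working code yet, because I'm still figuring out how to express these conditions in a groupby, or in some other efficient way.

I tried a loop, but I'm not getting anywhere:

for i in df.index:
    no = 0
    if df['value'][i] < 0:
        df.loc[i, 'groupno'] = no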
After grouping, I want to get the Start Date of the first row of each group and the End Date of the last row of each group.
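Once every row carries a group label, the first/last lookup is an ordinary groupby aggregation. A sketch, assuming a hypothetical 'group' column has already been computed (NaN for rows outside any group); the rows are example 1 plus one neighbouring row on each side:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Start Date': ['1971-08-01', '1971-09-01', '1971-10-01', '1971-11-01',
                   '1971-12-01', '1972-01-01', '1972-02-01', '1972-03-01'],
    'End Date':   ['1971-08-31', '1971-09-30', '1971-10-31', '1971-11-30',
                   '1971-12-31', '1972-01-31', '1972-02-29', '1972-03-31'],
    'value':      [0.25, -0.62, 0.0, -0.63, -1.0, 0.0, 0.0, 2.0],
    # Hypothetical, precomputed labels; NaN marks rows outside any group
    'group':      [np.nan, 0, 0, 0, 0, 0, 0, np.nan],
})

# groupby drops NaN keys by default, so unlabelled rows are ignored;
# 'first'/'last' pick the first Start Date and last End Date of each group
result = (df.groupby('group')
            .agg({'Start Date': 'first', 'End Date': 'last'})
            .reset_index(drop=True))
print(result)
```

This yields a single row, 1971-09-01 to 1972-02-29, which is the first line of the expected result.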

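Expected result (from the examples):

Start Date   End Date
1971-09-01   1972-02-29
1973-09-01   1973-12-31
1974-12-01   1975-05-31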

Thanks for reading.

I don't think this is the Pythonic way to do it, but it works, and I think it can help you:

import pandas as pd

groups = []
start = ''  # start date of the current group ('' means no group is open)
end = ''    # end date of the previous row
nulls = 0   # count of consecutive zeros
for _, row in df.iterrows():
    # first negative value: start a new group
    if row['value'] < 0 and start == '':
        start = row['Start Date']
        nulls = 0
    # count consecutive zeros
    if row['value'] == 0:
        nulls += 1
    else:
        nulls = 0
    # a positive value or 3 consecutive zeros ends the group (if one is open)
    if (row['value'] > 0 or nulls == 3) and start != '':
        # on the 3rd zero the group ends at this row, not at the previous one
        if nulls == 3:
            end = row['End Date']
        groups.append((start, end))
        start = ''
        nulls = 0
    # 3 zeros outside any group: just reset
    if nulls == 3:
        start = ''
        nulls = 0
    # remember this row's end date for the next iteration
    end = row['End Date']
result = pd.DataFrame(groups, columns=['Start Date', 'End Date'])
print(result)
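Since the question asks for something more efficient than a row-by-row loop, here is a vectorised sketch of the same rules that uses shift, cumsum and a grouped cummax instead of iterrows. It is a swapped-in technique, not the method of the answer above; find_groups is a hypothetical name, and the sample contains only the visible rows of the question (the elided "." rows are left out). One deliberate difference: a group still open at the last row is kept, whereas the loop above never appends it.

```python
import pandas as pd


def find_groups(df):
    """Vectorised sketch of the grouping rules (hypothetical helper)."""
    s = df['value']
    is_zero = s.eq(0)
    # Position of each row inside its run of consecutive zeros (0 for non-zero rows)
    zero_run = (~is_zero).cumsum()
    zero_pos = is_zero.astype(int).groupby(zero_run).cumsum()
    # A new block starts right after a positive value or after the 3rd (or later) zero
    breaks = s.shift(1, fill_value=0).gt(0) | zero_pos.shift(1, fill_value=0).ge(3)
    block = breaks.cumsum()
    # Inside a block, the group starts at the first negative value
    started = s.lt(0).astype(int).groupby(block).cummax().astype(bool)
    # Positive values only ever close a block, so they are never part of a group
    in_group = started & s.le(0)
    return (df[in_group]
            .groupby(block[in_group])
            .agg({'Start Date': 'first', 'End Date': 'last'})
            .reset_index(drop=True))


rows = [
    ('1971-07-01', '1971-07-31', 0.0), ('1971-08-01', '1971-08-31', 0.25),
    ('1971-09-01', '1971-09-30', -0.62), ('1971-10-01', '1971-10-31', 0.0),
    ('1971-11-01', '1971-11-30', -0.63), ('1971-12-01', '1971-12-31', -1.0),
    ('1972-01-01', '1972-01-31', 0.0), ('1972-02-01', '1972-02-29', 0.0),
    ('1972-03-01', '1972-03-31', 2.0), ('1972-04-01', '1972-04-30', 0.0),
    ('1973-07-01', '1973-07-31', 2.0), ('1973-08-01', '1973-08-31', 0.5),
    ('1973-09-01', '1973-09-30', -2.0), ('1973-10-01', '1973-10-31', 0.0),
    ('1973-11-01', '1973-11-30', 0.0), ('1973-12-01', '1973-12-31', 0.0),
    ('1974-01-01', '1974-01-31', 0.0), ('1974-02-01', '1974-02-28', 0.0),
    ('1974-11-01', '1974-11-30', 0.0), ('1974-12-01', '1974-12-31', -1.25),
    ('1975-01-01', '1975-01-31', -1.0), ('1975-02-01', '1975-02-28', -1.0),
    ('1975-03-01', '1975-03-31', -0.5), ('1975-04-01', '1975-04-30', -0.25),
    ('1975-05-01', '1975-05-31', 0.0), ('1975-06-01', '1975-06-30', 1.25),
    ('1975-07-01', '1975-07-31', 0.0), ('1975-08-01', '1975-08-31', 0.0),
]
df = pd.DataFrame(rows, columns=['Start Date', 'End Date', 'value'])
print(find_groups(df))
```

On these rows it finds the same three groups as the loop: 1971-09-01 to 1972-02-29, 1973-09-01 to 1973-12-31 and 1974-12-01 to 1975-05-31.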

Thanks for your answer.
Output of the answer's code on the question's data:

   Start Date    End Date
0  1971-09-01  1972-02-29
1  1973-09-01  1973-12-31
2  1974-12-01  1975-05-31