How to retrieve the start and end dates of specific periods in a pandas DataFrame?

python pandas

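A fully reproducible dataframe and my previous attempt are included further down. A similar question has already been asked and answered.

I have a dataframe with timestamps in the column 'dates' and values in the column 'a', which are integers in the range [-10, 10]. In the column named 'above' I have flagged the periods where a > 0. Now I would like to retrieve the start and end dates of all these periods. I have managed to do this with an extremely cumbersome and probably fragile approach: a for loop over the 'pch' column that detects period changes. To make indexing rows inside the loop easier, I shifted that column into a new column named 'per'. Running the complete code further down gives me the desired output, which is: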

[[Timestamp('2020-01-06 00:00:00'), Timestamp('2020-01-10 00:00:00')],
 [Timestamp('2020-01-14 00:00:00'), Timestamp('2020-01-15 00:00:00')]]

But as you will see, the procedure is far from elegant. So it would be fantastic if any of you pandas pros had other suggestions on how to do this.
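The question does not show how the helper columns were built, so the following is only a sketch under an assumption: that 'above' is the a > 0 flag, 'pch' is its row-to-row difference, and 'per' is 'pch' shifted up by one row. With the question's values for 'a', this reproduces the three columns in the sample dataframe.

```python
import pandas as pd

# Same dates and 'a' values as the question's sample dataframe.
df = pd.DataFrame({
    "dates": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06",
        "2020-01-07", "2020-01-08", "2020-01-09", "2020-01-10",
        "2020-01-13", "2020-01-14", "2020-01-15",
    ]),
    "a": [0, -7, -9, 1, 1, 2, 1, 1, -2, 8, 10],
})

# Assumed construction of the helper columns (not shown in the question).
# 1 where a > 0, else 0.
df["above"] = (df["a"] > 0).astype(int)
# +1 on the first row of a positive run, -1 on the first row after it.
df["pch"] = df["above"].diff().fillna(0)
# The same flags moved up by one row.
df["per"] = df["pch"].shift(-1).fillna(0)

print(df)
```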

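You can create consecutive groups by taking the cumulative sum of the inverted mask, then filter to the rows where a > 0 and aggregate only the first and last date of each group: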

m = df['a'].gt(0)

df1 = df[m].groupby((~m).cumsum())['dates'].agg(['first','last'])
print (df1)
       first       last
a                      
3 2020-01-06 2020-01-10
4 2020-01-14 2020-01-15
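To see why the inverted mask works as a grouping key, it may help to look at the intermediate values. The sketch below uses only the 'a' values from the question; the names group_id, steps and run_ids are illustrative and not part of the answer.

```python
import pandas as pd

a = pd.Series([0, -7, -9, 1, 1, 2, 1, 1, -2, 8, 10], name="a")

m = a.gt(0)                # True inside a run of positive values
group_id = (~m).cumsum()   # grows by 1 on every non-positive row,
                           # so it stays constant inside a positive run

# Side-by-side view of the value, the mask and the group id.
steps = pd.DataFrame({"a": a, "m": m, "group_id": group_id})
print(steps)

# Keeping only the rows where m is True leaves one distinct id per run.
run_ids = group_id[m].unique().tolist()
print(run_ids)
```

The ids that survive the filter are the same labels that appear as the index of df1 in the output above.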

Complete code from the question:

import pandas as pd
from pandas import Timestamp


df = pd.DataFrame({'dates': {0: Timestamp('2020-01-01 00:00:00'),
          1: Timestamp('2020-01-02 00:00:00'),
          2: Timestamp('2020-01-03 00:00:00'),
          3: Timestamp('2020-01-06 00:00:00'),
          4: Timestamp('2020-01-07 00:00:00'),
          5: Timestamp('2020-01-08 00:00:00'),
          6: Timestamp('2020-01-09 00:00:00'),
          7: Timestamp('2020-01-10 00:00:00'),
          8: Timestamp('2020-01-13 00:00:00'),
          9: Timestamp('2020-01-14 00:00:00'),
          10: Timestamp('2020-01-15 00:00:00')},
         'a': {0: 0, 1: -7, 2: -9, 3: 1, 4: 1, 5: 2, 6: 1, 7: 1, 8: -2, 9: 8, 10: 10},
         'above': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 1},
         'pch': {0: 0.0,
          1: 0.0,
          2: 0.0,
          3: 1.0,
          4: 0.0,
          5: 0.0,
          6: 0.0,
          7: 0.0,
          8: -1.0,
          9: 1.0,
          10: 0.0},
         'per': {0: 0.0,
          1: 0.0,
          2: 1.0,
          3: 0.0,
          4: 0.0,
          5: 0.0,
          6: 0.0,
          7: -1.0,
          8: 1.0,
          9: 0.0,
          10: 0.0}})

# extract period starts and ends

# containers
p_s = []
p_e = []

# a period starts on the row after
# df['per'] == 1 and ends on the row
# where df['per'] == -1
for i, p in enumerate(df['a'][1:], 1):
    #print(df['a'].iat[i-1])
    if df['per'].iat[i-1]==1:
        #print(df['dates'].iat[i])
        p_s.append(df['dates'].iat[i])
    if df['per'].iat[i]==-1:
        p_e.append(df['dates'].iat[i])

# every period should have a beginning and an end.
# so if there are more starts than ends, the last
# date available is appended to p_e
if len(p_e) < len(p_s):
    p_e.append(df['dates'].iat[-1])

# transform a list of starts and a list of ends
# into a list of [start, end] pairs
p_corrected = []
for i, p in enumerate(p_s):
    #print(p_s[i])
    new_elem = [p_s[i], p_e[i]]
    p_corrected.append(new_elem)

print(p_corrected)

To get the result from df1 as a nested list of [start, end] pairs:

L = df1.apply(list, axis=1).tolist()
print (L)
[[Timestamp('2020-01-06 00:00:00'), Timestamp('2020-01-10 00:00:00')], 
 [Timestamp('2020-01-14 00:00:00'), Timestamp('2020-01-15 00:00:00')]]
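If this is needed in more than one place, the answer's approach could be wrapped in a small helper. The following is a sketch only: the function name positive_periods and its parameters are hypothetical, and it builds the list with zip instead of apply, which is a substitution on my part rather than what the answer does.

```python
import pandas as pd


def positive_periods(frame, value_col="a", date_col="dates"):
    """Return [start, end] pairs for each run of rows where value_col > 0."""
    mask = frame[value_col].gt(0)
    spans = (
        frame[mask]
        .groupby((~mask).cumsum())[date_col]  # one group per positive run
        .agg(["first", "last"])
    )
    # Iterating datetime columns yields pandas Timestamp objects.
    return [[start, end] for start, end in zip(spans["first"], spans["last"])]


# Same dates and 'a' values as the question's sample dataframe.
df = pd.DataFrame({
    "dates": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06",
        "2020-01-07", "2020-01-08", "2020-01-09", "2020-01-10",
        "2020-01-13", "2020-01-14", "2020-01-15",
    ]),
    "a": [0, -7, -9, 1, 1, 2, 1, 1, -2, 8, 10],
})

periods = positive_periods(df)
print(periods)
```

Like the answer, this assumes the rows are already sorted by date, since 'first' and 'last' follow row order rather than the date values themselves.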