Python: filter a DataFrame by the time difference between rows


I'm trying to build a Series that lets me filter a DataFrame.

Take a look at this example:

from pandas import DataFrame, Series

if __name__ == '__main__':
    groups = ['G1'] * 4 + ['G2'] * 4 + ['G3'] * 4
    d1 = ['2019-04-15', '2019-04-16', '2019-04-17', '2019-04-18']
    d2 = ['2019-04-15', '2019-04-17', '2019-04-18', '2019-04-19']
    d3 = ['2019-04-15', '2019-04-16', '2019-04-19', '2019-04-21']
    dates = d1 + d2 + d3
    data = {'group': groups, 'date': dates}
    frame = DataFrame(data=data)
    frame = frame.astype(dtype={'group': 'object', 'date': 'datetime64[ns]'})
    print(frame)
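    # Expected flags: True when a row is exactly one day after the previous row of the same group
    r1 = [False, True, True, True]
    r2 = [False, False, True, True]
    r3 = [False, True, False, False]
    result = r1 + r2 + r3
    # Attach the expected flags as a 'result' column
    frame = frame.join(other=Series(data=result).rename(index='result'))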
    print(frame)
The first and second prints of the frame look like this (the expected result is shown after the | on each row):

   group       date
0     G1 2019-04-15 # <-- First date of G1  | False
1     G1 2019-04-16 # <-- Previous date + 1 | True
2     G1 2019-04-17 # <-- Previous date + 1 | True
3     G1 2019-04-18 # <-- Previous date + 1 | True
4     G2 2019-04-15 # <-- First date of G2  | False
5     G2 2019-04-17 # <-- Previous date + 2 | False
6     G2 2019-04-18 # <-- Previous date + 1 | True
7     G2 2019-04-19 # <-- Previous date + 1 | True
8     G3 2019-04-15 # <-- First date of G3  | False
9     G3 2019-04-16 # <-- Previous date + 1 | True
10    G3 2019-04-19 # <-- Previous date + 3 | False
11    G3 2019-04-21 # <-- Previous date + 2 | False

Is this what you are looking for?

import pandas as pd

if __name__ == '__main__':
    groups = ['G1'] * 4 + ['G2'] * 4 + ['G3'] * 4
    d1 = ['2019-04-15', '2019-04-16', '2019-04-17', '2019-04-18']
    d2 = ['2019-04-15', '2019-04-17', '2019-04-18', '2019-04-19']
    d3 = ['2019-04-15', '2019-04-16', '2019-04-19', '2019-04-21']
    dates = d1 + d2 + d3
    data = {'group': groups, 'date': dates}
    frame = pd.DataFrame(data=data)
    frame = frame.astype(dtype={'group': 'object', 'date': 'datetime64[ns]'})
    print(frame)
    r1 = [False, True, True, True]
    r2 = [False, False, True, True]
    r3 = [False, True, False, False]
    result = r1 + r2 + r3
    frame = frame.join(other=pd.Series(data=result).rename(index='result'))

    # fill with default data
    frame["day_diff"] = pd.to_timedelta(arg=0, unit="days")
    # calculate diffs
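    # only rows whose previous row belongs to the same group get a real difference
    same_group = frame["group"] == frame["group"].shift(1)
    frame.loc[same_group, "day_diff"] = frame["date"] - frame["date"].shift(1)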

    # calculate high and low day values
    low = pd.to_timedelta(arg=3, unit="days")
    high = pd.to_timedelta(arg=5, unit="days")

    # check values
    frame["good"] = (low <= frame["day_diff"]) & (frame["day_diff"] <= high)
    print(frame)

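Yes, this is what I'm looking for, but I need a solution that uses pandas built-in functions and avoids loops. My DataFrame is very large, so I have to pay attention to performance.

I removed the for loop.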
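
For the loop-free requirement raised in the comment, one possible sketch is to let pandas compute the per-group differences with groupby and diff. This is only an illustration under two assumptions: the rows are already sorted by date within each group, and a row should be flagged only when it is exactly one day after the previous row of the same group, as in the expected result column of the question. The helper name consecutive_day_mask is hypothetical and not part of pandas.

```python
import pandas as pd


def consecutive_day_mask(frame: pd.DataFrame) -> pd.Series:
    """Flag rows that are exactly one day after the previous row of their group."""
    # diff() inside each group yields NaT for the first row of every group;
    # comparing NaT with a Timedelta gives False, so group starts are not flagged.
    day_diff = frame.groupby("group")["date"].diff()
    return (day_diff == pd.Timedelta(days=1)).rename("result")


groups = ["G1"] * 4 + ["G2"] * 4 + ["G3"] * 4
dates = [
    "2019-04-15", "2019-04-16", "2019-04-17", "2019-04-18",
    "2019-04-15", "2019-04-17", "2019-04-18", "2019-04-19",
    "2019-04-15", "2019-04-16", "2019-04-19", "2019-04-21",
]
frame = pd.DataFrame({"group": groups, "date": pd.to_datetime(dates)})

mask = consecutive_day_mask(frame)
filtered = frame[mask]        # keep only the flagged rows
print(frame.join(mask))       # same layout as the second print in the question
```

If a range of allowed gaps is needed instead of exactly one day, the comparison could be swapped for day_diff.between(low, high) with two Timedelta bounds, in the spirit of the low and high values used in the answer.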