How to find the gaps in a Python DataFrame column (date format)?

python, pandas, pandas-groupby, gaps-in-data

I have a DataFrame that looks like this:

name,year
AAA,2015-11-02 22:00:00
AAA,2015-11-02 23:00:00
AAA,2015-11-03 00:00:00
AAA,2015-11-03 01:00:00
AAA,2015-11-03 02:00:00
AAA,2015-11-03 05:00:00
ZZZ,2015-09-01 00:00:00
ZZZ,2015-11-01 01:00:00
ZZZ,2015-11-01 07:00:00
ZZZ,2015-11-01 08:00:00
ZZZ,2015-11-01 09:00:00
ZZZ,2015-11-01 12:00:00
I want to find the gaps in the DataFrame's year column for each name. For example:

  • AAA has a 2-hour gap after "2015-11-03 02:00:00"
  • ZZZ has a 5-hour gap after "2015-11-01 01:00:00"
  • ZZZ has a 2-hour gap after "2015-11-01 09:00:00"

    I want to generate two CSV files with the following contents:

    CSV-1:

    name,year
    AAA,2015-11-02 22:00:00,0
    AAA,2015-11-02 23:00:00,0
    AAA,2015-11-03 00:00:00,0
    AAA,2015-11-03 01:00:00,0
    AAA,2015-11-03 02:00:00,2
    AAA,2015-11-03 05:00:00,0
    ZZZ,2015-09-01 00:00:00,0
    ZZZ,2015-11-01 01:00:00,5
    ZZZ,2015-11-01 07:00:00,0
    ZZZ,2015-11-01 08:00:00,0
    ZZZ,2015-11-01 09:00:00,2
    ZZZ,2015-11-01 12:00:00,0
    
    CSV-2:

    name,prev_year,next_year,gaps
    AAA,2015-11-03 02:00:00,2015-11-03 05:00:00,2015-11-03 03:00:00
    AAA,2015-11-03 02:00:00,2015-11-03 05:00:00,2015-11-03 04:00:00
    ZZZ,2015-11-01 01:00:00,2015-11-01 07:00:00,2015-11-01 02:00:00
    ZZZ,2015-11-01 01:00:00,2015-11-01 07:00:00,2015-11-01 03:00:00
    ZZZ,2015-11-01 01:00:00,2015-11-01 07:00:00,2015-11-01 04:00:00
    ZZZ,2015-11-01 01:00:00,2015-11-01 07:00:00,2015-11-01 05:00:00
    ZZZ,2015-11-01 01:00:00,2015-11-01 07:00:00,2015-11-01 06:00:00
    ZZZ,2015-11-01 09:00:00,2015-11-01 12:00:00,2015-11-01 10:00:00
    ZZZ,2015-11-01 09:00:00,2015-11-01 12:00:00,2015-11-01 11:00:00
    
    I tried the following:

    df['year'] = pd.to_datetime(df['year'], format='%Y-%m-%d %H:%M:%S')
    mask = df.groupby("name").year.diff() > pd.Timedelta('0 days 01:00:00')
    

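    To get the gaps into the DataFrame, assign the result of the grouped diff() to a new column instead of only building a boolean mask. To express each difference as a number of hours, divide it by a one-hour timedelta: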

    df['year'] = pd.to_datetime(df['year'], format='%Y-%m-%d %H:%M:%S')
    df['Gap'] = (df.groupby("name").year.diff() / pd.to_timedelta('1 hour')).fillna(0)
    
    This gives us the following DataFrame:

       name                year     Gap
    0   AAA 2015-11-02 22:00:00     0.0
    1   AAA 2015-11-02 23:00:00     1.0
    2   AAA 2015-11-03 00:00:00     1.0
    3   AAA 2015-11-03 01:00:00     1.0
    4   AAA 2015-11-03 02:00:00     1.0
    5   AAA 2015-11-03 05:00:00     3.0
    6   ZZZ 2015-11-01 01:00:00     0.0
    7   ZZZ 2015-11-01 07:00:00     6.0
    8   ZZZ 2015-11-01 08:00:00     1.0
    9   ZZZ 2015-11-01 09:00:00     1.0
    10  ZZZ 2015-11-01 12:00:00     3.0
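
The diff-and-divide step can be reproduced end to end with a minimal sketch. The sample frame below is an assumption: it uses the question's rows without the 2015-09-01 entry, so it has 11 rows like the printed output. `pd.Timedelta(hours=1)` is used as the divisor, which should be equivalent to `pd.to_timedelta('1 hour')`.

```python
import pandas as pd

# Assumed sample: the question's rows without the 2015-09-01 entry
df = pd.DataFrame({
    "name": ["AAA"] * 6 + ["ZZZ"] * 5,
    "year": [
        "2015-11-02 22:00:00", "2015-11-02 23:00:00", "2015-11-03 00:00:00",
        "2015-11-03 01:00:00", "2015-11-03 02:00:00", "2015-11-03 05:00:00",
        "2015-11-01 01:00:00", "2015-11-01 07:00:00", "2015-11-01 08:00:00",
        "2015-11-01 09:00:00", "2015-11-01 12:00:00",
    ],
})
df["year"] = pd.to_datetime(df["year"], format="%Y-%m-%d %H:%M:%S")

# Time elapsed since the previous row of the same name (NaT on each name's first row)
delta = df.groupby("name")["year"].diff()

# Dividing a timedelta Series by a Timedelta yields plain floats, here hours
df["Gap"] = (delta / pd.Timedelta(hours=1)).fillna(0)

print(df)
```

At this stage a value of 1.0 means "no gap": it is simply the regular hourly step, which is why the next step subtracts one.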
    
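    To put each gap next to its start time, as CSV-1 requires, we shift the result up by one row and subtract one (the regular hourly step) before filling the NA values: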

    df['Gap'] = ((df.groupby("name").year.diff() / pd.to_timedelta('1 hour')).shift(-1) - 1).fillna(0)
    
    This gives:

       name                year  Gap
    0   AAA 2015-11-02 22:00:00  0.0
    1   AAA 2015-11-02 23:00:00  0.0
    2   AAA 2015-11-03 00:00:00  0.0
    3   AAA 2015-11-03 01:00:00  0.0
    4   AAA 2015-11-03 02:00:00  2.0
    5   AAA 2015-11-03 05:00:00  0.0
    6   ZZZ 2015-11-01 01:00:00  5.0
    7   ZZZ 2015-11-01 07:00:00  0.0
    8   ZZZ 2015-11-01 08:00:00  0.0
    9   ZZZ 2015-11-01 09:00:00  2.0
    10  ZZZ 2015-11-01 12:00:00  0.0
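
The plain `shift(-1)` is safe here only because the rows of each name are contiguous and the first diff of every name is NaN, so nothing leaks across the boundary between names. A sketch of a variant that makes the per-name logic explicit by looking up the next timestamp inside each group; `missing_hours_after` is a hypothetical helper name, and the sample frame is again assumed to be the question's rows without the 2015-09-01 entry:

```python
import pandas as pd


def missing_hours_after(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: count the absent hourly rows after each row, per name."""
    # Timestamp of the following row within the same name (NaT on the last row)
    nxt = df.groupby("name")["year"].shift(-1)
    # Hours until the next row, minus the one hour that is expected anyway
    hours = (nxt - df["year"]) / pd.Timedelta(hours=1) - 1
    return hours.fillna(0).astype(int)


# Assumed sample: the question's rows without the 2015-09-01 entry
df = pd.DataFrame({
    "name": ["AAA"] * 6 + ["ZZZ"] * 5,
    "year": pd.to_datetime([
        "2015-11-02 22:00:00", "2015-11-02 23:00:00", "2015-11-03 00:00:00",
        "2015-11-03 01:00:00", "2015-11-03 02:00:00", "2015-11-03 05:00:00",
        "2015-11-01 01:00:00", "2015-11-01 07:00:00", "2015-11-01 08:00:00",
        "2015-11-01 09:00:00", "2015-11-01 12:00:00",
    ]),
})

df["Gap"] = missing_hours_after(df)
print(df)
```

Both variants assume the rows are sorted by time within each name; sorting with `sort_values(['name', 'year'])` first would be a reasonable precaution.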
    
    To get your second CSV, we can do the following:

    df['prev_year'] = df['year']
    df['next_year'] = df.groupby('name')['year'].shift(-1)
    
    df.set_index('year', inplace=True)
    df = df.groupby('name', as_index=False)\
           .resample(rule='1H')\
           .ffill()\
           .reset_index()
    
    gaps = df[df['year'] != df['prev_year']][['name', 'prev_year', 'next_year', 'year']]
    
    # Rename the filled-in timestamp column to 'gaps'
    gaps = gaps.rename(columns={'year': 'gaps'})
    
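    First we set up the "before" and "after" columns (prev_year and next_year). Then, by making 'year' the index, we can use the .resample() method to fill in all the missing hours. Calling ffill() on the resampled data copies the last available record into every new row we add. When 'prev_year' != 'year', we know the row did not exist in the original frame and is therefore one of the gaps, so we keep only those rows, select the columns we need and rename them. This gives: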

       name           prev_year           next_year                gaps
    5   AAA 2015-11-03 02:00:00 2015-11-03 05:00:00 2015-11-03 03:00:00
    6   AAA 2015-11-03 02:00:00 2015-11-03 05:00:00 2015-11-03 04:00:00
    9   ZZZ 2015-11-01 01:00:00 2015-11-01 07:00:00 2015-11-01 02:00:00
    10  ZZZ 2015-11-01 01:00:00 2015-11-01 07:00:00 2015-11-01 03:00:00
    11  ZZZ 2015-11-01 01:00:00 2015-11-01 07:00:00 2015-11-01 04:00:00
    12  ZZZ 2015-11-01 01:00:00 2015-11-01 07:00:00 2015-11-01 05:00:00
    13  ZZZ 2015-11-01 01:00:00 2015-11-01 07:00:00 2015-11-01 06:00:00
    17  ZZZ 2015-11-01 09:00:00 2015-11-01 12:00:00 2015-11-01 10:00:00
    18  ZZZ 2015-11-01 09:00:00 2015-11-01 12:00:00 2015-11-01 11:00:00
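
As an alternative sketch (not the method used above), the same table can be built without resampling: pair each row with the next row of the same name, generate the missing hours with `pd.date_range`, and expand them with `explode`. The sample frame is an assumption (the question's rows without the 2015-09-01 entry), and `HOUR` and `pairs` are illustrative names:

```python
import pandas as pd

HOUR = pd.Timedelta(hours=1)

# Assumed sample: the question's rows without the 2015-09-01 entry
df = pd.DataFrame({
    "name": ["AAA"] * 6 + ["ZZZ"] * 5,
    "year": pd.to_datetime([
        "2015-11-02 22:00:00", "2015-11-02 23:00:00", "2015-11-03 00:00:00",
        "2015-11-03 01:00:00", "2015-11-03 02:00:00", "2015-11-03 05:00:00",
        "2015-11-01 01:00:00", "2015-11-01 07:00:00", "2015-11-01 08:00:00",
        "2015-11-01 09:00:00", "2015-11-01 12:00:00",
    ]),
})

# Pair every row with the next row of the same name
pairs = pd.DataFrame({
    "name": df["name"],
    "prev_year": df["year"],
    "next_year": df.groupby("name")["year"].shift(-1),
})

# Keep only pairs more than one hour apart (comparisons with NaT are False)
pairs = pairs[pairs["next_year"] - pairs["prev_year"] > HOUR].copy()

# For each pair, list the hourly timestamps strictly between the two rows
pairs["gaps"] = [
    list(pd.date_range(prev + HOUR, nxt - HOUR, freq=HOUR))
    for prev, nxt in zip(pairs["prev_year"], pairs["next_year"])
]

# One output row per missing hour
gaps = pairs.explode("gaps", ignore_index=True)
gaps["gaps"] = pd.to_datetime(gaps["gaps"])

print(gaps)
```

This only materialises the missing hours, so a very long gap costs rows in proportion to its length but the rows that already exist are never duplicated.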
    

    Altogether, your script could look like this:

    import pandas as pd

    # df is assumed to already hold the 'name' and 'year' columns shown above
    df['year'] = pd.to_datetime(df['year'], format='%Y-%m-%d %H:%M:%S')
    df['Gap'] = ((df.groupby("name").year.diff() / pd.to_timedelta('1 hour')).shift(-1) - 1).fillna(0)
    
    df.to_csv('csv-1.csv', index=False)
    
    df['prev_year'] = df['year']
    df['next_year'] = df.groupby('name')['year'].shift(-1)
    
    df.set_index('year', inplace=True)
    df = df.groupby('name', as_index=False)\
           .resample(rule='1H')\
           .ffill()\
           .reset_index()
    
    gaps = df[df['year'] != df['prev_year']][['name', 'prev_year', 'next_year', 'year']]
    
    # Rename the filled-in timestamp column to 'gaps'
    gaps = gaps.rename(columns={'year': 'gaps'})
    
    gaps.to_csv('csv-2.csv', index=False)
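
The script assumes `df` already exists. A minimal sketch of one way to load it, assuming the input is CSV text with the `name,year` header shown in the question; the inline `raw` string is a hypothetical stand-in for a real file path, and its rows are deliberately out of order to show the sort:

```python
import io

import pandas as pd

# Hypothetical inline input; in practice pass a file path to read_csv instead
raw = """name,year
ZZZ,2015-11-01 07:00:00
AAA,2015-11-03 02:00:00
AAA,2015-11-03 05:00:00
ZZZ,2015-11-01 01:00:00
"""

# parse_dates converts the 'year' column to datetimes while reading
df = pd.read_csv(io.StringIO(raw), parse_dates=["year"])

# The grouped diff() compares neighbouring rows, so order them by name and time
df = df.sort_values(["name", "year"]).reset_index(drop=True)

print(df)
print(df.dtypes)
```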
    

    This is what I expected, but could you also tell me about CSV-2?