Python: counting consecutive days in a time series that meet a specific criterion

Tags: python, pandas, time-series, geospatial

I have a spatio-temporal df:

'date'        'spatial_pixel'   'column_A'   ...
 ----             -----          ---          
 2012-04-01   |   1000     |      5
 2012-04-01   |   1001     |      1
 ...              ...            ...
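I need a column (grouped by 'spatial_pixel' and 'date') that counts the number of days in a row on which a boolean condition is satisfied. Say 'column_A' < 2: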

'date'        'spatial_pixel'   'column_A'   'days-in-a-row'   ...
 ----             -----          ---           ----
 2012-03-30   |   1001     |      5    |         0
 2012-04-01   |   1001     |      1    |         1
 2012-04-02   |   1001     |      1    |         2
 2012-04-03   |   1001     |      3    |         0
 ...              ...            ...            ...
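My attempt:

First, I created a new dataframe in which the day of the month (e.g. 1, 2, 3, ..., 28, 29, 30) is written wherever the boolean is True ('column_A' < 2). (However, I need it to range from 1 to 365, so that the end of one month and the start of the next are easily identified as consecutive.)

Second, I tried to create a new column counting consecutive month-days using code modified from @ZJS, but without success.

Any help would be greatly appreciated.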

Here is my take on the problem:

import pandas as pd
from datetime import datetime

df = pd.DataFrame(
    [
     [datetime(2016, 1, 1), 1000, 5], 
     [datetime(2016, 1, 1), 1001, 1], 
     [datetime(2016, 1, 2), 1000, 1], 
     [datetime(2016, 1, 2), 1001, 1], 
     [datetime(2016, 1, 3), 1000, 1], 
     [datetime(2016, 1, 3), 1001, 5], 
     [datetime(2016, 1, 4), 1000, 1], 
     [datetime(2016, 1, 4), 1001, 1],
    ], 
    columns=['date', 'spatial_pixel', 'column_A']
)

df
#         date  spatial_pixel  column_A
# 0 2016-01-01           1000         5
# 1 2016-01-01           1001         1
# 2 2016-01-02           1000         1
# 3 2016-01-02           1001         1
# 4 2016-01-03           1000         1
# 5 2016-01-03           1001         5
# 6 2016-01-04           1000         1
# 7 2016-01-04           1001         1

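def sum_days_in_row_with_condition(g):
    sorted_g = g.sort_values(by='date', ascending=True)
    condition = sorted_g['column_A'] < 2
    # Running count of True rows, minus its value at the most recent False row.
    # fillna(0) covers groups that start with True rows (no earlier False row),
    # where ffill() leaves NaN and astype(int) can fail.
    running = condition.cumsum()
    sorted_g['days-in-a-row'] = running - running.where(~condition).ffill().fillna(0).astype(int)
    return sorted_g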

(df.groupby('spatial_pixel')
   .apply(sum_days_in_row_with_condition)
   .reset_index(drop=True))
#         date  spatial_pixel  column_A  days-in-a-row
# 0 2016-01-01           1000         5              0
# 1 2016-01-02           1000         1              1
# 2 2016-01-03           1000         1              2
# 3 2016-01-04           1000         1              3
# 4 2016-01-01           1001         1              1
# 5 2016-01-02           1001         1              2
# 6 2016-01-03           1001         5              0
# 7 2016-01-04           1001         1              1
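
A variant without groupby.apply, as a minimal sketch rather than a replacement for the answer above. It assumes one row per pixel per day and the column names from the question; the helper name `days_in_a_row`, its `threshold` parameter, and the intermediate names are illustrative only. Unlike the answer, it also restarts the count when two rows of a pixel are not exactly one calendar day apart. Differencing the dates directly means month and year boundaries need no separate 1-365 day column. It also avoids passing the grouping column through apply, which newer pandas versions warn about.

```python
import pandas as pd


def days_in_a_row(df, threshold=2):
    """Hypothetical helper: per-pixel streak length of column_A < threshold."""
    out = df.sort_values(['spatial_pixel', 'date']).reset_index(drop=True)
    condition = out['column_A'] < threshold

    # True where the previous row of the same pixel is exactly one day earlier.
    # The first row of each pixel has no previous row (NaT), so it is False.
    consecutive = out.groupby('spatial_pixel')['date'].diff() == pd.Timedelta(days=1)

    # A streak is broken when the condition fails or the calendar has a gap.
    breaks = ~condition | ~consecutive

    # Each break opens a new block; all rows of a block share one id.
    block = breaks.cumsum()

    # Count the True rows inside each block, in order.
    out['days-in-a-row'] = condition.astype(int).groupby(block).cumsum()
    return out


# Data shaped like the example in the question (note the gap after 2012-03-30).
demo = pd.DataFrame({
    'date': pd.to_datetime(['2012-03-30', '2012-04-01', '2012-04-02', '2012-04-03']),
    'spatial_pixel': [1001] * 4,
    'column_A': [5, 1, 1, 3],
})
result = days_in_a_row(demo)
print(result['days-in-a-row'].tolist())  # [0, 1, 2, 0]
```

If 2012-03-30 had also satisfied the condition, 2012-04-01 would still restart at 1 because of the missing 2012-03-31. Whether a missing day should break a streak depends on the data, so treat that rule as an assumption.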

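Can you provide a better version that covers all the logical cases you mentioned?
Does the edited version clarify it?
It took only a few small tweaks to make it work for me (I had to remove 'astype(int)' to make it work with NaN values)! I learned a lot from your code. Cheers

Code from the attempt described in the question, and the error it raised: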
def rolling_count(val):
    if val == rolling_count.previous + 1 :
        rolling_count.count +=1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count
rolling_count.count = 0 #static variable
rolling_count.previous = None #static variable

df['count'] == df.groupby(['spatial_pixel','date'])['day'].apply(rolling_count)                             


KeyError: 'count'
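
The KeyError above is consistent with the last line of the attempt using `==` (comparison) where `=` (assignment) was presumably intended: pandas has to look up `df['count']` before it can compare anything, and that column does not exist yet. A minimal sketch reproducing just that, on a toy frame rather than the asker's data:

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 3]})

# '==' compares, so pandas first reads df['count'], which does not exist.
try:
    df['count'] == df['day']
    raised = False
except KeyError as err:
    raised = True
    missing = err.args[0]

print(raised)

# '=' assigns, which creates the column.
df['count'] = df['day']
```

Fixing the operator is probably not sufficient on its own: `rolling_count.previous` starts as `None`, so `rolling_count.previous + 1` would raise a TypeError the first time the function runs.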