Python: counting consecutive days in a time series that meet a given criterion


I have a spatio-temporal DataFrame:

'date'        'spatial_pixel'   'column_A'   ...
 ----             -----          ---          
 2012-04-01   |   1000     |      5
 2012-04-01   |   1001     |      1
 ...              ...            ...

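I need a column (grouped by 'spatial_pixel' and 'date') that counts the number of consecutive days on which a boolean condition holds, say 'column_A' < 2: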

'date'        'spatial_pixel'   'column_A'   'days-in-a-row'   ...
 ----             -----          ---            ----
 2012-03-30   |   1001     |      5     |        0
 2012-04-01   |   1001     |      1     |        1
 2012-04-02   |   1001     |      1     |        2
 2012-04-03   |   1001     |      3     |        0
 ...              ...            ...            ...

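My attempt:
First, I created a new DataFrame that, whenever the boolean is True ('column_A' < 2), records the day of the month (e.g. 1, 2, 3, ..., 28, 29, 30). (However, I need it on a 1-365 scale, so that the end of one month and the start of the next are easily recognised as consecutive.)
Second, I tried to create a new column that counts consecutive days of the month, using code modified from @ZJS, but without success.

Any help would be much appreciated.
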
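On the 1-365 numbering mentioned in the attempt: pandas exposes the day of the year as `Series.dt.dayofyear`, although that counter still resets at every new year. A sketch of an alternative, assuming the 'date' column is datetime-typed, is to compare each date with the previous date of the same pixel. The sample rows and the `prev_day_present` column name are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2012-03-30', '2012-03-31', '2012-04-01', '2012-04-03']),
    'spatial_pixel': [1001, 1001, 1001, 1001],
    'column_A': [1, 1, 1, 1],
})
df = df.sort_values(['spatial_pixel', 'date'])

# Day of the year: 1..365 (366 in leap years).
df['day_of_year'] = df['date'].dt.dayofyear

# True where the previous row of the same pixel is exactly one calendar day earlier.
# 'prev_day_present' is a hypothetical helper column, not part of the question.
df['prev_day_present'] = (
    df.groupby('spatial_pixel')['date'].diff().eq(pd.Timedelta(days=1))
)
print(df)
```

Comparing date differences works across month and year boundaries alike, which the day-of-year counter alone does not.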
Here is my take on the problem:

import pandas as pd
from datetime import datetime

df = pd.DataFrame(
    [
        [datetime(2016, 1, 1), 1000, 5],
        [datetime(2016, 1, 1), 1001, 1],
        [datetime(2016, 1, 2), 1000, 1],
        [datetime(2016, 1, 2), 1001, 1],
        [datetime(2016, 1, 3), 1000, 1],
        [datetime(2016, 1, 3), 1001, 5],
        [datetime(2016, 1, 4), 1000, 1],
        [datetime(2016, 1, 4), 1001, 1],
    ],
    columns=['date', 'spatial_pixel', 'column_A']
)
df
#         date  spatial_pixel  column_A
# 0 2016-01-01           1000         5
# 1 2016-01-01           1001         1
# 2 2016-01-02           1000         1
# 3 2016-01-02           1001         1
# 4 2016-01-03           1000         1
# 5 2016-01-03           1001         5
# 6 2016-01-04           1000         1
# 7 2016-01-04           1001         1

def sum_days_in_row_with_condition(g):
    sorted_g = g.sort_values(by='date', ascending=True)
    condition = sorted_g['column_A'] < 2
    # Running count of True rows minus the count at the most recent False row.
    # fillna(0) covers a group that starts with True rows, where there is
    # no earlier False row to forward-fill from.
    sorted_g['days-in-a-row'] = (
        condition.cumsum()
        - condition.cumsum().where(~condition).ffill().fillna(0).astype(int)
    )
    return sorted_g

(df.groupby('spatial_pixel')
   .apply(sum_days_in_row_with_condition)
   .reset_index(drop=True))
#         date  spatial_pixel  column_A  days-in-a-row
# 0 2016-01-01           1000         5              0
# 1 2016-01-02           1000         1              1
# 2 2016-01-03           1000         1              2
# 3 2016-01-04           1000         1              3
# 4 2016-01-01           1001         1              1
# 5 2016-01-02           1001         1              2
# 6 2016-01-03           1001         5              0
# 7 2016-01-04           1001         1              1
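To see why the subtraction in the answer works, the intermediate Series can be inspected for a single pixel. This is only an illustrative breakdown on made-up values. The names `running` and `at_last_reset` are chosen here, and the `fillna(0)` is included on the assumption that a series may start with True rows.

```python
import pandas as pd

column_A = pd.Series([5, 1, 1, 3, 1])
condition = column_A < 2                  # F, T, T, F, T

# Total number of True rows seen so far.
running = condition.cumsum()              # 0, 1, 2, 2, 3

# Keep the running total only on False rows, then carry it forward:
# this is the total at the most recent row that broke the streak.
at_last_reset = running.where(~condition).ffill().fillna(0).astype(int)  # 0, 0, 0, 2, 2

# Rows gained since the last break = length of the current streak.
days_in_a_row = running - at_last_reset   # 0, 1, 2, 0, 1
print(days_in_a_row.tolist())
```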

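Could you provide a better version that covers all the logical cases you mentioned?
Does the edited version make it clearer?
It needed only a few small tweaks to work for me (I had to remove 'astype(int)' so that it could cope with NaN values)! I learned a lot from your code. Cheers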
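On the NaN issue raised in the last comment: below is a sketch of a variant that never produces NaN. It gives every stretch between False rows its own run id and takes a cumulative sum inside each run. It assumes the same column names as the answer. `add_days_in_a_row`, `run_id` and `threshold` are names introduced here for illustration and are not from the original answer.

```python
import pandas as pd

def add_days_in_a_row(df, threshold=2):
    out = df.sort_values(['spatial_pixel', 'date']).reset_index(drop=True)
    condition = out['column_A'] < threshold
    # Every False row opens a new run within its pixel.
    run_id = (
        (~condition).astype(int)
        .groupby(out['spatial_pixel']).cumsum()
        .rename('run_id')
    )
    # Count True rows inside each (pixel, run) pair; False rows stay at 0.
    out['days-in-a-row'] = (
        condition.astype(int).groupby([out['spatial_pixel'], run_id]).cumsum()
    )
    return out

df = pd.DataFrame({
    'date': pd.to_datetime(['2016-01-01', '2016-01-01', '2016-01-02', '2016-01-02',
                            '2016-01-03', '2016-01-03', '2016-01-04', '2016-01-04']),
    'spatial_pixel': [1000, 1001] * 4,
    'column_A': [5, 1, 1, 1, 1, 5, 1, 1],
})
result = add_days_in_a_row(df)
print(result)
```

Like the answer, this counts consecutive rows per pixel and does not check for missing calendar dates.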

Code from the attempt (modified from @ZJS), with the resulting error:

def rolling_count(val):
    if val == rolling_count.previous + 1:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count

rolling_count.count = 0        # static variable
rolling_count.previous = None  # static variable

df['count'] == df.groupby(['spatial_pixel', 'date'])['day'].apply(rolling_count)

KeyError: 'count'
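A likely cause of the `KeyError: 'count'`: the last line of the attempt uses `==` (comparison) where `=` (assignment) was presumably intended, so pandas first tries to read a 'count' column that does not exist yet. A minimal reproduction on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 3]})

try:
    # '==' compares, so df['count'] is looked up first and the lookup fails.
    df['count'] == df['day']
except KeyError as err:
    print('KeyError:', err)

# '=' assigns, which creates the column.
df['count'] = df['day']
print(df)
```

Separately, if each pixel has one row per date, grouping by both 'spatial_pixel' and 'date' would leave a single row in every group, so there would be nothing to count across days.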
