Datetime-like index in Python pandas

python pandas

I have a DataFrame containing a time series of cumulative rainfall:

df = pd.read_csv(csv_file, parse_dates=[['date', 'time']], dayfirst=True, index_col=0)

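I can't share the source data. It is read through an adapter object that presents the data as a text file with .csv content for read_csv, although the source file is in some proprietary format. That is not relevant to the question, though: the end result is a DataFrame with a datetime index and float values, and the dates can be mocked up.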

The cumulative rainfall is then converted to rainfall per resampled minute:

rainfall_differences = df['rainfall'].diff()
rainfall_differences = rainfall_differences.resample('1min', label='right', closed='right').sum()
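
Since the real data cannot be shared, the two steps above can be tried on mocked data. The following is a minimal sketch; the 20-second gauge interval and the readings are made-up values, not taken from the original source.

```python
import pandas as pd

# Hypothetical stand-in for the real data: a cumulative reading every 20 seconds.
index = pd.date_range("2020-01-01 00:00:00", periods=7, freq="20s")
df = pd.DataFrame({"rainfall": [0.0, 0.2, 0.2, 0.5, 0.9, 0.9, 1.0]}, index=index)

# diff() turns the running total into the amount that fell in each interval.
rainfall_differences = df["rainfall"].diff()

# Sum the increments per minute; each bin includes, and is labelled by, its right edge.
rainfall_differences = rainfall_differences.resample(
    "1min", label="right", closed="right"
).sum()

print(rainfall_differences)
```

The per-minute values should add up to the last cumulative reading minus the first, which is a quick sanity check on the conversion.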

All of this works as expected. However, my question is about the difference between these two statements:

x = rainfall_differences.rolling('90min').sum()
y = rainfall_differences.rolling('1.5h').sum()

The first works, but the second throws an exception:

  File "<<path>>/my_file.py", line 68, in load_rainfalls
    result[duration_label] = rainfall_differences.rolling(duration_label).sum()
  File "<<path>>\lib\site-packages\pandas\core\generic.py", line 10386, in rolling
    closed=closed,
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 94, in __init__
    self.validate()
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 1836, in validate
    freq = self._validate_freq()
  File "<<path>>\lib\site-packages\pandas\core\window\rolling.py", line 1888, in _validate_freq
    f"passed window {self.window} is not "
ValueError: passed window 1.5h is not compatible with a datetimelike index
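
Which window spellings are accepted depends on the installed pandas version. The traceback above comes from a version that rejects '1.5h'; as far as I know, pandas 2.2 and later prefer the lowercase 'h' and deprecate 'H'. A small probe like the one below shows what the local installation accepts. The series and the helper name `accepts_window` are made up for illustration.

```python
import warnings

import pandas as pd

# Mocked series with a datetime index; the values themselves do not matter here.
index = pd.date_range("2020-01-01", periods=6, freq="30min")
series = pd.Series(range(6), index=index, dtype="float64")


def accepts_window(window):
    """Return True if rolling() on a datetime index accepts this window string."""
    try:
        with warnings.catch_warnings():
            # Hide deprecation warnings that some versions emit for old aliases.
            warnings.simplefilter("ignore")
            series.rolling(window).sum()
        return True
    except ValueError:
        return False


for label in ["90min", "1.5h", "1.5H"]:
    print(label, accepts_window(label))
```

Only '90min' is expected to be accepted everywhere; the results for the other two spellings differ between pandas releases.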

I think you need to change h to H:

y = rainfall_differences.rolling('1.5H').sum()

I think the reason is that lowercase h is not a valid offset alias:

Alias   Description
H       hourly frequency
T, min  minutely frequency
S       secondly frequency

Sample:

import pandas as pd

rng = pd.date_range('2017-04-03', periods=5, freq='10min')
rainfall_differences = pd.DataFrame({'a': range(5)}, index=rng)
print(rainfall_differences)
                     a
2017-04-03 00:00:00  0
2017-04-03 00:10:00  1
2017-04-03 00:20:00  2
2017-04-03 00:30:00  3
2017-04-03 00:40:00  4

y = rainfall_differences.rolling('1.5H').sum()
print(y)
                        a
2017-04-03 00:00:00   0.0
2017-04-03 00:10:00   1.0
2017-04-03 00:20:00   3.0
2017-04-03 00:30:00   6.0
2017-04-03 00:40:00  10.0

Ah, so the answer is simply that functions like to_timedelta allow more variation, while the frequency string for rolling has to follow a stricter format? Very unsatisfying, but you seem to be right: 1.5h doesn't work, 1.5H does…

@Grismar - I think so, yes. These are two different things, and the frequency string here is more "strict".

Converting the duration to a whole number of minutes with to_timedelta also works:

index_duration = str(int(pd.to_timedelta('1.5 hour').total_seconds() / 60)) + 'min'
y = rainfall_differences.rolling(index_duration).sum()
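
The to_timedelta conversion can be wrapped in a small helper so that any duration label is normalised before it reaches rolling. This is only a sketch: the function name `to_minutes_window` is hypothetical, and it assumes every label is a whole number of minutes.

```python
import pandas as pd


def to_minutes_window(label):
    """Convert a duration label such as '1.5h' into a whole-minute window string."""
    delta = pd.Timedelta(label)
    minutes, remainder = divmod(delta.total_seconds(), 60)
    if remainder:
        raise ValueError(f"{label!r} is not a whole number of minutes")
    return f"{int(minutes)}min"


# Same mocked frame as in the sample: one value every 10 minutes.
rng = pd.date_range("2017-04-03", periods=5, freq="10min")
rainfall_differences = pd.DataFrame({"a": range(5)}, index=rng)

window = to_minutes_window("1.5h")
y = rainfall_differences.rolling(window).sum()

print(window)
print(y)
```

Because the window string is always spelled with 'min', the result does not depend on whether the installed pandas accepts 'h' or 'H' in a fractional frequency string.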