Python 3.x: xarray, resample a variable only on specific dates
I have an xarray Dataset with irregular daily values. Sometimes a day has two values, and sometimes there is a gap of several days:
[Timestamp('2015-04-01 00:00:00'),
Timestamp('2015-04-01 00:00:00'),
Timestamp('2015-04-03 00:00:00'),
Timestamp('2015-04-03 00:00:00'),
Timestamp('2015-04-05 00:00:00'),
Timestamp('2015-04-06 00:00:00'),
Timestamp('2015-04-06 00:00:00')]
If I apply resample(), I end up with
[Timestamp('2015-04-01 00:00:00'),
Timestamp('2015-04-02 00:00:00'),
Timestamp('2015-04-03 00:00:00'),
Timestamp('2015-04-04 00:00:00'),
Timestamp('2015-04-05 00:00:00'),
Timestamp('2015-04-06 00:00:00'),
Timestamp('2015-04-07 00:00:00')]
But I am looking for a resampling like this:
[Timestamp('2015-04-01 00:00:00'),
Timestamp('2015-04-03 00:00:00'),
Timestamp('2015-04-05 00:00:00'),
Timestamp('2015-04-06 00:00:00')]
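Which options do I have to use to get the .mean() of the values that share the same day, without adding new times to the model? I tried to reproduce the problem with a small sample: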
import numpy as np
import xarray as xr

value_1 = np.arange(0, 7, 1)
times = np.array(['2015-04-01', '2015-04-01', '2018-01-03', '2018-01-03', '2018-01-05', '2018-01-05', '2018-01-06'], dtype='datetime64')
time_ = xr.Dataset(
    data_vars={'value': (('time'), value_1)},
    coords={'time': times})
time_resample = time_.resample(time='1D').mean().sel(time=slice('2015-04-01', '2015-04-06'))
print(time_.time, time_resample.time)
<xarray.DataArray 'time' (time: 7)>
array(['2015-04-01T00:00:00.000000000', '2015-04-01T00:00:00.000000000',
'2018-01-03T00:00:00.000000000', '2018-01-03T00:00:00.000000000',
'2018-01-05T00:00:00.000000000', '2018-01-05T00:00:00.000000000',
'2018-01-06T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2015-04-01 2015-04-01 ... 2018-01-06
<xarray.DataArray 'time' (time: 6)>
array(['2015-04-01T00:00:00.000000000', '2015-04-02T00:00:00.000000000',
'2015-04-03T00:00:00.000000000', '2015-04-04T00:00:00.000000000',
'2015-04-05T00:00:00.000000000', '2015-04-06T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 2015-04-01 2015-04-02 ... 2015-04-06
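The output shows how `resample(time='1D')` behaves. It builds one bin for every calendar day between the first and last timestamp, so days without observations come back as NaN.

Below is a minimal sketch of one possible workaround. It is not from the original thread. It assumes the only problem is those empty bins, and drops them afterwards with `dropna`.

```python
import numpy as np
import xarray as xr

# Same small sample as in the question (including its 2015/2018 dates).
values = np.arange(0, 7, 1)
times = np.array(['2015-04-01', '2015-04-01', '2018-01-03', '2018-01-03',
                  '2018-01-05', '2018-01-05', '2018-01-06'], dtype='datetime64[ns]')
ds = xr.Dataset(data_vars={'value': ('time', values)}, coords={'time': times})

# resample() creates a daily bin for every day in the full span;
# bins with no data get NaN, and dropna() removes those labels again.
daily = ds.resample(time='1D').mean().dropna('time')

print(daily.time.values)
print(daily.value.values)
```

One caveat: `dropna` would also drop a day whose mean is genuinely NaN, for example because the input itself contained NaN.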
You have to group by time and apply the mean function:
time_groupby = time_.value.groupby('time').mean()
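Here is a self-contained version of the one-liner above, reusing the question's sample data. `groupby('time')` collects all entries that share exactly the same timestamp. Only the dates present in the data therefore appear in the result, and no intermediate days are created.

```python
import numpy as np
import xarray as xr

values = np.arange(0, 7, 1)
times = np.array(['2015-04-01', '2015-04-01', '2018-01-03', '2018-01-03',
                  '2018-01-05', '2018-01-05', '2018-01-06'], dtype='datetime64[ns]')
time_ = xr.Dataset(data_vars={'value': ('time', values)}, coords={'time': times})

# Group on the (non-unique) time coordinate and average each group;
# the result has one entry per distinct timestamp.
time_groupby = time_.value.groupby('time').mean()

print(time_groupby.time.values)
print(time_groupby.values)
```

This works here because all timestamps are at midnight. If two observations on the same date carried different times of day, exact-timestamp grouping would keep them in separate groups. They would first need to be truncated to the day.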
At this point xarray is very similar to pandas: use groupby('Date') or something similar instead of resample.
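For comparison, here is a minimal pandas sketch of the same idea. It assumes the data sits in a hypothetical `DataFrame` with a `Date` column, named after the `groupby('Date')` mentioned above, and fills it with the question's sample values. `Series.dt.floor('D')` truncates each timestamp to midnight before grouping. That is a no-op for this data, but it would merge observations taken at different times on the same calendar day.

```python
import numpy as np
import pandas as pd

# Hypothetical frame holding the question's sample data.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2015-04-01', '2015-04-01', '2018-01-03', '2018-01-03',
                            '2018-01-05', '2018-01-05', '2018-01-06']),
    'value': np.arange(0, 7, 1),
})

# Truncate to the calendar day, then average per day.
# Unlike resample('1D'), groupby only yields days that occur in the data.
daily_mean = df.groupby(df['Date'].dt.floor('D'))['value'].mean()
print(daily_mean)
```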
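If you have solved the problem, you can post an answer to your own question and accept it (or any other answer). That is a better approach than editing the solution into the question.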