Memory/xarray/dask error when trying to load an xarray dataset containing dask array chunks into memory?

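This is my first time posting a question, so please do not hesitate to point out anything that should be added or corrected. Also, I am not sure whether this is actually a bug; if you think it is, I will redirect my question to the xarray/dask GitHub. I am also new to xarray: I come from MATLAB and am trying to switch to Python. So, here goes... I am using xarray to open two years of hourly data and subset it with the following code: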

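import xarray as xr
ds = xr.open_mfdataset('F:/supdude/datatest/*', combine='by_coords')
# Subset the region (longitudes shifted to 0-360; latitude slice runs from 70 down to 41)
ds = ds.sel(longitude=slice(-145+360, -52+360), latitude=slice(70, 41))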
Once that is done, the dataset looks like this:

<xarray.Dataset>
Dimensions:     (latitude: 117, longitude: 373, time: 17544)
Coordinates:
  * longitude   (longitude) float32 215.0 215.25 215.5 ... 307.5 307.75 308.0
  * latitude    (latitude) float32 70.0 69.75 69.5 69.25 ... 41.5 41.25 41.0
  * time        (time) datetime64[ns] 1979-01-01 ... 1980-12-31T23:00:00
Data variables:
    d2m         (time, latitude, longitude) float32 dask.array<chunksize=(8760, 117, 373), meta=np.ndarray>

After resampling to daily values, the dataset looks like this:

<xarray.Dataset>
Dimensions:    (latitude: 117, longitude: 373, time: 731)
Coordinates:
  * time       (time) datetime64[ns] 1979-01-01 1979-01-02 ... 1980-12-31
  * longitude  (longitude) float32 215.0 215.25 215.5 ... 307.5 307.75 308.0
  * latitude   (latitude) float32 70.0 69.75 69.5 69.25 ... 41.5 41.25 41.0
Data variables:
    d2m        (time, latitude, longitude) float32 dask.array<chunksize=(1, 117, 373), meta=np.ndarray>
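When I try to load the result into memory, I get the following memory error (full traceback at the end):

MemoryError: Unable to allocate 17.0 GiB for an array with shape (8784, 721, 1440) and data type >i2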

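This looks very wrong, because that is the size of a yearly file for the entire globe. Technically, my resampled dataset should be about 500 MB per year (1 GB in total) and, as you can see above, its full shape after resampling is (731, 117, 373), so I am not sure why ".load()" fails on an array of shape (8784, 721, 1440).

My computer only has 16 GB of RAM, so I tried running this on a 64 GB machine (where, technically, I should be able to open both years (17 GB each) directly in memory without dask), but when I call ".load()" it fills the whole 64 GB of RAM and crashes the computer. For now this is only a test; in the near future I will have to work with much larger datasets, so I really will not be able to open them without dask, even on a computer with 64 GB of RAM.

Further testing shows that the problem only appears when I have already used ".load()" on one array. That is, I want to load two arrays to compare them: in a fresh kernel the first array loads and adds about 1 GB to my RAM, and when I try to load the second one I get the memory error. I do not know much about dask, and none of the issues I found online relate directly to what I am getting here... It might be something to do with the scheduler, but I am not sure. Any ideas?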
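
As a sanity check on the numbers above, the sizes can be worked out from the shapes and dtypes alone. This is a minimal sketch that uses only the shapes printed in the two dataset summaries and in the error message; the helper name `nbytes` is made up for illustration, and the results are plain arithmetic, not measurements of actual memory use.

```python
from math import prod

def nbytes(shape, itemsize):
    """Bytes needed for a dense array of the given shape and item size."""
    return prod(shape) * itemsize

GIB = 2**30
MIB = 2**20

# Array from the MemoryError: shape (8784, 721, 1440), stored as int16 (">i2").
full_year = nbytes((8784, 721, 1440), 2)
# Subset of both years at hourly resolution, float32 (first summary).
hourly_subset = nbytes((17544, 117, 373), 4)
# Subset after resampling, float32 (second summary).
daily_subset = nbytes((731, 117, 373), 4)

print(f"array in the error: {full_year / GIB:.1f} GiB")
print(f"hourly subset:      {hourly_subset / GIB:.2f} GiB")
print(f"daily subset:       {daily_subset / MIB:.1f} MiB")
```

The first figure comes out at 17.0 GiB, which matches the failed allocation, so the error appears to correspond to reading one full (8784, 721, 1440) array in a single piece rather than only the subset.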

Full traceback:

Traceback (most recent call last):

  File "<ipython-input-42-88724a7ac66e>", line 1, in <module>
    new.compute()

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\core\dataset.py", line 807, in compute
    return new.load(**kwargs)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\core\dataset.py", line 651, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\base.py", line 437, in compute
    results = schedule(dsk, keys, **kwargs)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\threaded.py", line 84, in get
    **kwargs

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\local.py", line 486, in get_async
    raise_exception(exc, tb)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\local.py", line 316, in reraise
    raise exc

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\local.py", line 222, in execute_task
    result = _execute_task(task, data)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\core.py", line 118, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\core.py", line 118, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\core.py", line 118, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\core.py", line 118, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\core.py", line 119, in _execute_task
    return func(*args2)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\dask\array\core.py", line 106, in getter
    c = np.asarray(c)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\core\indexing.py", line 491, in __array__
    return np.asarray(self.array, dtype=dtype)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\core\indexing.py", line 653, in __array__
    return np.asarray(self.array, dtype=dtype)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\core\indexing.py", line 557, in __array__
    return np.asarray(array[self.key], dtype=None)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\coding\variables.py", line 72, in __array__
    return self.func(self.array)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\coding\variables.py", line 218, in _scale_offset_decoding
    data = np.array(data, dtype=dtype, copy=True)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\coding\variables.py", line 72, in __array__
    return self.func(self.array)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\coding\variables.py", line 138, in _apply_mask
    data = np.asarray(data, dtype=dtype)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\core\indexing.py", line 557, in __array__
    return np.asarray(array[self.key], dtype=None)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py", line 73, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\core\indexing.py", line 837, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)

  File "C:\Users\Psybot\Anaconda3\lib\site-packages\xarray\backends\netCDF4_.py", line 85, in _getitem
    array = getitem(original_array, key)

  File "netCDF4\_netCDF4.pyx", line 4408, in netCDF4._netCDF4.Variable.__getitem__

  File "netCDF4\_netCDF4.pyx", line 5335, in netCDF4._netCDF4.Variable._get

MemoryError: Unable to allocate 17.0 GiB for an array with shape (8784, 721, 1440) and data type >i2
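
The `>i2` in the error message means the variable is stored on disk as big-endian 16-bit integers, and the `_scale_offset_decoding` frame in the traceback is where those packed integers get turned into floats. Under the CF packing convention the decoded value is `packed * scale_factor + add_offset`. The sketch below illustrates that with the standard library only; the `scale_factor` and `add_offset` values and the helper name `unpack_cf` are hypothetical and are not taken from the files in question.

```python
import struct

# Hypothetical packing attributes (NOT read from the real files).
scale_factor = 0.5
add_offset = 250.0

# Three packed samples as they would sit on disk: big-endian int16 (">i2").
raw = struct.pack(">3h", -100, 0, 100)

def unpack_cf(buf, scale, offset):
    """Decode big-endian int16 samples using packed * scale + offset."""
    count = len(buf) // 2  # two bytes per int16 sample
    packed = struct.unpack(f">{count}h", buf)
    return [p * scale + offset for p in packed]

values = unpack_cf(raw, scale_factor, add_offset)
print(values)  # [200.0, 250.0, 300.0]
```

Each packed sample takes 2 bytes while a float32 takes 4, so a decoded float32 array needs twice the memory of the packed one.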