When does pandas (pandas.pydata.org) throw a MemoryError on df.sortlevel(k)?

I have a fairly large dataset (267827152) with a 5-level index; it consumes 6.5% of the machine's memory. When I call

df.sortlevel(k)
I get the following error:



MemoryError                               Traceback (most recent call last)
 in ()
----> 1 df = df.sortlevel(4)

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in sortlevel(self, level, axis, ascending)
   2978             raise Exception('can only sort by level with a hierarchical index')
   2979 
-> 2980         new_axis, indexer = the_axis.sortlevel(level, ascending=ascending)
   2981 
   2982         if self._data.is_mixed_dtype():

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in sortlevel(self, level, ascending)
   1856         indexer = _indexer_from_factorized((primary,) + tuple(labels),
   1857                                            (primshp,) + tuple(shape),
-> 1858                                            compress=False)
   1859         if not ascending:
   1860             indexer = indexer[::-1]

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _indexer_from_factorized(labels, shape, compress)
   2124         max_group = np.prod(shape)
   2125 
-> 2126     indexer, _ = lib.groupsort_indexer(comp_ids.astype(np.int64), max_group)
   2127 
   2128     return indexer

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/lib.so in pandas.lib.groupsort_indexer (pandas/src/tseries.c:55052)()

MemoryError: 


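Is there a hard-coded condition that raises this error? Or does the operation really consume all of the remaining memory, even though the data itself uses only 6.5% of it (according to htop)?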

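Could you move this to GitHub? I need to look at the code, but I haven't tested "ragged" hierarchical indexes very deeply, and there are some edge cases. So this may well be a legitimate bug.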


Edit: this has been fixed in v0.10.1.
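For readers wondering how a frame that uses only a few percent of RAM can exhaust memory during a sort: the traceback shows `max_group = np.prod(shape)` being passed to `groupsort_indexer` with `compress=False`. If that routine allocates one counter per possible key combination (an assumption here, since its source is not shown in the traceback), the scratch space grows with the product of the level sizes rather than with the number of rows. A rough sketch of that arithmetic follows; the function name and the level sizes are hypothetical.

```python
from math import prod


def groupsort_scratch_bytes(level_sizes, n_rows, itemsize=8):
    """Rough scratch-memory estimate for a counting sort over composite keys.

    Assumes one int64 counter per possible key combination plus one
    int64 slot per row for the resulting indexer. This is an illustration,
    not a measurement of what pandas actually allocates.
    """
    max_group = prod(level_sizes)  # mirrors np.prod(shape) in the traceback
    return (max_group + 1 + n_rows) * itemsize


# Hypothetical 5-level index; these sizes are made up for illustration.
level_sizes = [50, 40, 30, 20, 10]
n_rows = 1_000_000

estimate = groupsort_scratch_bytes(level_sizes, n_rows)
print(f"{estimate / 1e6:.1f} MB of scratch space")
```

Even with these modest level sizes, the number of possible key combinations is twelve times the number of rows, so a sparse ("ragged") index can need far more scratch memory than the data itself.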

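There were quite a few performance enhancements in 0.10. Could you try the latest version of pandas?

There are still some things in 0.10 that make it hard for me to switch, so I will have to wait for 0.10.1. But is there a specific change related to this issue that would explain the behavior?

An inplace option was added to sortlevel, which may reduce memory usage.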
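As a minimal sketch of sorting by one index level in place: `sortlevel` was later deprecated and removed from pandas, so the example below uses `DataFrame.sort_index(level=..., inplace=True)`, its replacement in current releases. The frame and its values are made up for illustration, and whether `inplace=True` actually lowers peak memory is not guaranteed, since pandas may still build intermediate copies.

```python
import pandas as pd

# Small made-up frame with a two-level index; values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 2), ("b", 1), ("a", 1)], names=["outer", "inner"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

# Sort by the second level (position 1), modifying df instead of rebinding it.
df.sort_index(level=1, inplace=True)

print(df)
```

On the 0.10.x releases discussed in this thread, the corresponding call would be `df.sortlevel(k, inplace=True)`.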