How can I reshape a pandas 2D MultiIndex into a NumPy 3D array faster?
Tags: python, pandas, numpy, 3d, reshape

I have the following code, which works fine:

import pandas as pd
import numpy as np

X = pd.DataFrame({'CaseID':[1,1,2,2],
              'col1':  [1,2,1,2],
              'col2':  [1,1,2,2]})
X.set_index(['CaseID','col1'], inplace=True) #MultiIndex

Unique_Cases = X.index.levels[0]
print(Unique_Cases)
#[1, 2]

D = [X.loc[Case].values for Case in Unique_Cases]
print(np.array(D).shape)
#(2, 2, 1)
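But the problem is that I have 50 million records, and this takes a very long time (10 hours). Is there a faster way to turn a 2D pandas DataFrame into a 3D NumPy array?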

Clarification: the number of rows per case is not always the same.

Solution:
The solution is np.split:

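# Assumes CaseID is still a regular column (i.e. before set_index) and that rows are sorted by CaseID.
# cat is not defined in the original post; it is presumably the list of value columns to extract.
case_counts = X.CaseID.value_counts().to_frame('counts').sort_index()
case_counts['count_cumsum'] = case_counts.counts.cumsum()
# drop the last row: np.split only needs the interior split points
case_counts.drop(case_counts.tail(1).index, inplace=True)
cat_values = X[cat].values
cat_values = np.split(cat_values, case_counts.count_cumsum)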
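The snippet above can be tried end to end on a small frame with unequal group sizes. This is only a sketch under stated assumptions: the sample data is made up, value_cols is a hypothetical stand-in for the undefined cat, and the rows are sorted first because np.split cuts at row positions and so needs each case's rows to be contiguous.

```python
import numpy as np
import pandas as pd

# Made-up sample: CaseID 1 has 3 rows, CaseID 2 has 1 row.
X = pd.DataFrame({'CaseID': [1, 1, 1, 2],
                  'col1':   [1, 2, 3, 1],
                  'col2':   [1, 1, 2, 2]})

# np.split cuts by position, so keep each case's rows together.
X = X.sort_values('CaseID', kind='stable')

value_cols = ['col2']  # hypothetical stand-in for `cat` in the original answer

# Group sizes in CaseID order; their cumulative sums are the cut positions.
counts = X['CaseID'].value_counts().sort_index()
split_points = counts.cumsum().to_numpy()[:-1]  # drop the final total

# One 2D array per case; the result is a list, not a rectangular 3D array.
chunks = np.split(X[value_cols].to_numpy(), split_points)
shapes = [c.shape for c in chunks]
print(shapes)  # [(3, 1), (1, 1)]
```

Because the groups have different lengths, the result stays a Python list of 2D arrays. The speedup over the loop in the question comes from replacing one X.loc lookup per case with a single pass over the underlying array.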

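Comments:

For each Case in Unique_Cases, is len(X.loc[Case]) always the same?

No, that is exactly the problem. I have seen many solutions for the case where every group has the same number of records, but unfortunately that is not the case here :(

Suppose X = pd.DataFrame({'CaseID':[1,1,1,2],'col1':[1,2,1,2],'col2':[1,1,2,2]}). What is the desired result? Your code would produce D with shape (2,). NumPy arrays are N-dimensional "rectangular" arrays: each axis has a fixed length. If you try to convert a ragged list of lists (e.g. [[1,1,2],[2]]) to a NumPy array, NumPy returns a one-dimensional object array: np.array([[1,1,2],[2]], dtype='object'). There is no (useful) way to convert that to a 3D array. A useless way to convert it to a 3D array would be np.array([[1,1,2],[2]], dtype='object').reshape(-1,1,1), which has shape (2,1,1). But appending axes (of length 1) to the end of the array does not seem to serve any purpose.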
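The behaviour described in the last comment can be checked directly. This is a minimal sketch using the ragged list from the comment. Note that dtype=object is passed explicitly, since recent NumPy versions raise a ValueError for ragged input without it.

```python
import numpy as np

ragged = [[1, 1, 2], [2]]

# A ragged list becomes a 1D array whose elements are Python lists.
arr = np.array(ragged, dtype=object)
print(arr.shape)  # (2,)

# Reshaping only appends length-1 axes; the inner lists stay opaque objects.
reshaped = arr.reshape(-1, 1, 1)
print(reshaped.shape)  # (2, 1, 1)
print(type(reshaped[0, 0, 0]))  # <class 'list'>
```

The reshaped array is 3D in name only: the per-case data still lives inside Python lists, so no vectorized NumPy operation can reach it.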