Python: Reshaping a DataFrame using stack and unstack


python pandas dataframe


I'm experimenting with pandas stack and unstack. I'd like to know whether it is possible to reshape my data this way:

ID 
1   Index(Extra Column) Value1, value2
    1                      3    12
    2                      4    13
    3                      5    14
    4                      6    15
    5                      7    16

2
    1                      8    17
    2                      9    18
    3                      10   19
    4                      11   20

Here is the sample data I'm practicing with:

ID,Value1,Value2
1,3,12
1,4,13
1,5,14
1,6,15
1,7,16
2,8,17
2,9,18
2,10,19
2,11,20


I tried this:

df1 = pd.DataFrame(df[['Value1', 'Value2']], index= df['ID']).stack()

This turns Value1 and Value2 from columns into rows, which is not what I want.

Any ideas?

I suggest using set_index with cumcount, as follows:

df.set_index(['ID', df.groupby('ID').cumcount() + 1])

      Value1  Value2
ID                  
1  1       3      12
   2       4      13
   3       5      14
   4       6      15
   5       7      16
2  1       8      17
   2       9      18
   3      10      19
   4      11      20
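To make the step above easy to try, here is a minimal self-contained sketch that loads the sample data from the question and applies set_index with cumcount. The level name 'Index' and the variable names are assumptions added for illustration; they are not part of the original answer.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """ID,Value1,Value2
1,3,12
1,4,13
1,5,14
1,6,15
1,7,16
2,8,17
2,9,18
2,10,19
2,11,20
"""
df = pd.read_csv(io.StringIO(csv_text))

# cumcount numbers the rows inside each ID group starting at 0,
# so adding 1 gives the 1-based counter from the desired output.
# The name 'Index' is a hypothetical label for the new level.
counter = (df.groupby('ID').cumcount() + 1).rename('Index')

# Use ID as the outer index level and the per-group counter as the inner one.
reshaped = df.set_index(['ID', counter])
print(reshaped)
```

Value1 and Value2 stay as columns here; only the row index changes, which appears to be what the question is after.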

Another option is to use concat:

pd.concat({k: g.reset_index(drop=True) for k, g in df.drop(columns='ID').groupby(df.ID)})

     Value1  Value2
1 0       3      12
  1       4      13
  2       5      14
  3       6      15
  4       7      16
2 0       8      17
  1       9      18
  2      10      19
  3      11      20
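The concat approach can be sketched as a standalone script as follows. Passing names to pd.concat to label the two index levels is an addition for readability, and the labels 'ID' and 'Index' are assumptions rather than something the answer specifies.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'Value1': [3, 4, 5, 6, 7, 8, 9, 10, 11],
    'Value2': [12, 13, 14, 15, 16, 17, 18, 19, 20],
})

# One sub-frame per ID, each renumbered from 0. The dict keys become
# the outer index level when the pieces are concatenated.
pieces = {
    key: group.reset_index(drop=True)
    for key, group in df.drop(columns='ID').groupby(df['ID'])
}

# names labels the resulting index levels (hypothetical labels).
result = pd.concat(pieces, names=['ID', 'Index'])
print(result)
```

Note that the inner level is 0-based here, unlike the cumcount version with + 1.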

One way, using apply:

df.groupby('ID')[['Value1','Value2']].apply(lambda x : x.reset_index(drop=True))
Out[662]: 
      Value1  Value2
ID                  
1  0       3      12
   1       4      13
   2       5      14
   3       6      15
   4       7      16
2  0       8      17
   1       9      18
   2      10      19
   3      11      20
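For contrast with the stack attempt in the question, here is a small sketch of what stack and unstack do to a frame indexed this way. It illustrates the general behaviour only and is not taken from any of the answers; the variable names are hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'Value1': [3, 4, 5, 6, 7, 8, 9, 10, 11],
    'Value2': [12, 13, 14, 15, 16, 17, 18, 19, 20],
})
indexed = df.set_index(['ID', df.groupby('ID').cumcount() + 1])

# stack moves the column labels (Value1, Value2) into a new innermost
# index level, so the result is a Series with one row per original cell.
stacked = indexed.stack()
print(stacked.head(4))

# unstack with no arguments pivots that innermost level back into columns.
round_trip = stacked.unstack()
print(round_trip)
```

This is why calling stack alone turns Value1 and Value2 into rows: the reshaping asked for in the question only needs a second index level, not a pivot of the columns.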

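defaultdict and count

from itertools import count
from collections import defaultdict

import numpy as np

# One independent counter per ID; next() yields 0, 1, 2, ... within each group.
d = defaultdict(count)

df.set_index(['ID', np.array([next(d[x]) for x in df.ID])])

      Value1  Value2
ID                  
1  0       3      12
   1       4      13
   2       5      14
   3       6      15
   4       7      16
2  0       8      17
   1       9      18
   2      10      19
   3      11      20

One more question: do you know which format is best for saving this kind of output? CSV, Excel, and txt do not seem to save it correctly.

@user3280146 That is a problem: none of these flat-file formats supports a MultiIndex. My suggestion is to run

df = df.reset_index()

and then save to CSV. Afterwards, when loading, specify

df = pd.read_csv(..., index_col=[0, 1])

which reads those two columns back as a MultiIndex.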
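The save-and-reload suggestion from the comment about file formats can be sketched as follows. An in-memory buffer stands in for a real file, and passing index=False to to_csv is an assumption of this sketch: after reset_index the index levels are already ordinary columns, so writing the default integer index as well would shift the column positions that index_col=[0, 1] relies on.

```python
import io

import pandas as pd

# Sample data from the question, indexed by ID and a per-group counter.
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'Value1': [3, 4, 5, 6, 7, 8, 9, 10, 11],
    'Value2': [12, 13, 14, 15, 16, 17, 18, 19, 20],
})
counter = (df.groupby('ID').cumcount() + 1).rename('Index')  # hypothetical name
indexed = df.set_index(['ID', counter])

# Flatten the MultiIndex into ordinary columns before writing.
buffer = io.StringIO()  # stands in for a file path on disk
indexed.reset_index().to_csv(buffer, index=False)

# Read the first two columns back as the two index levels.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=[0, 1])
print(restored)
```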