Python pandas: melting multiple columns with the same index
I have the following DataFrame:
+---+-----+-----+------+------+------+------+
| | A | B | C_10 | C_20 | D_10 | D_20 |
+---+-----+-----+------+------+------+------+
| 1 | 0.1 | 0.2 | 1 | 2 | 3 | 4 |
| 2 | 0.3 | 0.4 | 5 | 6 | 7 | 8 |
+---+-----+-----+------+------+------+------+
Now I want to melt the columns C_10, C_20, D_10 and D_20 to get a DataFrame like this:
+---+-----+-----+----+---+---+
| | A | B | N | C | D |
+---+-----+-----+----+---+---+
| 1 | 0.1 | 0.2 | 10 | 1 | 3 |
| 1 | 0.1 | 0.2 | 20 | 2 | 4 |
| 2 | 0.3 | 0.4 | 10 | 5 | 7 |
| 2 | 0.3 | 0.4 | 20 | 6 | 8 |
+---+-----+-----+----+---+---+
Is there a simple way to do this? Thanks.
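For this first DataFrame, each (A, B) pair occurs only once, so pd.wide_to_long should be enough. Below is a minimal sketch under that assumption. The stub columns C_* and D_* are split at "_", and the numeric suffix goes into a new column N. The sort is only there to make the row order explicit. The index is renumbered 0..3 rather than repeating the original labels 1, 1, 2, 2.

```python
import pandas as pd

# First example DataFrame from the question (index labels 1 and 2 as in the table).
df = pd.DataFrame(
    {"A": [0.1, 0.3], "B": [0.2, 0.4],
     "C_10": [1, 5], "C_20": [2, 6],
     "D_10": [3, 7], "D_20": [4, 8]},
    index=[1, 2],
)

# Assumes the pair (A, B) uniquely identifies each row, which holds here.
# "C_10" -> stub "C", suffix 10 (stored in N); likewise for "D_*".
long_df = (
    pd.wide_to_long(df, stubnames=["C", "D"], i=["A", "B"], j="N", sep="_")
    .reset_index()                  # move A, B, N from the index into columns
    .sort_values(["A", "N"])        # make the row order explicit
    .reset_index(drop=True)
)
print(long_df)
```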
Edit: I tried wide_to_long, but it does not work if the DataFrame contains duplicate rows:
import pandas as pd

df = pd.DataFrame({
    "combination": [1, 1, 2, 2],
    "A": [0.1, 0.1, 0.2, 0.2],
    "B": [0.3, 0.3, 0.4, 0.4],
    "C_10": [1, 5, 6, 7],
    "C_20": [2, 6, 7, 8],
    "D_10": [3, 7, 8, 9],
    "D_20": [4, 8, 9, 10],
})
If I use wide_to_long, I get the following error:
>>> pd.wide_to_long(df, stubnames=['C','D'], i=['combination','A','B'], j='N', sep='_').reset_index()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <module>
----> 1 pd.wide_to_long(df, stubnames=['C','D'], i=['combination','A','B'], j='N', sep='_').reset_index()

pandas/core/reshape/melt.py in wide_to_long(df, stubnames, i, j, sep, suffix)
    456
    457     if df[i].duplicated().any():
--> 458         raise ValueError("the id variables need to uniquely identify each row")
    459
    460     value_vars = [get_var_names(df, stub, sep, suffix) for stub in stubnames]

ValueError: the id variables need to uniquely identify each row
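The parameter i is described as "column(s) to use as id variable(s)", but I don't understand what exactly that means.
Answer: use pd.wide_to_long.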
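In wide_to_long, i names the column(s) whose values together identify one row of the wide frame. Each output row is then keyed by those values plus the new j value (here N). The traceback above shows the check that fails: df[i].duplicated().any().

Here is a minimal sketch that reproduces that check on the second DataFrame. It also shows why adding the row label as an extra id column fixes it. The names has_dupes and has_dupes_with_index are only illustrative.

```python
import pandas as pd

# Second example DataFrame from the question.
df = pd.DataFrame({
    "combination": [1, 1, 2, 2],
    "A": [0.1, 0.1, 0.2, 0.2],
    "B": [0.3, 0.3, 0.4, 0.4],
    "C_10": [1, 5, 6, 7],
    "C_20": [2, 6, 7, 8],
    "D_10": [3, 7, 8, 9],
    "D_20": [4, 8, 9, 10],
})

id_cols = ["combination", "A", "B"]

# Same test wide_to_long performs on the i columns before reshaping.
has_dupes = df[id_cols].duplicated().any()
print(has_dupes)  # True: rows 0/1 and rows 2/3 share the same key

# reset_index() adds an "index" column (0..3); with it, every key is unique.
has_dupes_with_index = df.reset_index()[["index"] + id_cols].duplicated().any()
print(has_dupes_with_index)  # False
```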
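Edit: If the combinations of the A and B columns are not unique, you can create a helper column by turning the index into a column with reset_index, apply the same solution, and finally drop that helper index level with reset_index(level=0, drop=True):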
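# reset_index() adds an 'index' column so that ['index', 'A', 'B'] is unique per row
df = (pd.wide_to_long(df.reset_index(),
                      stubnames=['C','D'],
                      i=['index','A','B'],
                      j='N',
                      sep='_')
        .reset_index(level=0, drop=True)  # drop the helper 'index' level
        .reset_index())                   # move A, B, N back into columns
print(df)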
A B N combination C D
0 0.1 0.3 10 1 1 3
1 0.1 0.3 20 1 2 4
2 0.1 0.3 10 1 5 7
3 0.1 0.3 20 1 6 8
4 0.2 0.4 10 2 6 8
5 0.2 0.4 20 2 7 9
6 0.2 0.4 10 2 7 9
7 0.2 0.4 20 2 8 10
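As an alternative that avoids wide_to_long, the column names can be split into a two-level column index and then stacked. This is a different technique from the answer above: a sketch that assumes every column to melt is named STUB_<digits> and no other column matches that pattern. Because the original row label is kept in the index while stacking, repeated (combination, A, B) keys cause no error. Recent pandas versions may emit a FutureWarning about the stack implementation.

```python
import pandas as pd

# Second example DataFrame from the question (with repeated id rows).
df = pd.DataFrame({
    "combination": [1, 1, 2, 2],
    "A": [0.1, 0.1, 0.2, 0.2],
    "B": [0.3, 0.3, 0.4, 0.4],
    "C_10": [1, 5, 6, 7],
    "C_20": [2, 6, 7, 8],
    "D_10": [3, 7, 8, 9],
    "D_20": [4, 8, 9, 10],
})

# Assumption: the columns to melt are exactly those ending in "_<digits>".
value_cols = df.filter(regex=r"_\d+$").columns.tolist()
id_cols = [c for c in df.columns if c not in value_cols]

# append=True keeps the original row label next to the id columns,
# so duplicate id keys stay distinguishable.
wide = df.set_index(id_cols, append=True)[value_cols]

# "C_10" -> ("C", "10"): outer level is the stub, inner level is N.
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split("_", 1)) for c in value_cols], names=[None, "N"]
)

# Move N into the rows, drop the original row label, then flatten the index.
long_df = (
    wide.stack(level="N")
    .reset_index(level=0, drop=True)
    .reset_index()
)
long_df["N"] = long_df["N"].astype(int)
print(long_df)
```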
Does this work for you?
I ran into another problem and edited my original post.
@joe-92 - Edited the answer.