Python pandas: reversing a one-hot encoding

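I one-hot encoded a variable, and after some calculations I would like to retrieve the original variable.

What I am doing now:

I filter the one-hot encoded column names (they all start with the name of the original variable, say 'mycl').

Then I can simply multiply the filtered columns by their column names:

X_test[filter_col]*filter_col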

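However, this results in a sparse matrix. How do I create a single variable from it? Summing does not work, because the blanks are treated as numbers. Doing this:

sum(X_test[filter_col]*filter_col)

I get

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Any suggestions on how to proceed? Is this even the best approach, or is there a function that does exactly what I need?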
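One likely reason for the TypeError above, sketched under the assumption that the built-in sum was used: iterating over a DataFrame yields its column labels rather than its values, and sum starts from the integer 0, so it ends up adding 0 to a string. The small frame below is illustrative data, not the asker's.

```python
import pandas as pd

# Illustrative data (an assumption, not the asker's real frame).
X_test = pd.DataFrame({"mycolB": [0, 1], "mycolC": [1, 0]})

# Iterating a DataFrame yields its column labels, not its cell values.
labels = list(X_test)

try:
    # Built-in sum starts at 0, so this evaluates 0 + "mycolB" and fails.
    sum(X_test)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(labels)
print(error_message)
```

The DataFrame method .sum(axis=1) works on the values instead of the labels, which is why the answers use it rather than the built-in.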

As requested, here is an example, taken from:

If you need the summed values per row:

(X_test[filter_col]*filter_col).sum(axis=1)

Or a solution with dot, which works whether a row has only one hot column or several:

import pandas as pd

X_test = pd.DataFrame({
         'mycolB':[0,1,1,0],
         'mycolC':[0,0,1,0],
         'mycolD':[1,0,0,0],
})

# select the one-hot columns by their shared prefix
filter_col = [col for col in X_test if col.startswith('mycol')]
# 1 * 'name, ' keeps the name and 0 * 'name, ' gives '', so dot joins the hot column names
df = X_test[filter_col].dot(pd.Index(filter_col) + ', ').str.strip(', ')
print(df)
0            mycolD
1            mycolB
2    mycolB, mycolC
3                  
dtype: object
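To make the mechanism behind the dot solution explicit, here is a minimal row-by-row sketch of the same idea in plain Python. It relies on the fact that an integer times a string repeats the string, so a flag of 1 keeps the column name and a flag of 0 drops it. The helper name decode_row is hypothetical, and this version is slower than dot on large frames.

```python
import pandas as pd

X_test = pd.DataFrame({
    "mycolB": [0, 1, 1, 0],
    "mycolC": [0, 0, 1, 0],
    "mycolD": [1, 0, 0, 0],
})
filter_col = [col for col in X_test if col.startswith("mycol")]


def decode_row(row):
    """Join the names of the columns whose flag is 1 (hypothetical helper)."""
    # 1 * "mycolB, " keeps the name, 0 * "mycolB, " yields an empty string.
    joined = "".join(int(flag) * (name + ", ") for name, flag in row.items())
    # Remove the trailing separator left by the last kept name.
    return joined.strip(", ")


decoded = X_test[filter_col].apply(decode_row, axis=1)
print(decoded.tolist())
# ['mycolD', 'mycolB', 'mycolB, mycolC', '']
```

Rows with no hot column come back as an empty string, and rows with several hot columns list all of them, matching the output shown above.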

IIUC, you can use idxmax along axis=1. If necessary, you can strip the dummy prefix with str.replace:

X_test[filter_col].idxmax(axis=1).str.replace('mycol_', '')

Do you need (X_test[filter_col]*filter_col).sum() or (X_test[filter_col]*filter_col).sum(axis=1)? Also, if you select all columns that start with the string mycol and X_test[filter_col]*filter_col still fails, could you provide a sample?

X_test[filter_col].idxmax(1).str.replace('mycol', '')?

@ChrisA Thanks Chris =) You went a step further and cleaned up the result as well.
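A small worked sketch of the idxmax approach, with made-up data whose columns carry a 'mycol_' prefix (an assumption chosen to match the str.replace call). One caveat worth knowing: idxmax returns the first column when a row has no 1 at all, or more than one, so the sketch masks rows that are not strictly one-hot.

```python
import pandas as pd

# Illustrative data; the last row has no hot column on purpose.
X_test = pd.DataFrame({
    "mycol_B": [0, 1, 0, 0],
    "mycol_C": [0, 0, 1, 0],
    "mycol_D": [1, 0, 0, 0],
    "other": [10, 20, 30, 40],
})
filter_col = [col for col in X_test if col.startswith("mycol_")]

# Name of the column holding the row maximum, with the prefix stripped.
raw = X_test[filter_col].idxmax(axis=1).str.replace("mycol_", "", regex=False)

# Keep a label only where exactly one column is hot; other rows become missing.
is_one_hot = X_test[filter_col].sum(axis=1) == 1
labels = raw.where(is_one_hot)

print(raw.tolist())
print(labels.tolist())
```

Without the mask, the all-zero row would silently be decoded as 'B', the first column. If rows can legitimately have several hot columns, the dot solution is the better fit.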