Python pandas: conditionally selecting the columns to use in a calculation based on another column's header


python pandas


My DataFrame looks like this:

     (1, 2)   (1, 3)   (1, 4)   (1, 5)   (1, 6)   (1, 7)   (1, 8)   (1, 9)  (1, 10)  (1, 11)  ...         2         3         4         5         6         7         8         9        10        11
0        0        1        0        1        1        1        1        0        1        0  ...  0.612544  0.727393  0.366578  0.631451  0.722980  0.772853  0.964982  0.549801  0.406692  0.798083
1        0        0        0        0        0        0        0        0        0        0  ...  0.583228  0.698729  0.343934  0.602037  0.694230  0.745422  0.954682  0.521298  0.382381  0.771640
2        1        0        0        1        0        1        1        0        0        0  ...  0.481291  0.593353  0.271028  0.498949  0.588807  0.641602  0.901779  0.424495  0.303309  0.669657
3        1        1        0        1        0        1        1        0        0        1  ...  0.583228  0.698729  0.343934  0.602037  0.694230  0.745422  0.954682  0.521298  0.382381  0.771640
4        0        0        0        1        1        1        1        1        1        1  ...  0.612544  0.727393  0.366578  0.631451  0.722980  0.772853  0.964982  0.549801  0.406692  0.798083

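Some column headers are tuples, such as (1, 2), while others are single elements, such as 2. I want to perform a calculation on each tuple column based on the columns named by the tuple's elements. For example, for the tuple (1, 2), I want to retrieve columns 1 and 2, multiply them together, and then subtract the result from column (1, 2).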
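For a single tuple column, the calculation described above might look like the following minimal sketch. The three-row frame and its values are invented for illustration; only the column labels (1, 2), 1 and 2 follow the question.

```python
import pandas as pd

# Hypothetical data: one tuple-labelled column and the two
# single-element columns named by that tuple.
rows = [[0, 0.5, 0.5],
        [1, 0.25, 0.5],
        [1, 0.5, 1.0]]
df = pd.DataFrame(rows, columns=[(1, 2), 1, 2])

# Multiply columns 1 and 2, then subtract the product from column (1, 2).
diff = df[(1, 2)] - df[1] * df[2]
print(diff.tolist())
```

The remaining question is how to repeat this for every tuple column without hand-writing each pair.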

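The solution I came up with is to create 55 new columns by first performing the multiplication on the single-element columns, and then to use .where() and all() for some kind of identity matching against the tuple columns. However, this seems computationally inefficient, because I would be generating a whole additional set of data instead of performing the calculation directly on the tuple columns. How should I go about this?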

I'm not sure whether this is any faster, but here is a solution that does not need where()/all():

import pandas as pd


# create some sample data
arr = [[1, 2, 3, 4, 5, 6, 7],
       [7, 6, 5, 4, 3, 2, 1]]
df = pd.DataFrame(arr, columns=[('a', 'b'), ('c','d'), ('a', 'd'), 'a', 'b', 'c', 'd'])

# get all tuple headers
tuple_columns = [col for col in df.columns if isinstance(col, tuple)]

# put the results into a list of series and concat into a DataFrame
results = pd.concat([df[col] - df[col[0]] * df[col[1]] for col in tuple_columns], axis=1)

# rename the columns
results.columns = tuple_columns
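
If building a separate results frame is undesirable, one possible variation (a sketch, not part of the answer above) is to overwrite each tuple column in place. It assumes, as in the sample data, that every element of a tuple header is also the label of a single-element column, and that the original tuple-column values are no longer needed afterwards.

```python
import pandas as pd

# Same sample data as in the answer above.
arr = [[1, 2, 3, 4, 5, 6, 7],
       [7, 6, 5, 4, 3, 2, 1]]
df = pd.DataFrame(arr, columns=[('a', 'b'), ('c', 'd'), ('a', 'd'), 'a', 'b', 'c', 'd'])

# Collect the tuple headers, then update each tuple column in place.
# The single-element columns are only read, never modified, so the
# order in which the tuple columns are processed does not matter.
tuple_columns = [col for col in df.columns if isinstance(col, tuple)]
for col in tuple_columns:
    df[col] = df[col] - df[col[0]] * df[col[1]]

print(df[tuple_columns])
```

On the sample data, column ('a', 'b') becomes 1 - 4 * 5 = -19 in the first row and 7 - 4 * 3 = -5 in the second. Whether this is faster than the concat version on a real frame would need to be measured.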