Python: if a column contains a string, extract another column's value

python pandas

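I have two DataFrames, DFa and DFb. DFa has 4 columns: Date, macro_A, macro_B, macro_C. DFb has 3 columns: Name, Region, Transformation.

What I'm trying to do is check whether each column name of DFa appears in DFb.Name; if it does, I look up the corresponding Transformation value in DFb. Depending on that value, I transform the DFa column accordingly.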

import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])

DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

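For example, I check whether the macro_A column of DFa exists in DFb.Name. I then look at DFb.Transformation, and if the value is STD, I need to transform (standardize) DFa.macro_A.

On the other hand, macro_C from DFa also exists in DFb.Name, but its DFb.Transformation is non, so I leave DFa.macro_C as it is.

I have written this code: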
for j, k in enumerate(DFa.columns):
    for i, x in enumerate(DFb['Name']):
        if x == k:
            if DFb.ix[i, 'Transformation'] == 'STD':
                DFa.iloc[:, j] = preprocessing.scale(DFa.iloc[: j])

How can I make this code more efficient?

The following corrected code works:
from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()
for j, k in enumerate(DFa.columns):
    for i, x in enumerate(DFb.Name):
        # i is a position, so use iloc (DFb's index labels start at 1)
        if x == k and DFb.iloc[i]['Transformation'] == 'STD':
            # fit_transform expects 2-D input, so select the column as a one-column frame
            DFa.iloc[:, j] = min_max_scaler.fit_transform(DFa.iloc[:, [j]]).ravel()

print(DFa)

Output:
   Date  macro_A  macro_B  macro_C
1  2010      1.0      1.0     0.23
2  2011      0.7      0.7     0.20
3  2012      0.0      0.0     0.13
4  2013      0.6      0.6     0.19

macro_A and macro_B have been scaled, but macro_C has not.
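As a quick check of those numbers, min-max scaling maps each value x to (x - min) / (max - min). The following is a minimal sketch in plain pandas, assuming the macro_A values from the question; the variable names are illustrative and not part of the original answer. Note that this is a different operation from preprocessing.scale in the question, which standardizes to zero mean and unit variance.

```python
import pandas as pd

# macro_A values copied from the question's DFa
macro_a = pd.Series([0.23, 0.20, 0.13, 0.19], index=[1, 2, 3, 4])

# Min-max scaling: (x - min) / (max - min), the default behaviour of MinMaxScaler
scaled = (macro_a - macro_a.min()) / (macro_a.max() - macro_a.min())

print(scaled.round(2).tolist())  # [1.0, 0.7, 0.0, 0.6]
```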
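I think you can avoid enumerate and iloc by using the column names. I would also suggest storing the operations in a string -> lambda map and applying the selected function to the whole column. This helps when you have several operation strings.

operations = {'STD': lambda col: min_max_scaler.fit_transform(col.to_frame()).ravel(),
              'non': lambda col: col}  # Operations map: transform string -> function on a whole column

for colName in DFa.columns:
    # Get the transform string by matching the column name against the Name column
    transformStr = DFb.Transformation[DFb.Name == colName]

    if transformStr.empty:  # Column not listed in DFb (e.g. Date): leave it unchanged
        continue
    if transformStr.shape[0] > 1:  # Make sure that only one operation is selected
        raise ValueError('Ambiguous transform for column %s: %s' % (colName, transformStr.tolist()))

    DFa[colName] = operations[transformStr.iloc[0]](DFa[colName])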
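Since the question asks about efficiency, the Python-level loops can also be dropped altogether. The sketch below is one possible approach, not something from the answers above: it selects every DFa column flagged STD in a single step and scales them together. It assumes min-max scaling (as used in the answers) and that each name appears at most once in DFb.Name; std_names, std_cols and block are illustrative names.

```python
import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])
DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

# Names flagged 'STD' in DFb (assumption: names are unique in DFb.Name)
std_names = DFb.loc[DFb['Transformation'] == 'STD', 'Name']

# Keep only those that are actually columns of DFa (macro_D is not, so it is ignored)
std_cols = DFa.columns[DFa.columns.isin(std_names)]

# Min-max scale all selected columns at once: (x - min) / (max - min) per column
block = DFa[std_cols]
DFa[std_cols] = (block - block.min()) / (block.max() - block.min())

print(DFa)
```

Columns that are missing from DFb.Name (Date) or marked non (macro_C) are never selected, so they pass through unchanged.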