Python: Replacing values in a DataFrame with values from another DataFrame


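I have 3 dataframes: df1, df2, df3. I am trying to fill the NaN values of df1 with some of the values contained in df2. The values taken from df2 are chosen according to the output of a simple function (mul_val) that processes some data stored in df3.

I am able to get this result, but I would like a simpler, more readable way to write the code.

Here is what I have so far: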

import pandas as pd
import numpy as np

# simple function
def mul_val(a,b):
    return a*b

# dataframe 1
data = {'Name':['PINO','PALO','TNCO' ,'TNTO','CUCO' ,'FIGO','ONGF','LABO'],
        'Id'  :[  10  ,  9   ,np.nan ,  14   , 3    ,np.nan,  7   ,np.nan]}
df1 = pd.DataFrame(data)

# dataframe 2
infos = {'Info_a':[10,20,30,40,70,80,90,50,60,80,40,50,20,30,15,11],
         'Info_b':[10,30,30,60,10,85,99,50,70,20,30,50,20,40,16,17]}
df2 = pd.DataFrame(infos)

dic = {'Name': {0: 'FIGO', 1: 'TNCO'}, 
       'index': {0: [5, 6], 1: [11, 12, 13]}}
df3 = pd.DataFrame(dic)

#---------------Modify from here in the most efficient way!-----------------

for idx,row in df3.iterrows():
    store_val = []
    print(row['Name'])
    for j in row['index']:
        store_val.append([mul_val(df2['Info_a'][j],df2['Info_b'][j]),j])
    store_val = np.asarray(store_val)

    # - Identify which is the index of minimum value of the first column
    indx_min_val = np.argmin(store_val[:,0])

    # - Get the value relative number contained in the second column
    col_value = row['index'][indx_min_val]

    # Identify value to be replaced in df1
    value_to_be_replaced = df1['Id'][df1['Name']==row['Name']]

    # - Replace such value into the df1 having the same row['Name']
    df1['Id'].replace(to_replace=value_to_be_replaced,value=col_value, inplace=True)

Printing store_val at each iteration gives:

FIGO
[[6800    5]   
 [8910    6]]
TNCO
[[2500   11]
 [ 400   12]
 [1200   13]]

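Let's take a simple example: for FIGO, I identify 6800 as the minimum between 6800 and 8910, so I select the number 5, which is then placed in df1. Repeating this for the remaining rows of df3 (in this example there are only 2 rows, but there could be many more), the final result should look like this: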

In[0]: before           In[0]: after
Out[0]:                 Out[0]: 
     Id  Name                Id  Name
0  10.0  PINO           0  10.0  PINO
1   9.0  PALO           1   9.0  PALO
2   NaN  TNCO  ----->   2  12.0  TNCO
3  14.0  TNTO           3  14.0  TNTO
4   3.0  CUCO           4   3.0  CUCO
5   NaN  FIGO  ----->   5   5.0  FIGO
6   7.0  ONGF           6   7.0  ONGF
7   NaN  LABO           7   NaN  LABO
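The selection rule described for FIGO can be checked in isolation. This is only a small sketch using the df2 data and the mul_val function from the question; the names products, figo_candidates and best_label are made up for illustration. It relies on the fact that mul_val also works on whole columns, because `*` is applied element-wise to pandas Series.

```python
import pandas as pd

def mul_val(a, b):
    return a * b

df2 = pd.DataFrame({
    'Info_a': [10, 20, 30, 40, 70, 80, 90, 50, 60, 80, 40, 50, 20, 30, 15, 11],
    'Info_b': [10, 30, 30, 60, 10, 85, 99, 50, 70, 20, 30, 50, 20, 40, 16, 17],
})

# mul_val applied to two columns returns one product per row of df2
products = mul_val(df2['Info_a'], df2['Info_b'])

# Candidate row labels listed for FIGO in df3 (hypothetical variable name)
figo_candidates = [5, 6]

# idxmin() returns the row label (not the position) of the smallest product
best_label = products.loc[figo_candidates].idxmin()
print(products.loc[figo_candidates].tolist(), best_label)
```

For FIGO the two products are 6800 and 8910, so the selected label is 5, matching the value in the expected result.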

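Note: if needed, the for loops can be removed and other formats (lists, arrays, etc.) can be used to store the data; what matters is that the final result is still a dataframe.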

I can offer two similar options that achieve the same result as your loop in a few lines:

1. Using apply and fillna() (fillna is twice as fast as combine_first):
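A minimal sketch of what an apply plus fillna() approach could look like on the question's data. This is an assumption about the general shape of such a solution, not necessarily the exact code this option refers to; the best_idx column and the product and fill variables are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['PINO', 'PALO', 'TNCO', 'TNTO', 'CUCO', 'FIGO', 'ONGF', 'LABO'],
    'Id': [10, 9, np.nan, 14, 3, np.nan, 7, np.nan],
})
df2 = pd.DataFrame({
    'Info_a': [10, 20, 30, 40, 70, 80, 90, 50, 60, 80, 40, 50, 20, 30, 15, 11],
    'Info_b': [10, 30, 30, 60, 10, 85, 99, 50, 70, 20, 30, 50, 20, 40, 16, 17],
})
df3 = pd.DataFrame({'Name': ['FIGO', 'TNCO'],
                    'index': [[5, 6], [11, 12, 13]]})

# Product of the two info columns, one value per row of df2
product = df2['Info_a'] * df2['Info_b']

# For each row of df3, the df2 label with the smallest product
# (best_idx is a hypothetical helper column)
df3['best_idx'] = df3['index'].apply(lambda idx: product.loc[idx].idxmin())

# Align on Name and fill only the missing Ids; names without a match stay NaN
fill = df3.set_index('Name')['best_idx']
df1['Id'] = df1.set_index('Name')['Id'].fillna(fill).to_numpy()
print(df1)
```

Because fillna() only touches missing entries, existing Ids are left unchanged and LABO, which has no row in df3, remains NaN. The sketch assumes the names in df1 are unique.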

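2. Using a function (a lambda does not support assignment, so a named function has to be applied):

def f(row):
    # .loc picks the df1 row(s) with this Name; idxmin() returns the df2 label of the smallest product
    df1.loc[df1.Name == row['Name'], 'Id'] = (df2.Info_a * df2.Info_b).loc[row['index']].idxmin()
df3.apply(f, axis=1)

Or a slight variant that does not rely on global definitions:

def f(row, df1, df2):
    # Same logic, with the dataframes passed in explicitly through args
    df1.loc[df1.Name == row['Name'], 'Id'] = (df2.Info_a * df2.Info_b).loc[row['index']].idxmin()
df3.apply(f, args=(df1, df2), axis=1)

Note that your solution, although much more verbose, takes the least time on this small dataset (my two options took 7.5 ms and 9.5 ms, respectively). It makes sense that the speeds are similar, since in both cases the work comes down to going through the rows of df3.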
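Since the options above still visit df3 one row at a time, a variant without an explicit loop or apply may be worth trying on larger data. The following is a sketch under assumptions, not a benchmarked recommendation: it assumes a pandas version that provides DataFrame.explode, and the names candidates, product and best are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['PINO', 'PALO', 'TNCO', 'TNTO', 'CUCO', 'FIGO', 'ONGF', 'LABO'],
    'Id': [10, 9, np.nan, 14, 3, np.nan, 7, np.nan],
})
df2 = pd.DataFrame({
    'Info_a': [10, 20, 30, 40, 70, 80, 90, 50, 60, 80, 40, 50, 20, 30, 15, 11],
    'Info_b': [10, 30, 30, 60, 10, 85, 99, 50, 70, 20, 30, 50, 20, 40, 16, 17],
})
df3 = pd.DataFrame({'Name': ['FIGO', 'TNCO'],
                    'index': [[5, 6], [11, 12, 13]]})

# One row per (Name, candidate label) pair, with a fresh unique row index
candidates = df3.explode('index').reset_index(drop=True)
candidates['index'] = candidates['index'].astype(int)

# Look up the Info_a * Info_b product for every candidate label in df2
product = df2['Info_a'] * df2['Info_b']
candidates['product'] = product.loc[candidates['index']].to_numpy()

# For each Name, keep the candidate row with the smallest product
best = candidates.loc[candidates.groupby('Name')['product'].idxmin()]

# Map each Name to its selected label and fill only the missing Ids
df1['Id'] = df1['Id'].fillna(df1['Name'].map(best.set_index('Name')['index']))
print(df1)
```

Unlike the function-based option, this version only fills Ids that are NaN; an existing Id for a name listed in df3 would be kept rather than overwritten.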