Python: vectorizing multiplication and dict mapping on a DataFrame without iterating?


I have a pandas DataFrame, df:

import pandas as pd
import numpy as np
import math

df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
and a dict, mask:

mask = {1:32,2:64,3:100,4:200}
I want the final result to be a DataFrame like this:

A    B    C
1    1    32
2    2    64
2    3    96
4    4    400
nan  nan  nan
Right now I am doing it like this, which seems inefficient:

for idx, row in df.iterrows():
    if not math.isnan(row['A']):
        if row['A'] != 1:
            df.loc[idx, 'C'] = row['B'] * mask[row['A'] - 1]
        else:
            df.loc[idx, 'C'] = row['B'] * mask[row['A']]

Is there a simple way to vectorize this?

Here is an option that uses apply together with the dictionary's get method, which returns None if the key is not in the dictionary:

df['C'] = df.apply(lambda r: mask.get(r.A) if r.A == 1 else mask.get(r.A - 1), axis = 1) * df.B

df    
#   A   B   C
#0  1   1   32
#1  2   2   64
#2  2   3   96
#3  4   4   400
#4  NaN 5   NaN
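To see why the NaN row does not raise a KeyError, it may help to run the lookup rule from the lambda on its own. The sketch below is only an illustration: lookup_factor is a hypothetical helper name that does not appear in the answer, and it simply repeats the same rule outside of apply.

```python
import math

mask = {1: 32, 2: 64, 3: 100, 4: 200}

def lookup_factor(a):
    """Hypothetical helper: same rule as the lambda above.

    Use key a when a == 1, otherwise key a - 1.
    """
    key = a if a == 1 else a - 1
    # dict.get returns None instead of raising KeyError when the key is absent
    return mask.get(key)

factor_for_one = lookup_factor(1.0)       # the float 1.0 hashes and compares equal to the int key 1
factor_for_four = lookup_factor(4.0)      # looks up key 3.0, which matches the int key 3
factor_for_nan = lookup_factor(math.nan)  # NaN - 1 is still NaN, which matches no key
```

Because column A holds floats (it contains NaN), the lookups rely on float keys such as 3.0 matching the integer keys of mask, which Python dicts allow.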
This should work:

df['C'] = df.B * (df.A - (df.A != 1)).map(mask)
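The one-liner packs several steps together, so here is a sketch that splits it into intermediate Series. The names shift, keys and factors are introduced only for illustration and are not part of the answer; the data is the df and mask from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 4, np.nan], 'B': [1, 2, 3, 4, 5]})
mask = {1: 32, 2: 64, 3: 100, 4: 200}

# Step 1: boolean Series that is True wherever A is not 1
# (NaN != 1 also evaluates to True).
shift = df.A != 1

# Step 2: subtracting a boolean Series subtracts 1 where True and 0 where False,
# so A == 1 stays 1 and every other value drops by one; NaN stays NaN.
keys = df.A - shift

# Step 3: look up each key in the dict; keys with no match (including NaN) become NaN.
factors = keys.map(mask)

# Step 4: elementwise product; NaN propagates into C.
df['C'] = df.B * factors
```

This replaces the if/else inside the loop with arithmetic on whole columns, so no Python-level iteration over rows is needed.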


Timing

10,000 rows

# Initialize each run with
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
df = pd.concat([df for _ in range(2000)])

100,000 rows

# Initialize each run with
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
df = pd.concat([df for _ in range(20000)])
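The measured timings are not included here, so the following is a rough sketch of how the two answers could be compared with the standard timeit module. The helper names make_df, with_apply and with_map are assumptions made for this example, and ignore_index=True is an addition that gives the concatenated frame a unique index; results will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

mask = {1: 32, 2: 64, 3: 100, 4: 200}

def make_df(repeats):
    """Build the test frame: the 5-row example repeated `repeats` times."""
    base = pd.DataFrame({'A': [1, 2, 2, 4, np.nan], 'B': [1, 2, 3, 4, 5]})
    return pd.concat([base for _ in range(repeats)], ignore_index=True)

def with_apply(df):
    # Row-wise approach from the first answer
    return df.apply(lambda r: mask.get(r.A) if r.A == 1 else mask.get(r.A - 1), axis=1) * df.B

def with_map(df):
    # Vectorized approach from the second answer
    return df.B * (df.A - (df.A != 1)).map(mask)

df = make_df(2000)  # 5 rows x 2000 repeats = 10,000 rows
for name, func in [("apply", with_apply), ("map", with_map)]:
    # Best of three single runs, in seconds
    seconds = min(timeit.repeat(lambda: func(df), number=1, repeat=3))
    print(f"{name}: {seconds:.4f} s per run")
```

Passing 20000 to make_df gives the 100,000-row case.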