Python pandas: applying a function row-wise takes too long. Is there an alternative to the code below?


python pandas


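I have a DataFrame and a large function, shown below. I want to apply the norm_group function to the DataFrame's columns, but using apply takes too much time. Is there a way to reduce the running time of this code? Currently it takes 24.4 seconds per loop.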

import pandas as pd
import numpy as np

np.random.seed(1234)
n = 1500000

df = pd.DataFrame()
df['group'] = np.random.randint(1700, size=n)
df['ID'] = np.random.randint(5, size=n)
df['s_count'] = np.random.randint(5, size=n)
df['p_count'] = np.random.randint(5, size=n)
df['d_count'] = np.random.randint(5, size=n)
df['Total'] = np.random.randint(400, size=n)
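# group_keys=False keeps the original row index so the result aligns with df (required on pandas >= 2.0)
df['Normalized_total'] = df.groupby('group', group_keys=False)['Total'].apply(lambda x: (x - x.min()) / (x.max() - x.min()))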
df['Normalized_total'] = df['Normalized_total'].apply(lambda x:round(x,2))

def norm_group(a, b, c, d, e):
    if a >= 0.7 and b >= 1000 and c > 2:
        return "Both High "
    elif a >= 0.7 and b >= 1000 and c < 2:
        return "High and C Low"
    elif a >= 0.4 and b >= 500 and d > 2:
        return "Medium and D High"
    elif a >= 0.4 and b >= 500 and d < 2:
        return "Medium and D Low"
    elif a >= 0.4 and b >= 500 and e > 2:
        return "Medium and E High"
    elif a >= 0.4 and b >= 500 and e < 2:
        return "Medium and E Low"
    else:
        return "Low"

# Assumed mapping: c, d, e come from s_count, p_count, d_count (passing only a and b raises a TypeError)
%timeit df['Category'] = df.apply(lambda x: norm_group(a=x['Normalized_total'], b=x['group'], c=x['s_count'], d=x['p_count'], e=x['d_count']), axis=1)
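The setup step that builds Normalized_total is itself a Python-level groupby apply followed by a per-element round. A minimal vectorized sketch of just that step, not taken from the original post, uses groupby(...).transform with the built-in "min"/"max" aggregations (which return values aligned to the original rows) and Series.round; it rebuilds only the two columns it needs:

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
n = 1_500_000
df = pd.DataFrame({
    "group": np.random.randint(1700, size=n),
    "Total": np.random.randint(400, size=n),
})

# transform("min") / transform("max") broadcast each group's statistic back
# onto every row with the same index as df, so the arithmetic below is
# column-wise instead of one Python call per group.
grp = df.groupby("group")["Total"]
g_min = grp.transform("min")
g_max = grp.transform("max")

# Min-max normalization per group, rounded to 2 decimals in one vectorized call.
df["Normalized_total"] = ((df["Total"] - g_min) / (g_max - g_min)).round(2)
```

Series.round goes through NumPy's rounding, which may differ from Python's built-in round in rare half-way edge cases.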

24.4 s ± 551 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My original DataFrame has several text columns to which I want to apply similar functions, and those take even longer than this one.

Thanks

You can vectorize this with np.select:

df['Category'] = np.select((df['Normalized_total'].ge(0.7) & df['group'].ge(1000),
                            df['Normalized_total'].ge(0.4) & df['group'].ge(500)),
                           ('High', 'Medium'), default='Low'
                          )
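The answer above covers the two-level High/Medium case. A minimal sketch of the same np.select approach extended to all seven branches of the edited norm_group follows. It assumes a hypothetical mapping of c, d, e to s_count, p_count, d_count, and the helper name norm_group_vectorized is likewise hypothetical. Because np.select returns the choice of the first condition that is True, keeping the list in if/elif order reproduces the original priority, including the fall-through to the Medium checks when c == 2:

```python
import numpy as np
import pandas as pd

def norm_group_vectorized(a, b, c, d, e):
    """Column-wise version of norm_group; each argument is a Series or array."""
    high = (a >= 0.7) & (b >= 1000)
    medium = (a >= 0.4) & (b >= 500)
    # Same order as the if/elif chain: np.select picks the first True condition.
    conditions = [
        high & (c > 2),
        high & (c < 2),
        medium & (d > 2),
        medium & (d < 2),
        medium & (e > 2),
        medium & (e < 2),
    ]
    # Labels copied verbatim from norm_group (including the trailing space).
    choices = [
        "Both High ",
        "High and C Low",
        "Medium and D High",
        "Medium and D Low",
        "Medium and E High",
        "Medium and E Low",
    ]
    return np.select(conditions, choices, default="Low")

# Small hand-made frame that hits each branch once.
df = pd.DataFrame({
    "Normalized_total": [0.9, 0.9, 0.5, 0.5, 0.5, 0.1],
    "group": [1200, 1200, 600, 600, 600, 1500],
    "s_count": [3, 0, 0, 0, 0, 4],
    "p_count": [0, 0, 4, 1, 2, 0],
    "d_count": [0, 0, 0, 0, 3, 0],
})

# Assumed mapping: c=s_count, d=p_count, e=d_count (hypothetical).
df["Category"] = norm_group_vectorized(
    df["Normalized_total"], df["group"],
    df["s_count"], df["p_count"], df["d_count"],
)
print(df["Category"].tolist())
```

Rows whose Normalized_total is NaN (a group with a constant Total gives 0/0) fail every comparison and end up as "Low", just as in the row-wise function.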

Performance:

255 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

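There are many ways to solve this kind of problem; this answer should cover your question.

Thanks for your answer. I have edited my question. Your answer works when there are only 2 or 3 conditions, but with many if-else statements it becomes hard to write them in select. Is there a way to handle multiple conditions?

@KumarAK Then check the answer linked in my comment; it handles n conditions. Just stack them into np.select, with the default last, e.g. np.select([cond1, cond2, cond3], [val1, val2, val3], default=default_val).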
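To illustrate the comment above: with many branches it can be easier to keep the rules as an ordered list of (condition, label) pairs and unpack them into np.select. A small sketch; apply_rules is a hypothetical helper, not a pandas or NumPy function, and the thresholds are made up for the example:

```python
import numpy as np
import pandas as pd

def apply_rules(rules, default):
    """rules: ordered list of (boolean mask, label); the first matching rule wins."""
    conditions = [mask for mask, _ in rules]
    choices = [label for _, label in rules]
    return np.select(conditions, choices, default=default)

s = pd.Series([5, 15, 25, 35])

# 35 also satisfies s > 20 and s > 10, but the earlier rule takes priority,
# exactly like an if/elif chain.
labels = apply_rules(
    [
        (s > 30, "very high"),
        (s > 20, "high"),
        (s > 10, "medium"),
    ],
    default="low",
)
print(list(labels))
```

Adding a branch then means adding one tuple to the list rather than editing two parallel lists.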