Time taken by the code is a problem with a Python DataFrame


python python-3.x pandas dataframe


I need help with the time taken by the following DataFrame code. It takes about 20 seconds to complete on a data set of roughly 2000 records.

import itertools

import numpy as np
import pandas as pd


def findRe(leaddatadf, keyAttributes, datadf):
    atrList = list(keyAttributes)  # assumption: atrList is the keyAttributes argument
    reldf = []  # one dict per related pair of rows
    for combs in itertools.combinations(atrList,
        len(atrList)-1):

        v_by =(set(atrList) - set(combs)) # varying attribute

        # group on the current combination, once per combination
        grpdatapf=datadf.groupby(list(combs))
        for name, group in grpdatapf:

            if(group.shape[0]>1):

                # rows of leaddatadf whose unique_id belongs to this group
                tmpgdf = leaddatadf[leaddatadf['unique_id'].astype(float).\
                    isin(group['unique_id'].astype(float))].copy()
                if(tmpgdf.shape[0]>1):

                    tmpgdf['mprice']=tmpgdf['mprice'].astype(float)
                    tmpgdf=tmpgdf.sort_values('mprice')

                    tmpgdf['id'] = tmpgdf['id']  # no-op, kept from the original
                    tmpgdf['desc'] = tmpgdf['description']
                    # pair each row with the next row in price order
                    tmpgdf['related_id'] = tmpgdf['id'].shift(-1)
                    tmpgdf['related_desc'] = tmpgdf['description'].shift(-1)
                    tmpgdf['related_mprice'] = tmpgdf['mprice'].shift(-1)

                    # price difference between a row and its related row
                    tmpgdf['pld'] = np.where(
                        (tmpgdf['related_mprice'].astype(float) > \
                            tmpgdf['mprice'].astype(float)),
                        (tmpgdf['related_mprice'].astype(float) - \
                            tmpgdf['mprice'].astype(float)) ,
                        (tmpgdf['mprice'].astype(float) - \
                            tmpgdf['related_mprice'].astype(float)))
                    tmpgdf['pltxt'] = np.where(
                        tmpgdf['related_mprice'].astype(float) - \
                            tmpgdf['mprice'].astype(float)>0.0,'<',
                        np.where(tmpgdf['related_mprice'].astype(float)\
                            - tmpgdf['mprice'].astype(float)<0,'>','='))
                    # relative price difference
                    tmpgdf['prc_rlt_dif_nbr_p'] = abs(
                        (tmpgdf['pld'].astype(float) / \
                            ((tmpgdf['mprice'].astype(float)))) )
                    tmpgdf['keyatr'] = str(atrList)
                    tmpgdf['varying'] = np.where(1==1,
                        "".join(v_by ),'') # varying

                    temp = tmpgdf[['id',
                        'desc', 'related_id',
                        'related_desc', 'pltxt', 'pld',
                        'prc_rlt_dif_nbr_p', 'mprice', 'related_mprice',
                        'keyatr', 'varying']]

                    # the last row of each group has no related row (NaN) and is dropped
                    temp = temp[temp['related_mprice'].astype(float)>=0.0]
                    reldf.extend(list(temp.T.to_dict().values()))
    return pd.DataFrame(
        reldf, columns = ['id',
            'desc', 'related_id',
            'related_desc', 'pltxt', 'pld',
            'prc_rlt_dif_nbr_p', 'mprice', 'related_mprice',
            'keyatr', 'varying'])

Please print how many milliseconds each line takes

using this

and come back with the line that takes the most time.
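Whatever the comment above pointed to is not preserved here. One possible way to collect per-step timings is sketched below with the standard library's time.perf_counter. The timed helper, the timings dict and the two stand-in workloads are hypothetical names made up for this illustration; in findRe the with-blocks would wrap the individual statements instead.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> accumulated seconds (hypothetical helper state)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


# Stand-in workloads; replace them with the statements being measured.
with timed("sum"):
    total = sum(range(100_000))
with timed("sort"):
    ordered = sorted(range(100_000), reverse=True)

# Report in milliseconds, slowest step first.
slowest = max(timings, key=timings.get)
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {seconds * 1000:.3f} ms")
print("slowest step:", slowest)
```

Because the helper accumulates per label, a statement that runs inside a loop reports its total time across all iterations, which is usually the number that matters for a 20 second run.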

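You use astype(float) very often. Each call creates a copy of the series. You can try setting dtype=float right at the start, when you load the DataFrame, so that each series is converted to float only once instead of on every iteration.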
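A minimal sketch of that idea, assuming the data comes from a CSV file (the question does not say how it is loaded). The sample text and the frame names df and raw are made up for illustration; the column names unique_id and mprice follow the question.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real data.
csv_text = "unique_id,mprice,description\n1,10.5,a\n2,7.25,b\n3,7.25,c\n"

# Option 1: declare the dtypes while loading, so no later conversion is needed.
df = pd.read_csv(io.StringIO(csv_text), dtype={"unique_id": float, "mprice": float})

# Option 2: the frame already exists with string columns; convert once, up front.
raw = pd.DataFrame({"unique_id": ["1", "2"], "mprice": ["10.5", "7.25"]})
raw = raw.astype({"unique_id": float, "mprice": float})

print(df.dtypes)
print(raw.dtypes)
```

With the columns already float, the repeated .astype(float) calls inside the loops can simply be dropped.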

Let me know if this helps.

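Time taken to get tmpdf: 0.00083160400390625
Time taken to reset index: 0.00066137313842777344
Time taken by mprice float: 0.0002810955047607422
Time taken by mprice sort: 0.00074672698974638
Time taken by id: 0.0015559196472167969
Time taken by desc: 0.0017049312552734
Time taken by related id: 0.0018208026885986328
Time taken by related desc: 0.0018434524536132812
Time taken by related price: 0.0015764236450195312
Time taken by pld: 0.0020411014556884766
Time taken by pltxt: 0.0022830963134765625
Time taken by prc_rlt_dif_nbr_p: 0.001756429672241211
Time taken by keyatr: 0.0015103816986083984
Time taken by varying: 0.0020063076171875
Time taken by allcomb: 0.0007736682891845703
Time taken to convert df to a list of dicts: 0.00047779083251953125

Hi Jm, I have added the time taken by each line. The key attribute list has 14 entries in total.

Thank you, sir, I will do that. Another thing I want to improve: I have 14-1 unique combinations, and for each combination I group the data and find the relationship between the current row and the next row.

I don't think Stack Overflow is a code review site. This question has no specific problem, so I think it should be moved elsewhere.

So what should I ask?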
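On the point raised in the comments about grouping on each combination and relating every row to the next one: the per-group Python loop might be avoidable by sorting once and shifting within groups. The following is only a sketch under simplifying assumptions, not the asker's method: it uses a single frame instead of the datadf/leaddatadf pair, and the sample data, the keys list and the color and size columns are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; "color" and "size" stand in for key attributes.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "color": ["red", "red", "red", "blue", "blue"],
    "size": ["S", "S", "S", "M", "M"],
    "mprice": [30.0, 10.0, 20.0, 5.0, 8.0],
})
keys = ["color", "size"]  # one combination of key attributes

# Sort once by price, then shift within each group so that every row
# is paired with the next more expensive row of the same group.
ordered = df.sort_values("mprice", kind="stable")
shifted = ordered.groupby(keys, sort=False)[["id", "mprice"]].shift(-1)
ordered["related_id"] = shifted["id"]
ordered["related_mprice"] = shifted["mprice"]

# The last row of each group has no successor (NaN); drop it.
pairs = ordered.dropna(subset=["related_mprice"]).copy()
pairs["pld"] = (pairs["related_mprice"] - pairs["mprice"]).abs()
print(pairs[["id", "related_id", "mprice", "related_mprice", "pld"]])
```

Groups with a single row drop out on their own, since their only row has no successor. The outer loop over the combinations would remain, but each pass then does a handful of vectorized operations instead of one small DataFrame manipulation per group.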