Speeding up a vlookup-like operation using pandas in Python


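I have written some code that essentially performs an Excel-style vlookup across two pandas dataframes, and I would like to speed it up.

The dataframes are structured as follows:

dbase1_df.columns:
'VALUE', 'COUNT', 'GRID', 'SGO10GEO'

merged_df.columns:
'GRID', 'ST0', 'ST1', 'ST2', 'ST3', 'ST4', 'ST5', 'ST6', 'ST7', 'ST8', 'ST9', 'ST10'

sgo_df.columns:
'mkey', 'type'

To combine them, I need to do the following:
1. For each row in dbase1_df, find the row in sgo_df whose 'mkey' value matches its 'SGO10GEO' value. Get the 'type' from that row of sgo_df.

2. 'type' contains an integer from 0 to 10. Create a column name by combining 'ST' with the type (for example, type 3 gives 'ST3').

3. Look up the value in merged_df where its 'GRID' value matches the 'GRID' value in dbase1_df and the column name is the one obtained in step 2. Write this value to a csv file.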
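To make the three steps concrete, here is a minimal row-by-row sketch on made-up data. Only the column names come from the question; the toy values and the helper name `lookup_row` are assumptions for illustration. It shows what is being computed, not how to compute it quickly.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
dbase1_df = pd.DataFrame({
    "VALUE": [1, 2, 3],
    "COUNT": [10, 20, 30],
    "GRID": [100, 101, 100],
    "SGO10GEO": [7, 8, 9],
})
sgo_df = pd.DataFrame({"mkey": [7, 8, 9], "type": [0, 1, 1]})
merged_df = pd.DataFrame({
    "GRID": [100, 101],
    "ST0": [0.5, 0.6],
    "ST1": [1.5, 1.6],
})


def lookup_row(row):
    """Hypothetical helper that applies steps 1 to 3 to one dbase1_df row."""
    # Step 1: find the 'type' whose 'mkey' matches this row's 'SGO10GEO'.
    type_ = sgo_df.loc[sgo_df["mkey"] == row["SGO10GEO"], "type"].iloc[0]
    # Step 2: build the column name, e.g. type 1 gives 'ST1'.
    column = "ST" + str(type_)
    # Step 3: read that column from the merged_df row with the same 'GRID'.
    return merged_df.loc[merged_df["GRID"] == row["GRID"], column].iloc[0]


looked_up = [lookup_row(row) for _, row in dbase1_df.iterrows()]
```

Each call scans both `sgo_df` and `merged_df` with a boolean mask, so the total work grows with the product of the table sizes. That is one plausible reason a loop of this shape becomes slow at 500,000 rows.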

    # Read the dbase1 dbf into a dataframe
    dbase1_df = pandas.DataFrame.from_csv(dbase1_file, index_col=False)
    merged_df = pandas.DataFrame.from_csv('merged.csv', index_col=False)

    lup_out.writerow(["VALUE", "TYPE", EXTRACT_VAR.upper()])
    # For each unique value in the dbase1 dataframe:
    for index, row in dbase1_df.iterrows():
        ...

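Is there anything here that could be a speed bottleneck? Currently it takes about 20 minutes for ~500,000 rows in dbase1_df; merged_df has ~1,000 rows and sgo_df has ~500,000 rows.

Thanks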

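You need to use the merge operation in pandas to get better performance. I can't test the code below because I don't have the data, but it should at least give you the idea: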

    import pandas as pd

    # DataFrame.from_csv and .ix have been removed from pandas;
    # read_csv and .loc are used in their place
    dbase1_df = pd.read_csv('dbase1_file.csv', index_col=False)
    sgo_df = pd.read_csv('sgo_df.csv', index_col=False)
    merged_df = pd.read_csv('merged_df.csv', index_col=False)

    # pd.merge joins on columns that share a name, so rename SGO10GEO to mkey
    dbase1_df.columns = ['VALUE', 'COUNT', 'GRID', 'mkey']

    # Merge the two dataframes on their common column, mkey
    Step1_Merge = pd.merge(dbase1_df, sgo_df)

    # Add a new column that concatenates 'ST' and type
    Step1_Merge['type_2'] = Step1_Merge['type'].map(lambda x: 'ST' + str(x))

    # Reshape merged_df, moving columns to rows, so that another merge is possible
    grid = merged_df.loc[:, ['GRID']]
    a = pd.merge(merged_df.stack(0).reset_index(1), grid, left_index=True, right_index=True)

    # Rename the automatically generated column to type_2 for the next merge
    a.columns = ['type_2', 0, 'GRID']

    # Look up the value for each (type_2, GRID) pair
    result = pd.merge(Step1_Merge, a, on=['type_2', 'GRID'])
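The reshape step can also be written with `DataFrame.melt`, which turns the ST columns into rows directly. This is an alternative to the `stack` approach in the answer, not the answerer's code. It is a minimal sketch on made-up data: the toy values and the output column name `EXTRACT` are assumptions.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
dbase1_df = pd.DataFrame({
    "VALUE": [1, 2, 3],
    "COUNT": [10, 20, 30],
    "GRID": [100, 101, 100],
    "SGO10GEO": [7, 8, 9],
})
sgo_df = pd.DataFrame({"mkey": [7, 8, 9], "type": [0, 1, 1]})
merged_df = pd.DataFrame({
    "GRID": [100, 101],
    "ST0": [0.5, 0.6],
    "ST1": [1.5, 1.6],
})

# Step 1: attach 'type' to every dbase1_df row without renaming any columns.
step1 = dbase1_df.merge(sgo_df, left_on="SGO10GEO", right_on="mkey", how="left")

# Step 2: build the column name for every row at once, e.g. 1 becomes 'ST1'.
step1["type_2"] = "ST" + step1["type"].astype(str)

# Reshape merged_df from wide to long: one row per (GRID, ST column) pair.
# 'EXTRACT' is a hypothetical name for the looked-up value.
long_df = merged_df.melt(id_vars="GRID", var_name="type_2", value_name="EXTRACT")

# Step 3: the lookup is now an ordinary two-key join.
result = step1.merge(long_df, on=["GRID", "type_2"], how="left")
```

With `how="left"` every row of `dbase1_df` is kept in its original order, and rows without a match get NaN in `EXTRACT` rather than being dropped. `result[["VALUE", "type", "EXTRACT"]].to_csv(...)` would then write the output in one call.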
    

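Can you provide the data types for your columns? I have something ready, but I would like to test it before posting.

Thanks, all columns are integers except the columns in merged_df that start with 'ST': 'ST0', 'ST1', 'ST2', 'ST3', 'ST4', 'ST5', 'ST6', 'ST7', 'ST8', 'ST9', 'ST10'.

Thanks. In the last line, result = pd.merge(b, a, on=[u'type_2', u'GRID']), what is the first argument 'b'?

Sorry... I just fixed it... it is the result of the first merge.

@user308827 How did this one do? (starting from 20 minutes…)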
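Since the comments establish that the join keys are integers, a quick check before the first merge can catch the usual causes of silently wrong lookups: mismatched key dtypes, duplicate keys in the lookup table, and rows with no match. The following is a sketch on made-up data, using the `validate` and `indicator` parameters of `DataFrame.merge`; the toy values are assumptions.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
dbase1_df = pd.DataFrame({"GRID": [100, 101], "SGO10GEO": [7, 8]})
sgo_df = pd.DataFrame({"mkey": [7, 8], "type": [0, 1]})

# Join keys should share a dtype; an int column never matches a string column.
same_dtype = dbase1_df["SGO10GEO"].dtype == sgo_df["mkey"].dtype

# validate="many_to_one" raises MergeError if 'mkey' is not unique in sgo_df,
# which would otherwise duplicate dbase1_df rows in the output.
# indicator=True adds a '_merge' column recording where each row came from.
step1 = dbase1_df.merge(
    sgo_df,
    left_on="SGO10GEO",
    right_on="mkey",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Rows of dbase1_df whose SGO10GEO has no matching mkey.
unmatched = step1[step1["_merge"] == "left_only"]
```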