Speeding up a vlookup-like operation using pandas in Python

python performance pandas

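I have written some code that essentially performs an Excel-style vlookup on two pandas DataFrames, and I would like to speed it up.

The DataFrames are structured as follows:

dbase1_df.columns:
'VALUE', 'COUNT', 'GRID', 'SGO10GEO'

merged_df.columns:
'GRID', 'ST0', 'ST1', 'ST2', 'ST3', 'ST4', 'ST5', 'ST6', 'ST7', 'ST8', 'ST9', 'ST10'

sgo_df.columns:
'mkey', 'type'

To combine them, I need to do the following:

1. For each row in dbase1_df, find the row in sgo_df whose 'mkey' value matches its 'SGO10GEO' value. Get the 'type' from that row in sgo_df.

2. 'type' contains an integer from 0 to 10. Create a column name by prefixing the type with 'ST' (for example 'ST3').

3. Look up the value in merged_df in the row whose 'GRID' value matches the 'GRID' value in dbase1_df, in the column whose name we obtained in step 2. Write this value to a csv file.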
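To make the three steps concrete, here is a minimal row-by-row sketch on made-up data. The values, the `ST_VALUE` output column, and the helper names `type_by_mkey` and `st_by_grid` are assumptions for illustration only, and only three of the eleven ST columns are included.

```python
import pandas as pd

# Toy stand-ins for the three frames; every value here is invented.
dbase1_df = pd.DataFrame({
    "VALUE": [1, 2, 3],
    "COUNT": [10, 20, 30],
    "GRID": [100, 101, 100],
    "SGO10GEO": [7, 8, 9],
})
sgo_df = pd.DataFrame({"mkey": [7, 8, 9], "type": [0, 2, 1]})
merged_df = pd.DataFrame({
    "GRID": [100, 101],
    "ST0": [0.5, 1.5],
    "ST1": [0.6, 1.6],
    "ST2": [0.7, 1.7],
})

# Index the lookup tables by their keys so each lookup is a label access.
type_by_mkey = sgo_df.set_index("mkey")["type"]
st_by_grid = merged_df.set_index("GRID")

rows = []
for _, row in dbase1_df.iterrows():
    st_type = type_by_mkey.loc[row["SGO10GEO"]]   # step 1: mkey -> type
    col = "ST" + str(st_type)                     # step 2: build column name
    value = st_by_grid.loc[row["GRID"], col]      # step 3: GRID + column -> value
    rows.append((row["VALUE"], st_type, value))

lookup = pd.DataFrame(rows, columns=["VALUE", "TYPE", "ST_VALUE"])
print(lookup)
```

This does one Python-level lookup per row, which is the pattern that tends to become slow on hundreds of thousands of rows.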

# Read the dbase1 dbf into a data frame

dbase1_df = pandas.DataFrame.from_csv(dbase1_file, index_col=False)
merged_df = pandas.DataFrame.from_csv('merged.csv', index_col=False)

lup_out.writerow(["VALUE", "TYPE", EXTRACT_VAR.upper()])
# For each unique value in the dbase1 data frame:
for index, row in dbase1_df.iterrows():

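Is there anything here that could be a speed bottleneck? At the moment it takes about 20 minutes for roughly 500,000 rows in dbase1_df; merged_df has about 1,000 rows and sgo_df has about 500,000 rows.

Thanks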

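You need to use the merge operation in pandas to get better performance. I could not test the following code because I don't have the data, but it should at least help you get the idea: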

import pandas as pd

# DataFrame.from_csv has been removed from pandas; read_csv is its replacement.
dbase1_df = pd.read_csv('dbase1_file.csv', index_col=False)
sgo_df = pd.read_csv('sgo_df.csv', index_col=False)
merged_df = pd.read_csv('merged_df.csv', index_col=False)

# The key column must have the same name in both frames to merge on it,
# so the last column (SGO10GEO) is renamed to mkey.
dbase1_df.columns = ['VALUE', 'COUNT', 'GRID', 'mkey']

# Merge the two DataFrames on mkey; this attaches 'type' to every dbase1 row.
Step1_Merge = pd.merge(dbase1_df, sgo_df, on='mkey')

# Add a new column that concatenates 'ST' and the type, e.g. 'ST3'.
Step1_Merge['type_2'] = Step1_Merge['type'].map(lambda x: 'ST' + str(x))

# Reshape merged_df by moving its columns into rows so that a second merge is possible.
# (.ix has been removed from pandas, and 'id' shadowed a builtin, hence 'grid'.)
grid = merged_df[['GRID']]
a = pd.merge(merged_df.stack(0).reset_index(1), grid, left_index=True, right_index=True)

# Rename the automatically generated column to type_2 so it can serve as a merge key.
a.columns = ['type_2', 0, 'GRID']

# Look up the ST value for each row by matching on both the column name and GRID.
result = pd.merge(Step1_Merge, a, on=['type_2', 'GRID'])
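As a variation on the reshape step above, `DataFrame.melt` can turn the ST columns into rows directly. This is a minimal sketch on made-up data, not the answerer's code: the values, the `ST_VALUE` column name, and the use of `how="left"` (to keep unmatched rows, as a vlookup would) are assumptions.

```python
import pandas as pd

# Toy stand-ins for the three frames; every value here is invented.
dbase1_df = pd.DataFrame({
    "VALUE": [1, 2, 3],
    "COUNT": [10, 20, 30],
    "GRID": [100, 101, 100],
    "SGO10GEO": [7, 8, 9],
})
sgo_df = pd.DataFrame({"mkey": [7, 8, 9], "type": [0, 2, 1]})
merged_df = pd.DataFrame({
    "GRID": [100, 101],
    "ST0": [0.5, 1.5],
    "ST1": [0.6, 1.6],
    "ST2": [0.7, 1.7],
})

# First merge: attach 'type' to each dbase1 row via SGO10GEO == mkey.
step1 = (
    dbase1_df.rename(columns={"SGO10GEO": "mkey"})
    .merge(sgo_df, on="mkey", how="left")
)

# Build the ST column name; this assumes every row found a match, because a
# missing 'type' would become NaN and produce the name 'STnan'.
step1["type_2"] = "ST" + step1["type"].astype(str)

# Wide to long: one row per (GRID, ST column) pair.
long_df = merged_df.melt(id_vars="GRID", var_name="type_2", value_name="ST_VALUE")

# Second merge: match on both GRID and the ST column name.
result = step1.merge(long_df, on=["GRID", "type_2"], how="left")
print(result[["VALUE", "type", "ST_VALUE"]])
```

Unlike stacking the whole frame, `melt` with `id_vars="GRID"` keeps GRID as an ordinary column, so no index merge or column renaming is needed afterwards.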

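Can you provide the data types of your columns? I have something ready, but I would like to test it before posting.

Thanks, all columns are integers except the columns in merged_df that start with 'ST': 'ST0', 'ST1', 'ST2', 'ST3', 'ST4', 'ST5', 'ST6', 'ST7', 'ST8', 'ST9', 'ST10'.

Thanks. In the last line, result = pd.merge(b, a, on=[u'type_2', u'GRID']), what is the first argument 'b'?

Sorry... I just fixed it... the first argument is the result of the first merge.

@user308827 How does this perform now? (compared with the original 20 minutes...)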
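The data types matter because a merge needs the key columns to have compatible dtypes on both sides. The sketch below assumes, purely for illustration, that one key column was read as text; the data are invented. It also uses the `validate` argument of `merge` to check that each key appears at most once in the lookup table.

```python
import pandas as pd

# Invented data: the key in dbase1_df was (hypothetically) read as text.
dbase1_df = pd.DataFrame({"VALUE": [1, 2], "GRID": [100, 101], "mkey": ["7", "8"]})
sgo_df = pd.DataFrame({"mkey": [7, 8], "type": [0, 2]})

# Inspect the dtypes before merging.
print(dbase1_df.dtypes)
print(sgo_df.dtypes)

# Align the key dtype with the lookup table before merging.
dbase1_df["mkey"] = dbase1_df["mkey"].astype("int64")

# validate="many_to_one" raises MergeError if mkey is duplicated in sgo_df,
# which would otherwise silently multiply rows.
step1 = dbase1_df.merge(sgo_df, on="mkey", how="left", validate="many_to_one")
print(step1)
```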
