Speeding up a vlookup-like operation with pandas in Python
I have written some code that essentially performs an Excel-style vlookup on two pandas dataframes, and I would like to speed it up. The dataframes are structured as follows:

dbase1_df.columns:
'VALUE', 'COUNT', 'GRID', 'SGO10GEO'

merged_df.columns:
'GRID', 'ST0', 'ST1', 'ST2', 'ST3', 'ST4', 'ST5', 'ST6', 'ST7', 'ST8', 'ST9', 'ST10'

sgo_df.columns:
'mkey', 'type'

To combine them, I need to do the following:
1. For each row in dbase1_df, find the row in sgo_df whose 'mkey' value matches that row's 'SGO10GEO' value, and take 'type' from that row of sgo_df.
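Step 1 on its own is a plain key-to-value lookup. The following is a minimal sketch on hypothetical toy data (the values are made up; only the column names come from the question) that contrasts a row-by-row iterrows() lookup with a single vectorized Series.map(), assuming 'mkey' is unique in sgo_df:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question
dbase1_df = pd.DataFrame({
    'VALUE': [10, 20, 30, 40],
    'COUNT': [1, 2, 3, 4],
    'GRID': [100, 100, 200, 200],
    'SGO10GEO': [7, 8, 7, 9],
})
sgo_df = pd.DataFrame({'mkey': [7, 8, 9], 'type': [0, 3, 10]})

# Slow: one Python-level lookup per row, and each lookup scans all of sgo_df
slow_types = []
for _, row in dbase1_df.iterrows():
    match = sgo_df.loc[sgo_df['mkey'] == row['SGO10GEO'], 'type']
    slow_types.append(match.iloc[0])

# Fast: build an mkey -> type lookup Series once, then map the whole column
type_lookup = sgo_df.set_index('mkey')['type']
dbase1_df['type'] = dbase1_df['SGO10GEO'].map(type_lookup)

print(dbase1_df['type'].tolist())  # [0, 3, 0, 10]
```

With about 500,000 rows on each side, the loop's work grows with the product of the two table sizes, while map() builds its index once. Keys without a match become NaN, and with duplicate 'mkey' values map() cannot pick a single 'type'.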
merged_df = pandas.read_csv('merged.csv')
lup_out.writerow(["VALUE", "TYPE", EXTRACT_VAR.upper()])
# For each unique value in the dbase1 dataframe:
for index, row in dbase1_df.iterrows():

Is there anything here that could be a speed bottleneck? Currently, with about 500,000 rows in dbase1_df, ~1,000 rows in merged_df and ~500,000 rows in sgo_df, it takes about 20 minutes.
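Thanks.

For better performance, use merge operations in pandas instead of iterating row by row with iterrows(). I can't test the following code because I don't have your data, but it should at least give you the idea: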
import pandas as pd

dbase1_df = pd.read_csv('dbase1_file.csv')
sgo_df = pd.read_csv('sgo_df.csv')
merged_df = pd.read_csv('merged_df.csv')

# By default merge joins on the columns both frames share, so rename SGO10GEO to mkey
# (alternatively, pass left_on='SGO10GEO', right_on='mkey' to pd.merge)
dbase1_df = dbase1_df.rename(columns={'SGO10GEO': 'mkey'})
# First lookup: attach sgo_df's 'type' to every row of dbase1_df via 'mkey'
Step1_Merge = pd.merge(dbase1_df, sgo_df)
# Add a column holding the ST column name to read, e.g. type 3 -> 'ST3'
Step1_Merge['type_2'] = 'ST' + Step1_Merge['type'].astype(str)

# Reshape merged_df from wide to long (columns become rows) so it can be merged again
grid = merged_df.loc[:, ['GRID']]
a = pd.merge(merged_df.stack().reset_index(1), grid, left_index=True, right_index=True)
# Rename the automatically generated column to type_2 so the next merge can use it
a.columns = ['type_2', 0, 'GRID']
# Second lookup: pick the ST value matching each row's (type_2, GRID) pair
result = pd.merge(Step1_Merge, a, on=['type_2', 'GRID'])
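Since the answer could not be tested without the data, here is a minimal end-to-end sketch on hypothetical toy frames (made-up values, and only ST0 to ST2). It swaps in two changes that should not alter the looked-up values: left_on/right_on instead of renaming the key column, and DataFrame.melt() instead of stack()/reset_index() for the wide-to-long reshape.

```python
import pandas as pd

# Hypothetical toy data using the question's column names (only ST0..ST2)
dbase1_df = pd.DataFrame({
    'VALUE': [10, 20, 30],
    'COUNT': [1, 2, 3],
    'GRID': [100, 200, 200],
    'SGO10GEO': [7, 8, 9],
})
sgo_df = pd.DataFrame({'mkey': [7, 8, 9], 'type': [0, 2, 1]})
merged_df = pd.DataFrame({
    'GRID': [100, 200],
    'ST0': [0.5, 1.5],
    'ST1': [0.6, 1.6],
    'ST2': [0.7, 1.7],
})

# Lookup 1: SGO10GEO -> type, joining on differently named key columns
step1 = dbase1_df.merge(sgo_df, left_on='SGO10GEO', right_on='mkey')
step1['type_2'] = 'ST' + step1['type'].astype(str)

# Wide -> long: one row per (GRID, ST column) pair
long_df = merged_df.melt(id_vars='GRID', var_name='type_2', value_name='st_value')

# Lookup 2: (GRID, type_2) -> ST value
result = step1.merge(long_df, on=['GRID', 'type_2'])
print(result[['VALUE', 'GRID', 'type_2', 'st_value']])
```

Both merges are inner joins, so rows of dbase1_df whose key has no match are silently dropped; passing how='left' keeps them, with NaN as the looked-up value.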
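Can you provide the data types of your columns? I have something ready, but I'd like to test it before posting it.

Thanks. All columns are integers except the ones in merged_df that start with 'ST': 'ST0', 'ST1', 'ST2', 'ST3', 'ST4', 'ST5', 'ST6', 'ST7', 'ST8', 'ST9', 'ST10'.

Thanks. In the last line, result = pd.merge(b, a, on=['type_2', 'GRID']), what is the first argument 'b'?

Sorry... I just fixed it. It's the result of the first merge.

@user308827 How did this work out? (Starting from 20 minutes...)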
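The asker confirms 'type' is an integer column, which is what makes 'ST' plus the text of 'type' produce names like 'ST3'. A short sketch of the failure mode if 'type' ever arrives as float (for example, after a left merge introduces NaN), assuming pandas' nullable 'Int64' dtype is available (pandas 0.24 or later):

```python
import pandas as pd

# A float 'type' column turns into text like '3.0', so the key never matches 'ST3'
types_float = pd.Series([3.0, 10.0])
print(('ST' + types_float.astype(str)).tolist())  # ['ST3.0', 'ST10.0']

# Converting to the nullable integer dtype first restores names like 'ST3'
types_int = types_float.astype('Int64')
print(('ST' + types_int.astype(str)).tolist())  # ['ST3', 'ST10']
```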