Python: map a column from one DataFrame to another to create a new column


python pandas dataframe mapping


I have a DataFrame:

id  store    address
1    100        xyz
2    200        qwe
3    300        asd
4    400        zxc
5    500        bnm

I have another DataFrame, df2:

serialNo    store_code  warehouse
    1          300         Land
    2          500         Sea
    3          100         Land
    4          200         Sea
    5          400         Land

I want my final DataFrame to look like this:

id  store    address  warehouse
1    100        xyz     Land
2    200        qwe     Sea
3    300        asd     Land
4    400        zxc     Land
5    500        bnm     Sea

i.e. map from one DataFrame to the other to create a new column.
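For anyone who wants to run the answers, here is a minimal sketch that rebuilds the two frames from the tables above. The name df1 for the first frame is an assumption taken from the answers; the question itself does not name it.

```python
import pandas as pd

# First frame from the question (called df1 in the answers)
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'store': [100, 200, 300, 400, 500],
    'address': ['xyz', 'qwe', 'asd', 'zxc', 'bnm'],
})

# Second frame; store_code holds the same keys as df1.store, in a different order
df2 = pd.DataFrame({
    'serialNo': [1, 2, 3, 4, 5],
    'store_code': [300, 500, 100, 200, 400],
    'warehouse': ['Land', 'Sea', 'Land', 'Sea', 'Land'],
})

print(df1)
print(df2)
```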

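Use one of the following:

df.merge

pd.concat + df.sort_values

Assuming your DataFrames are already sorted on store, the first sort call is redundant, in which case you can drop it.

df.replace

df.map

Alternatively, create the mapping explicitly. This is useful if you want to reuse it later.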

mapping = dict(df2[['store_code', 'warehouse']].values)
df1['warehouse'] = df1.store.map(mapping)
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

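When I ran the .map code on a similar dataset, I got this error: Reindexing only valid with uniquely valued Index objects

I think the problem is duplicate values in store_code. In that case you need:

df1['store'].map(df2.drop_duplicates('store_code').set_index('store_code')['warehouse'])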
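The duplicate-key situation described in this comment can be sketched as follows. The data is made up for illustration (store_code 300 appears twice), and keeping the first row per code is only one possible policy; which row should win depends on your data.

```python
import pandas as pd

df1 = pd.DataFrame({'store': [100, 200, 300]})

# Hypothetical lookup frame in which store_code 300 appears twice
df2 = pd.DataFrame({
    'store_code': [100, 200, 300, 300],
    'warehouse': ['Land', 'Sea', 'Land', 'Sea'],
})

# Inspect the offending rows before deciding how to resolve them
dupes = df2[df2['store_code'].duplicated(keep=False)]
print(dupes)

# Keep the first row per store_code so the lookup index is unique
lookup = df2.drop_duplicates('store_code').set_index('store_code')['warehouse']
df1['warehouse'] = df1['store'].map(lookup)
print(df1)
```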

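Correct! Thanks :)

How does map handle large amounts of data, e.g. DataFrames with 5 to 10 million rows? I wonder whether a dict works efficiently at that size.

It depends on the data, but pandas generally does not cope well at that scale. Think more along the lines of distributed processing (e.g. dask).

Which one is fastest?

@Pablo It depends on your data; the best approach is to test it with %timeit.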
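Outside IPython, where %timeit is not available, the standard-library timeit module can run the same kind of comparison. The sketch below uses synthetic data; the sizes, and the choice to compare only map against a left merge, are assumptions for illustration. Timings will vary with your data and machine.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_codes, n_rows = 1_000, 100_000  # assumed sizes, adjust to match your data

df2 = pd.DataFrame({
    'store_code': np.arange(n_codes),
    'warehouse': rng.choice(['Land', 'Sea'], size=n_codes),
})
df1 = pd.DataFrame({'store': rng.integers(0, n_codes, size=n_rows)})

# Build the lookup Series once so only the mapping step is timed
lookup = df2.set_index('store_code')['warehouse']

def with_map():
    return df1['store'].map(lookup)

def with_merge():
    merged = df1.merge(df2, left_on='store', right_on='store_code', how='left')
    return merged['warehouse']

for name, fn in [('map', with_map), ('merge', with_merge)]:
    # Best of 3 repeats, 5 calls each, reported per call
    seconds = min(timeit.repeat(fn, number=5, repeat=3))
    print(f'{name}: {seconds / 5:.4f} s per call')
```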

# Look up df1.store in df2's store_code index, then drop the unneeded column
df1 = df1.join(df2.set_index('store_code'), on='store').drop(columns='serialNo')
print(df1)
   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

out = (df1.merge(df2, left_on='store', right_on='store_code')
          .reindex(columns=['id', 'store', 'address', 'warehouse']))
print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea
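The merge above uses the default inner join, so a store with no match in df2 would drop out of the result. Here is a hedged sketch of a variant that keeps every row of df1. The store value 999 is invented to show the unmatched case, and validate='many_to_one' is an optional check that raises if store_code is not unique.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'store': [100, 200, 999],  # 999 is a made-up store with no match in df2
    'address': ['xyz', 'qwe', 'asd'],
})
df2 = pd.DataFrame({
    'store_code': [200, 100],
    'warehouse': ['Sea', 'Land'],
})

out = (
    df1.merge(
        df2,
        left_on='store',
        right_on='store_code',
        how='left',              # keep all rows of df1, matched or not
        validate='many_to_one',  # fail loudly if store_code has duplicates
    )
    .drop(columns='store_code')
)
print(out)
```

Unmatched rows end up with NaN in warehouse, which makes them easy to find with out['warehouse'].isna().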

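# Sort both frames by the key and realign them on a fresh 0..n-1 index;
# this only works when store and store_code hold the same unique values
u = df1.sort_values('store').reset_index(drop=True)
v = df2.sort_values('store_code')[['warehouse']].reset_index(drop=True)
out = pd.concat([u, v], axis=1)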

print(out)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea

s = df1.store.replace(df2.set_index('store_code')['warehouse'])
print(s) 
0    Land
1     Sea
2    Land
3    Land
4     Sea

df1['warehouse'] = s
print(df1)

   id  store address warehouse
0   1    100     xyz      Land
1   2    200     qwe       Sea
2   3    300     asd      Land
3   4    400     zxc      Land
4   5    500     bnm       Sea
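One behavioural difference between replace and map is worth keeping in mind when some keys have no match. The sketch below uses a plain dict and an invented store value (999) to show it: replace leaves unmatched values as they are, while map turns them into NaN.

```python
import pandas as pd

stores = pd.Series([100, 200, 999], name='store')  # 999 has no entry in the mapping
mapping = {100: 'Land', 200: 'Sea'}

# replace: unmatched values pass through unchanged
replaced = stores.replace(mapping)
print(replaced.tolist())

# map: unmatched values become NaN
mapped = stores.map(mapping)
print(mapped.tolist())
```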
