Python: the proper way to set values in pandas based on a reference table


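I have two dataframes, a reference table and a main table. I want to map the values from the reference table onto the main table, overwriting existing values where necessary. In visual form:

This seems like a very common use case. The question below looked like a close fit, but its approach feels a bit "hacky". I'm just wondering whether there is a proper way to do this.

Thanks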

We usually use np.where:

import numpy as np
s = reference_table.set_index('Fruit').Price.reindex(main_data.Fruit).values
main_data['Price'] = np.where(np.isnan(s), main_data['Price'], s)
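To make the two lines above easier to try out, here is a minimal self-contained sketch. The contents of `main_data` and `reference_table` are hypothetical (the question's original tables are not reproduced here); only the column names `Fruit`, `col1`, `col2` and `Price` come from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the values are illustrative, not taken from the question.
main_data = pd.DataFrame({
    "Fruit": ["Durian", "Pineapple", "Apple", "Orange", "Pear"],
    "col1": [1, 2, 3, 4, 5],
    "col2": [5, 5, 5, 5, 5],
    "Price": [40.0, 100.0, 50.0, 65.0, 60.0],
})
reference_table = pd.DataFrame({
    "Fruit": ["Pineapple", "Orange"],
    "Price": [120.0, 70.0],
})

# Reference price for every row of main_data, in main_data's row order.
# Fruits missing from the reference table come back as NaN.
s = reference_table.set_index("Fruit")["Price"].reindex(main_data["Fruit"]).to_numpy()

# Keep the existing price where no reference price exists, otherwise overwrite it.
main_data["Price"] = np.where(np.isnan(s), main_data["Price"], s)
print(main_data)
```

With this sample data, only Pineapple and Orange are overwritten; the other rows keep their original prices.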

You can also merge and assign, then drop the columns that are no longer needed:

main_data = main_data.merge(reference_table, on='Fruit', how='left').assign(Price=lambda x: x['Price_y'].fillna(x['Price_x'])).drop(['Price_x', 'Price_y'], axis=1)

Result

       Fruit  col1  col2  Price
0     Durian     1     5   40.0
1  Pineapple     2     5  120.0
2      Apple     3     5   50.0
3     Orange     4     5   70.0
4       Pear     5     5   60.0
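The merge approach can be easier to follow when split into steps. The sketch below uses hypothetical sample data (the values are assumptions chosen for illustration) and relies on pandas' default `_x` / `_y` suffixes for the overlapping `Price` column.

```python
import pandas as pd

# Hypothetical data: the values are illustrative, not taken from the question.
main_data = pd.DataFrame({
    "Fruit": ["Durian", "Pineapple", "Apple", "Orange", "Pear"],
    "col1": [1, 2, 3, 4, 5],
    "col2": [5, 5, 5, 5, 5],
    "Price": [40.0, 100.0, 50.0, 65.0, 60.0],
})
reference_table = pd.DataFrame({
    "Fruit": ["Pineapple", "Orange"],
    "Price": [120.0, 70.0],
})

# A left merge keeps every row of main_data. Because both inputs have a
# Price column, pandas names them Price_x (main) and Price_y (reference).
merged = main_data.merge(reference_table, on="Fruit", how="left")

result = (
    merged
    # Prefer the reference price; fall back to the main price where it is NaN.
    .assign(Price=lambda df: df["Price_y"].fillna(df["Price_x"]))
    # Drop the two suffixed helper columns.
    .drop(columns=["Price_x", "Price_y"])
)
print(result)
```

Passing explicit `suffixes=` to `merge` would make the intermediate column names more descriptive, at the cost of a slightly longer call.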

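The second line overwrites the Price column with the fruit names; should it end with main_data['Price'], s) instead? Also, when I use this on my actual dataset, I get ValueError: cannot reindex from a duplicate axis. I think the first line somehow creates a duplicate index?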
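One likely cause of that error (an assumption, since the real dataset is not shown) is that the reference table lists the same fruit more than once, so `set_index('Fruit')` produces a non-unique index that `reindex` cannot work with. Repeated fruits in `main_data` itself should not trigger it. A minimal sketch with hypothetical data, which checks for duplicates and removes them before reindexing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Orange" appears twice in the reference table.
main_data = pd.DataFrame({
    "Fruit": ["Apple", "Orange", "Pear"],
    "Price": [50.0, 65.0, 60.0],
})
reference_table = pd.DataFrame({
    "Fruit": ["Orange", "Orange"],
    "Price": [68.0, 70.0],
})

# True means set_index("Fruit") would give a non-unique index.
has_duplicates = reference_table["Fruit"].duplicated().any()
print(has_duplicates)

# Assumption: the last row for each fruit is the one to keep.
unique_ref = reference_table.drop_duplicates(subset="Fruit", keep="last")

# With a unique index, the np.where approach works as before.
s = unique_ref.set_index("Fruit")["Price"].reindex(main_data["Fruit"]).to_numpy()
main_data["Price"] = np.where(np.isnan(s), main_data["Price"], s)
print(main_data)
```

Which duplicate to keep (first, last, or an aggregate such as the mean) depends on the data; `keep="last"` here is only a placeholder choice.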
