Python: merge two DataFrames by comparing values rather than column names

python pandas dataframe merge

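DataFrame 1: fruit prices by date (the index is the date)

DataFrame 2: the top-ranked fruits (the index is the date)

I want DataFrame 3: the prices of the top fruits on a given date. In effect, it tells me what each top-ranked fruit cost on that date.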

    date         Price_1   Price_2   Price_3   Price_4   ...   Price_100
    2020-01-01   9         5         10        4.4
    2020-01-02   5         12        5.4       4
    ...
    2020-12-10   14        20        6.4       10

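I spent nearly a whole evening trying to iterate over DataFrame 2 with an inner loop over DataFrame 1, appending values to DataFrame 3. I tried six or seven different approaches, including iterrows, iteritems, and writing the output directly into df3 via iloc. None of them worked.

I was just wondering whether there is an easier way.

Afterwards, I will multiply the result by the fruit sales, which are stored in the same DataFrame format.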

Build a dict from df1, then use replace on df2:

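import pandas as pd

# Price of every fruit (columns) on each date (index)
fruits_price = {'Apple': [9, 5, 14],
                'Orange': [10, 12, 10],
                'Kiwi': [5, 4, 20],
                'Watermelon': [4.4, 5.4, 6.4]}
df1 = pd.DataFrame(fruits_price,
                   columns=['Apple', 'Orange', 'Kiwi', 'Watermelon'],
                   index=['2020-01-01', '2020-01-02', '2020-01-10'])

# Name of the fruit at each rank (columns) on each date (index)
top_fruits = {'Fruit_1': ['Apple', 'Apple', 'Apple'],
              'Fruit_2': ['Kiwi', 'Orange', 'Kiwi'],
              'Fruit_3': ['Orange', 'Watermelon', 'Watermelon'],
              'Fruit_4': ['Watermelon', 'Kiwi', 'Orange']}

df2 = pd.DataFrame(top_fruits,
                   columns=['Fruit_1', 'Fruit_2', 'Fruit_3', 'Fruit_4'],
                   index=['2020-01-01', '2020-01-02', '2020-01-10'])

# df1.T.to_dict() maps each date to a {fruit: price} dict. Transposing df2
# turns the dates into columns, so replace() applies each date's mapping to
# the matching column; the final .T restores the original orientation.
result = df2.T.replace(df1.T.to_dict()).T
result.columns = [f"Price_{i}" for i in range(1, len(result.columns) + 1)]
result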

Output:

            Price_1 Price_2 Price_3 Price_4
2020-01-01  9.0     5.0     10.0    4.4
2020-01-02  5.0     12.0    5.4     4.0
2020-01-10  14.0    20.0    6.4     10.0
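
The question mentions a follow-up step: multiplying these prices by sales figures stored in the same layout. The following is a minimal sketch of one way that could look, not part of the original answer. The `sales` frame, its `Sales_i` column names, its numbers, and the `Revenue_i` labels are all hypothetical, and the sketch assumes both frames rank the fruits in the same column order.

```python
import pandas as pd

dates = ['2020-01-01', '2020-01-02', '2020-01-10']

# Prices of the top fruits, copied from the output above.
prices = pd.DataFrame({'Price_1': [9.0, 5.0, 14.0],
                       'Price_2': [5.0, 12.0, 20.0],
                       'Price_3': [10.0, 5.4, 6.4],
                       'Price_4': [4.4, 4.0, 10.0]},
                      index=dates)

# Hypothetical sales volumes in the same layout (made-up numbers).
sales = pd.DataFrame({'Sales_1': [10, 20, 30],
                      'Sales_2': [1, 2, 3],
                      'Sales_3': [2, 2, 2],
                      'Sales_4': [5, 5, 5]},
                     index=dates)

# Line the rows up by date first; a date missing from sales becomes NaN.
sales = sales.reindex(prices.index)

# The column labels differ (Price_i vs Sales_i), so prices * sales would
# align on labels and produce only NaN. Multiply the raw arrays instead.
revenue = pd.DataFrame(
    prices.to_numpy() * sales.to_numpy(),
    index=prices.index,
    columns=[f"Revenue_{i}" for i in range(1, prices.shape[1] + 1)])
print(revenue)
```

If the two frames already shared column labels, a plain `prices * sales` would be enough, since pandas aligns on both index and columns.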

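Simply use apply with axis=1. This works row by row: each row is a Series whose name is the date, and its values are replaced using the corresponding row of df1:

df2.apply(lambda x: x.replace(df1.to_dict('index')[x.name]), axis=1)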
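
To make the row-by-row mechanism easier to follow, here is a self-contained sketch of the same idea with the intermediate dict pulled out. The names `price_by_date` and `prices_for_row` are mine, introduced only for illustration; the data is the sample data from the first answer.

```python
import pandas as pd

dates = ['2020-01-01', '2020-01-02', '2020-01-10']

df1 = pd.DataFrame({'Apple': [9, 5, 14],
                    'Orange': [10, 12, 10],
                    'Kiwi': [5, 4, 20],
                    'Watermelon': [4.4, 5.4, 6.4]},
                   index=dates)

df2 = pd.DataFrame({'Fruit_1': ['Apple', 'Apple', 'Apple'],
                    'Fruit_2': ['Kiwi', 'Orange', 'Kiwi'],
                    'Fruit_3': ['Orange', 'Watermelon', 'Watermelon'],
                    'Fruit_4': ['Watermelon', 'Kiwi', 'Orange']},
                   index=dates)

# One {fruit: price} dict per date: {date: {fruit: price, ...}, ...}
price_by_date = df1.to_dict('index')
print(price_by_date['2020-01-01'])

def prices_for_row(row):
    # With axis=1, each row arrives as a Series whose .name is its index
    # label (the date), so it selects that date's price mapping.
    return row.replace(price_by_date[row.name])

result = df2.apply(prices_for_row, axis=1)
print(result)
```

Note that the result keeps the `Fruit_i` column names; they can be renamed to `Price_i` afterwards, as in the first answer.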

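(W3 article on lookup().) Use the other DataFrame instead of self.

What do you mean by "use the other DataFrame"?

@诺亚尼斯 Close! If df2 contains a date that df1 does not, this may raise a KeyError. That is easily handled with df2[df2.index.isin(df1.index)], though.

Nice, excellent approach.

@dulq It worked. I will try to unpack your solution and learn from it.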
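
The KeyError case raised in the comments can be sketched as follows. This is an illustration under assumptions rather than code from the thread: the date '2020-01-03' and the trimmed-down frames are made up to trigger the situation, and `price_by_date` and `common` are names of my own.

```python
import pandas as pd

df1 = pd.DataFrame({'Apple': [9, 5], 'Kiwi': [5, 4]},
                   index=['2020-01-01', '2020-01-02'])

# '2020-01-03' is a hypothetical date that has no prices in df1.
df2 = pd.DataFrame({'Fruit_1': ['Apple', 'Kiwi', 'Apple'],
                    'Fruit_2': ['Kiwi', 'Apple', 'Kiwi']},
                   index=['2020-01-01', '2020-01-02', '2020-01-03'])

price_by_date = df1.to_dict('index')

# Keep only the rows of df2 whose date also exists in df1, so the dict
# lookup inside apply never sees an unknown date.
common = df2[df2.index.isin(df1.index)]

result = common.apply(lambda row: row.replace(price_by_date[row.name]), axis=1)
print(result)
```

Filtering drops the unmatched date entirely. If such rows should stay in the output, a different strategy is needed; that choice is outside what the thread discusses.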
