Python: insert data, then merge two DataFrames


I'm just getting started with Python and Pandas.

I have two CSVs, namely

CSV1

CSV2

Expected result

Date         Col1   Col2   New_Col3   New_Col4
2021-01-01    20     15       Fri        40
2021-01-02    22     12       Sat        55
2021-01-03    30     18       Sun        15
.
.
2021-12-31    125    160      Fri        67
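  • New_Col3 is the abbreviated weekday name of Date.
  • New_Col4 is the cell in CSV2 taken from the row whose Start_Date to End_Date range contains Date, and from the column for the corresponding weekday.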
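
import pandas as pd

# Convert the date columns to datetime
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Start_Date'] = pd.to_datetime(df2['Start_Date'])
df2['End_Date'] = pd.to_datetime(df2['End_Date'])
# Get the abbreviated weekday name
df1['New_Col3'] = df1['Date'].apply(lambda x: x.strftime('%a'))
New_Col4 = []
# Iterate over df1
for i in range(len(df1)):
    # If df1['Date'] is between df2['Start_Date'] and df2['End_Date'],
    # get the value according to the weekday name of df1['Date']
    for j in range(len(df2)):
        # The source is cut off after df2.loc[j, 'Start_Date']; the rest of this
        # loop is a reconstruction from the comments above, not the original code.
        if df2.loc[j, 'Start_Date'] <= df1.loc[i, 'Date'] <= df2.loc[j, 'End_Date']:
            New_Col4.append(df2.loc[j, df1.loc[i, 'Date'].strftime('%A')])
            break
    else:
        New_Col4.append(None)  # no range contains this date
df1['New_Col4'] = New_Col4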
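
The nested loop above checks every date against every date range. As a hedged alternative, the same lookup can be sketched as a reshape followed by a merge: melt the weekday columns of df2 into rows, join on the weekday name, and keep only the rows where Date falls inside the range. The frames below are rebuilt from the sample data in this thread; the names long, merged, in_range and result are illustrative, and dates that fall in no range are simply dropped by the filter.

```python
import pandas as pd

# Sample data rebuilt from this thread
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-12-31"]),
    "Col1": [20, 22, 30, 125],
    "Col2": [15, 12, 18, 160],
})
df2 = pd.DataFrame({
    "Start_Date": pd.to_datetime(["2021-01-01", "2021-02-26", "2021-09-01"]),
    "End_Date": pd.to_datetime(["2021-02-25", "2021-05-31", "2022-01-25"]),
    "Sunday": [15, 25, 44], "Monday": [25, 30, 25], "Tuesday": [35, 44, 65],
    "Wednesday": [45, 35, 54], "Thursday": [30, 50, 24],
    "Friday": [40, 45, 67], "Saturday": [55, 66, 38],
})

# Reshape df2 to one row per (date range, weekday)
long = df2.melt(id_vars=["Start_Date", "End_Date"],
                var_name="day_name", value_name="New_Col4")

# Full weekday name is the join key; its first three letters give New_Col3
df1["day_name"] = df1["Date"].dt.day_name()
df1["New_Col3"] = df1["day_name"].str[:3]

# Join on weekday, then keep only the range that contains each date
merged = df1.merge(long, on="day_name", how="left")
in_range = merged["Date"].between(merged["Start_Date"], merged["End_Date"])
result = (merged[in_range]
          .drop(columns=["day_name", "Start_Date", "End_Date"])
          .reset_index(drop=True))
print(result)
```

The intermediate frame has one row per date and date range, so this trades memory for the absence of Python-level loops; that is reasonable for a year of dates and a handful of ranges, but it is an assumption about the data size.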
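  • Construct a datetime index and an interval index so that pd.IntervalIndex.get_indexer(pd.DatetimeIndex) can be used for efficient row matching.
  • For New_Col4, apply a function to each row of df1 that retrieves the value from df2.
  • With this approach, the explicit double for-loop search is avoided in the row matching. A slow .apply() is still needed, though. There may be a clever way to combine the two steps, but I'll stop here for now.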
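
To make the first bullet concrete, here is a minimal sketch of how get_indexer maps dates to interval positions. The values in dates are made up for illustration; the ranges mirror the Start_Date and End_Date columns used in this thread. A date outside every interval gets position -1, which would select the last row if passed straight to .iloc.

```python
import pandas as pd

# Date ranges mirroring the Start_Date / End_Date layout in this thread
starts = pd.to_datetime(["2021-01-01", "2021-02-26", "2021-09-01"])
ends = pd.to_datetime(["2021-02-25", "2021-05-31", "2022-01-25"])
intervals = pd.IntervalIndex.from_arrays(starts, ends, closed="both")

# Illustrative dates; 2021-07-01 falls in the gap between the ranges
dates = pd.to_datetime(["2021-01-01", "2021-03-15", "2021-07-01", "2021-12-31"])

# Position of the interval containing each date, or -1 if none does
positions = intervals.get_indexer(dates)
print(positions.tolist())
```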

    Data

    The typo in the last End_Date has been corrected.

    import pandas as pd
    import io
    
    df1 = pd.read_csv(io.StringIO("""
    Date         Col1   Col2
    2021-01-01    20     15
    2021-01-02    22     12
    2021-01-03    30     18
    2021-12-31    125    160
    """), sep=r"\s+", engine='python')
    
    df2 = pd.read_csv(io.StringIO("""
    Start_Date   End_Date      Sunday  Monday  Tuesday Wednesday Thursday Friday Saturday
    2021-01-01   2021-02-25      15      25      35       45       30       40     55
    2021-02-26   2021-05-31      25      30      44       35       50       45     66
    2021-09-01   2022-01-25       44      25      65       54       24       67     38
    """), sep=r"\s+", engine='python')
    
    df1["Date"] = pd.to_datetime(df1["Date"])
    df2["Start_Date"] = pd.to_datetime(df2["Start_Date"])
    df2["End_Date"] = pd.to_datetime(df2["End_Date"])
    
    Solution
    
    # 1. Get weekday name
    df1["day_name"] = df1["Date"].dt.day_name()
    df1["New_Col3"] = df1["day_name"].str[:3]
    
    # 2-1. find corresponding row in df2
    df1.set_index("Date", inplace=True)
    idx = pd.IntervalIndex.from_arrays(df2["Start_Date"], df2["End_Date"], closed="both")
    df1["df2_row"] = idx.get_indexer(df1.index)
    
    # 2-2. pick out the value from df2
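    def f(row):
        """Get the df2 value at (matched row, weekday column); NaN if no range matched."""
        if row["df2_row"] == -1:  # get_indexer returns -1 when no interval contains the date
            return float("nan")
        return df2[row["day_name"]].iloc[row["df2_row"]]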
    
    df1["New_Col4"] = df1.apply(f, axis=1)
    
    print(df1.drop(columns=["day_name", "df2_row"]))
    
    Result
                Col1  Col2 New_Col3  New_Col4
    Date                                     
    2021-01-01    20    15      Fri        40
    2021-01-02    22    12      Sat        55
    2021-01-03    30    18      Sun        15
    2021-12-31   125   160      Fri        67
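
On the open question of combining the two steps: one possible way to drop the .apply() call, sketched here under the assumption that the date ranges in df2 do not overlap, is to compute a row position with get_indexer and a column position from the weekday name, then read all cells at once with NumPy integer indexing. The names weekday_table, row_pos, col_pos and picked are illustrative and not part of the original answer.

```python
import io

import pandas as pd

df1 = pd.read_csv(io.StringIO("""
Date         Col1   Col2
2021-01-01    20     15
2021-01-02    22     12
2021-01-03    30     18
2021-12-31    125    160
"""), sep=r"\s+")

df2 = pd.read_csv(io.StringIO("""
Start_Date   End_Date      Sunday  Monday  Tuesday Wednesday Thursday Friday Saturday
2021-01-01   2021-02-25      15      25      35       45       30       40     55
2021-02-26   2021-05-31      25      30      44       35       50       45     66
2021-09-01   2022-01-25      44      25      65       54       24       67     38
"""), sep=r"\s+")

df1["Date"] = pd.to_datetime(df1["Date"])
df2["Start_Date"] = pd.to_datetime(df2["Start_Date"])
df2["End_Date"] = pd.to_datetime(df2["End_Date"])

# Weekday name: the full name selects the df2 column, the first 3 letters give New_Col3
day_names = df1["Date"].dt.day_name()
df1["New_Col3"] = day_names.str[:3]

# Row position in df2 of the range containing each date (-1 if no range matches)
intervals = pd.IntervalIndex.from_arrays(df2["Start_Date"], df2["End_Date"], closed="both")
row_pos = intervals.get_indexer(pd.DatetimeIndex(df1["Date"]))

# Column position of each date's weekday within the weekday-only part of df2
weekday_table = df2.drop(columns=["Start_Date", "End_Date"])
col_pos = weekday_table.columns.get_indexer(day_names)

# Pick one cell per df1 row in a single vectorized step
picked = weekday_table.to_numpy()[row_pos, col_pos]

# Blank out rows where no range matched (row_pos == -1 would otherwise read the last row)
matched = row_pos != -1
df1["New_Col4"] = pd.Series(picked, index=df1.index).where(matched)

print(df1)
```

For the four sample dates every row finds a range, so the values agree with the result shown above; a date falling in the gap between 2021-05-31 and 2021-09-01 would get NaN instead.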