Python: insert data, then merge 2 DataFrames

I'm just getting started with Python and Pandas.

I have two CSVs:

CSV1

CSV2

Expected result

Date         Col1   Col2   New_Col3   New_Col4
2021-01-01    20     15       Fri        40
2021-01-02    22     12       Sat        55
2021-01-03    30     18       Sun        15
.
.
2021-12-31    125    160      Fri        67

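`New_Col3` is the abbreviated weekday name of `Date`.

`New_Col4` is the cell in CSV2 from the row whose `Start_Date` to `End_Date` range contains `Date`, taken from the column of the corresponding weekday.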

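import pandas as pd

# convert the date columns to datetime
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Start_Date'] = pd.to_datetime(df2['Start_Date'])
df2['End_Date'] = pd.to_datetime(df2['End_Date'])
# get the abbreviated weekday name
df1['New_Col3'] = df1['Date'].apply(lambda x: x.strftime('%a'))
New_Col4 = []
# iterate over df1
for i in range(len(df1)):
    # if df1['Date'] is between df2['Start_Date'] and df2['End_Date'],
    # get the value based on the weekday name of df1['Date']
    for j in range(len(df2)):
        if df2.loc[j, 'Start_Date'] …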
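For reference, the rule the nested loop is aiming at can be written for a single date with a boolean mask over `df2`, so no inner loop is needed. This is a minimal sketch, not the asker's code: `lookup_new_col4` is a hypothetical helper, the small `df2` is typed in from the data used in the answer, and it assumes the date ranges do not overlap (it takes the first matching row).

```python
import pandas as pd


def lookup_new_col4(date, df2):
    """Hypothetical helper: df2's weekday value for `date`, or None if no range covers it."""
    date = pd.Timestamp(date)
    # rows whose inclusive [Start_Date, End_Date] range contains the date
    mask = (df2["Start_Date"] <= date) & (date <= df2["End_Date"])
    if not mask.any():
        return None
    # day_name() returns e.g. "Friday", which matches df2's column names
    return df2.loc[mask, date.day_name()].iloc[0]


df2 = pd.DataFrame({
    "Start_Date": pd.to_datetime(["2021-01-01", "2021-02-26", "2021-09-01"]),
    "End_Date": pd.to_datetime(["2021-02-25", "2021-05-31", "2022-01-25"]),
    "Sunday": [15, 25, 44], "Monday": [25, 30, 25], "Tuesday": [35, 44, 65],
    "Wednesday": [45, 35, 54], "Thursday": [30, 50, 24], "Friday": [40, 45, 67],
    "Saturday": [55, 66, 38],
})

for d in ["2021-01-01", "2021-01-03", "2021-12-31", "2021-07-01"]:
    print(d, lookup_new_col4(d, df2))  # 40, 15, 67, None
```

Calling this once per row of `df1` is still a Python-level loop, which is why the answer below switches to an interval index.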
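1. Construct datetime and interval indexes to enable `pd.IntervalIndex.get_indexer(pd.DatetimeIndex)` for efficient row matching.
2. `.apply()` a value-retrieval function from `df2` to each row of `df1` to fill `New_Col4`.

With this approach, the explicit double `for` loop search for row matching is avoided. However, the slow `.apply()` is still needed. There may be a clever way to combine the two steps, but I'll stop here for now.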
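To show what step 1 produces, here is a small sketch of `IntervalIndex.get_indexer` on the same date ranges as `df2` in the data below; the test dates are illustrative. Each date maps to the position of the interval that contains it, or `-1` when none does. Two caveats: `get_indexer` only works when the intervals do not overlap (pandas offers `get_indexer_non_unique` otherwise), and passing a `-1` straight to `.iloc` would silently select the last row of `df2`.

```python
import pandas as pd

# Same Start_Date / End_Date ranges as df2 in the answer's data
starts = pd.to_datetime(["2021-01-01", "2021-02-26", "2021-09-01"])
ends = pd.to_datetime(["2021-02-25", "2021-05-31", "2022-01-25"])

# closed="both" makes Start_Date and End_Date both inclusive
intervals = pd.IntervalIndex.from_arrays(starts, ends, closed="both")

dates = pd.DatetimeIndex(["2021-01-01", "2021-02-25", "2021-02-26", "2021-07-01", "2021-12-31"])
positions = intervals.get_indexer(dates)
print(positions)  # row position in df2 for each date, -1 where no range matches
```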
Data

(The typo in the last `End_Date` was corrected.)
import pandas as pd
import io

df1 = pd.read_csv(io.StringIO("""
Date         Col1   Col2
2021-01-01    20     15
2021-01-02    22     12
2021-01-03    30     18
2021-12-31    125    160
"""), sep=r"\s+", engine='python')

df2 = pd.read_csv(io.StringIO("""
Start_Date   End_Date      Sunday  Monday  Tuesday Wednesday Thursday Friday Saturday
2021-01-01   2021-02-25      15      25      35       45       30       40     55
2021-02-26   2021-05-31      25      30      44       35       50       45     66
2021-09-01   2022-01-25       44      25      65       54       24       67     38
"""), sep=r"\s+", engine='python')

df1["Date"] = pd.to_datetime(df1["Date"])
df2["Start_Date"] = pd.to_datetime(df2["Start_Date"])
df2["End_Date"] = pd.to_datetime(df2["End_Date"])

Solution

# 1. Get weekday name
df1["day_name"] = df1["Date"].dt.day_name()
df1["New_Col3"] = df1["day_name"].str[:3]

# 2-1. find corresponding row in df2
df1.set_index("Date", inplace=True)
idx = pd.IntervalIndex.from_arrays(df2["Start_Date"], df2["End_Date"], closed="both")
df1["df2_row"] = idx.get_indexer(df1.index)

# 2-2. pick out the value from df2
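def f(row):
    """Get (#row, day_name) in df2; NaN if the date is in no Start_Date/End_Date range"""
    if row["df2_row"] == -1:  # get_indexer returns -1 for unmatched dates; .iloc[-1] would pick the last row
        return float("nan")
    return df2[row["day_name"]].iloc[row["df2_row"]]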

df1["New_Col4"] = df1.apply(f, axis=1)

print(df1.drop(columns=["day_name", "df2_row"]))

Result
            Col1  Col2 New_Col3  New_Col4
Date                                     
2021-01-01    20    15      Fri        40
2021-01-02    22    12      Sat        55
2021-01-03    30    18      Sun        15
2021-12-31   125   160      Fri        67
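One possible way to fold the two steps together and drop `.apply()` is to turn `df2`'s weekday columns into a NumPy array and pick every value with a single fancy-indexing lookup: row positions from `get_indexer`, column positions from `DatetimeIndex.dayofweek`. This is a sketch under stated assumptions, not the answer's method: `new_col4_vectorized` and `DAY_COLS` are hypothetical names, the weekday columns are reordered Monday first to match `dayofweek` (Monday=0), and dates outside every range become `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical ordering: position 0..6 matches DatetimeIndex.dayofweek (Monday=0 ... Sunday=6)
DAY_COLS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def new_col4_vectorized(dates, df2):
    """Hypothetical helper: df2's weekday value for each date, NaN where no range covers it."""
    dates = pd.DatetimeIndex(dates)
    intervals = pd.IntervalIndex.from_arrays(df2["Start_Date"], df2["End_Date"], closed="both")
    rows = intervals.get_indexer(dates)           # df2 row position, -1 if unmatched
    cols = np.asarray(dates.dayofweek)            # 0 = Monday ... 6 = Sunday
    values = df2[DAY_COLS].to_numpy(dtype=float)  # float so NaN can be stored
    out = np.full(len(dates), np.nan)
    hit = rows != -1
    out[hit] = values[rows[hit], cols[hit]]       # one vectorized lookup instead of .apply()
    return out


df2 = pd.DataFrame({
    "Start_Date": pd.to_datetime(["2021-01-01", "2021-02-26", "2021-09-01"]),
    "End_Date": pd.to_datetime(["2021-02-25", "2021-05-31", "2022-01-25"]),
    "Sunday": [15, 25, 44], "Monday": [25, 30, 25], "Tuesday": [35, 44, 65],
    "Wednesday": [45, 35, 54], "Thursday": [30, 50, 24], "Friday": [40, 45, 67],
    "Saturday": [55, 66, 38],
})

dates = ["2021-01-01", "2021-01-02", "2021-01-03", "2021-07-01", "2021-12-31"]
result = new_col4_vectorized(dates, df2)
print(result)  # 40, 55, 15, NaN, 67
```

With `df1` indexed by `Date` as in the solution, it could be attached with `df1["New_Col4"] = new_col4_vectorized(df1.index, df2)`; the column comes back as float because of the `NaN` placeholder.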