Python: insert data, then merge 2 dataframes
I am just getting started with Python and Pandas. I have two CSVs, CSV1 and CSV2. Expected result:
Date Col1 Col2 New_Col3 New_Col4
2021-01-01 20 15 Fri 40
2021-01-02 22 12 Sat 55
2021-01-03 30 18 Sun 15
.
.
2021-12-31 125 160 Fri 67
New_Col3 is the weekday of Date.
New_Col4 is the cell in CSV2 taken from the row where Date falls between Start_Date and End_Date, and from the column for the corresponding weekday.
# Convert the date columns to datetime
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Start_Date'] = pd.to_datetime(df2['Start_Date'])
df2['End_Date'] = pd.to_datetime(df2['End_Date'])
# Get the abbreviated weekday name
df1['New_Col3'] = df1['Date'].apply(lambda x: x.strftime('%a'))
New_Col4 = []
# Iterate over df1
for i in range(len(df1)):
    # If df1['Date'] is between df2['Start_Date'] and df2['End_Date'],
    # get the value according to the weekday name of df1['Date']
    for j in range(len(df2)):
        if df2.loc[j, 'Start_Date']
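The loop above breaks off at the `if` condition. A minimal sketch of one way to finish it, using the sample df1 and df2 from the Data section of the answer that follows. The range test, the lookup by full day name via `.dt.day_name()`, and the `None` fallback for dates that match no range are assumptions, not the original author's code:

```python
import io
import pandas as pd

# Sample data copied from the answer's Data section.
df1 = pd.read_csv(io.StringIO("""
Date Col1 Col2
2021-01-01 20 15
2021-01-02 22 12
2021-01-03 30 18
2021-12-31 125 160
"""), sep=r"\s+", engine="python")
df2 = pd.read_csv(io.StringIO("""
Start_Date End_Date Sunday Monday Tuesday Wednesday Thursday Friday Saturday
2021-01-01 2021-02-25 15 25 35 45 30 40 55
2021-02-26 2021-05-31 25 30 44 35 50 45 66
2021-09-01 2022-01-25 44 25 65 54 24 67 38
"""), sep=r"\s+", engine="python")

df1["Date"] = pd.to_datetime(df1["Date"])
df2["Start_Date"] = pd.to_datetime(df2["Start_Date"])
df2["End_Date"] = pd.to_datetime(df2["End_Date"])

# Abbreviated weekday name for display (locale-dependent).
df1["New_Col3"] = df1["Date"].dt.strftime("%a")
# Full English day names, which match the weekday column names in df2.
full_names = df1["Date"].dt.day_name()

new_col4 = []
for i in range(len(df1)):
    value = None  # assumed fallback when no date range matches
    for j in range(len(df2)):
        # Assumed condition: Date lies within [Start_Date, End_Date].
        if df2.loc[j, "Start_Date"] <= df1.loc[i, "Date"] <= df2.loc[j, "End_Date"]:
            value = df2.loc[j, full_names[i]]
            break
    new_col4.append(value)

df1["New_Col4"] = new_col4
print(df1)
```

This is O(len(df1) * len(df2)), which is fine for small inputs but is the double loop the second answer sets out to avoid.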
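Key points
Construct datetime and interval indexes to enable pd.IntervalIndex.get_indexer(pd.DatetimeIndex) for efficient row matching.
Apply a value-retrieval function to each row of df1 to pull the New_Col4 value from df2.
With this approach, the explicit double for-loop search can be avoided in the row matching. However, a slow .apply() is still required. There may be a neat way to combine the two steps, but I will stop here for now.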
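To illustrate the row-matching step on its own, here is a small sketch of how `pd.IntervalIndex.get_indexer` maps dates to interval positions. The two intervals and three dates are made-up illustration values, not the data from the question. A date that falls in no interval maps to -1, which is worth checking before the result is used as a row position:

```python
import pandas as pd

# Hypothetical, non-overlapping date ranges, closed on both ends.
starts = pd.to_datetime(["2021-01-01", "2021-03-01"])
ends = pd.to_datetime(["2021-01-31", "2021-03-31"])
intervals = pd.IntervalIndex.from_arrays(starts, ends, closed="both")

dates = pd.DatetimeIndex(["2021-01-15", "2021-02-10", "2021-03-31"])

# Position of the interval containing each date; -1 means no match.
positions = intervals.get_indexer(dates)
print(positions.tolist())  # [0, -1, 1]
```

Because -1 is also a valid position for `.iloc` (it selects the last row), an unmatched date would silently pick up values from the last range unless it is filtered out first.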
Data
The typo in the last End_Date has been corrected.
import pandas as pd
import io
df1 = pd.read_csv(io.StringIO("""
Date Col1 Col2
2021-01-01 20 15
2021-01-02 22 12
2021-01-03 30 18
2021-12-31 125 160
"""), sep=r"\s+", engine='python')
df2 = pd.read_csv(io.StringIO("""
Start_Date End_Date Sunday Monday Tuesday Wednesday Thursday Friday Saturday
2021-01-01 2021-02-25 15 25 35 45 30 40 55
2021-02-26 2021-05-31 25 30 44 35 50 45 66
2021-09-01 2022-01-25 44 25 65 54 24 67 38
"""), sep=r"\s+", engine='python')
df1["Date"] = pd.to_datetime(df1["Date"])
df2["Start_Date"] = pd.to_datetime(df2["Start_Date"])
df2["End_Date"] = pd.to_datetime(df2["End_Date"])
Solution
# 1. Get weekday name
df1["day_name"] = df1["Date"].dt.day_name()
df1["New_Col3"] = df1["day_name"].str[:3]
# 2-1. find corresponding row in df2
df1.set_index("Date", inplace=True)
idx = pd.IntervalIndex.from_arrays(df2["Start_Date"], df2["End_Date"], closed="both")
df1["df2_row"] = idx.get_indexer(df1.index)
# 2-2. pick out the value from df2
def f(row):
    """Get (#row, day_name) in df2"""
    return df2[row["day_name"]].iloc[row["df2_row"]]
df1["New_Col4"] = df1.apply(f, axis=1)
print(df1.drop(columns=["day_name", "df2_row"]))
Result
Col1 Col2 New_Col3 New_Col4
Date
2021-01-01 20 15 Fri 40
2021-01-02 22 12 Sat 55
2021-01-03 30 18 Sun 15
2021-12-31 125 160 Fri 67
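The answer leaves open whether the two steps can be combined without `.apply()`. One possible sketch, not the author's code, uses NumPy integer-array indexing on the weekday block of df2: the row position comes from `IntervalIndex.get_indexer` and the column position from `Index.get_indexer` on the day names. It rebuilds the sample data so that it runs on its own. Two things here are assumptions: dates that match no interval are set to NaN, and the extra 2021-07-01 row is a hypothetical record added only to show that case:

```python
import io
import numpy as np
import pandas as pd

df1 = pd.read_csv(io.StringIO("""
Date Col1 Col2
2021-01-01 20 15
2021-01-02 22 12
2021-01-03 30 18
2021-07-01 0 0
2021-12-31 125 160
"""), sep=r"\s+", engine="python")  # the 2021-07-01 row is hypothetical
df2 = pd.read_csv(io.StringIO("""
Start_Date End_Date Sunday Monday Tuesday Wednesday Thursday Friday Saturday
2021-01-01 2021-02-25 15 25 35 45 30 40 55
2021-02-26 2021-05-31 25 30 44 35 50 45 66
2021-09-01 2022-01-25 44 25 65 54 24 67 38
"""), sep=r"\s+", engine="python")

df1["Date"] = pd.to_datetime(df1["Date"])
df2["Start_Date"] = pd.to_datetime(df2["Start_Date"])
df2["End_Date"] = pd.to_datetime(df2["End_Date"])

weekday_cols = ["Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday"]

# Row position in df2 for each date; -1 where no range contains the date.
intervals = pd.IntervalIndex.from_arrays(
    df2["Start_Date"], df2["End_Date"], closed="both")
row_pos = intervals.get_indexer(pd.DatetimeIndex(df1["Date"]))

# Column position within the weekday block for each date.
day_names = df1["Date"].dt.day_name()
col_pos = pd.Index(weekday_cols).get_indexer(pd.Index(day_names))

# Pick one cell per date with paired (row, column) integer indexing.
values = df2[weekday_cols].to_numpy()
matched = row_pos >= 0
picked = values[np.where(matched, row_pos, 0), col_pos].astype(float)
picked[~matched] = np.nan  # assumed handling of unmatched dates

df1["New_Col3"] = day_names.str[:3]
df1["New_Col4"] = picked
print(df1)
```

The float dtype is a side effect of using NaN for unmatched dates; if every date is known to fall in some range, the mask can be dropped and the values stay integers.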