Python: adding a column to a DataFrame based on the contents of another DataFrame

python pandas dataframe

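I have two DataFrames, df and times, which represent maintenance records and monthly periods respectively. I want to add a column to times based on the data in df: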

import pandas as pd

# df represents car maintenance records
data = {"07-18-2012": ["replaced wheels", 45, 200], "09-12-2014": ["changed oil", 30, 40], "09-18-2015": ["fixed dent", 92, 0]}
df = pd.DataFrame.from_dict(data, orient="index")
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df.columns = ["description", "mins_spent", "cost"]

# times represents monthly periods (month-end dates; newer pandas versions spell this alias 'ME')
rng = pd.date_range(start='12/31/2013', end='1/1/2015', freq='M')
ts = pd.Series(rng)
times = ts.to_frame(name="months")

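I am trying to add a new column named days_since_maintenance to times, holding the number of days since the most recent maintenance recorded in df. I have tried df.ix[], iterating with a for loop, and searchsorted().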

df:
                description  mins_spent  cost
2012-07-18  replaced wheels          45   200
2014-09-12      changed oil          30    40
2015-09-18       fixed dent          92     0

times:
   months
0  2013-12-31
1  2014-01-31
2  2014-02-28
3  2014-03-31
4  2014-04-30
5  2014-05-31
6  2014-06-30
7  2014-07-31
8  2014-08-31
9  2014-09-30
10 2014-10-31
11 2014-11-30
12 2014-12-31

Desired DataFrame:
   months       days_since_maintenance
0  2013-12-31   531 days
1  2014-01-31   562 days
2  2014-02-28   ...
3  2014-03-31   ...
4  2014-04-30   ...
5  2014-05-31   ...
6  2014-06-30   ...
7  2014-07-31   ...
8  2014-08-31   774 days
9  2014-09-30   18 days
10 2014-10-31   ...
11 2014-11-30   ...
12 2014-12-31   ...

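Well, this is certainly not the best solution, since it loops over df.index. It relies on df.index being sorted in ascending order, so that later maintenance dates overwrite earlier ones (.loc is used here because .ix has been removed from pandas):
for d in df.index:
    times.loc[times['months'] >= d, 'days_since_maintenance'] = times['months'] - d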

In [123]: times
Out[123]:
       months  days_since_maintenance
0  2013-12-31                531 days
1  2014-01-31                562 days
2  2014-02-28                590 days
3  2014-03-31                621 days
4  2014-04-30                651 days
5  2014-05-31                682 days
6  2014-06-30                712 days
7  2014-07-31                743 days
8  2014-08-31                774 days
9  2014-09-30                 18 days
10 2014-10-31                 49 days
11 2014-11-30                 79 days
12 2014-12-31                110 days

Another answer first copies the index into a column with df['dates'] = df.index, then applies a helper taking (x, df) that filters df on df['dates'] ...

[13 rows x 2 columns]
apply works, but you can make use of the index rather than adding a new column for the dates:
def days_since_x(row, df):
    '''returns days between the row date
    and the most recent maintenance date in df'''

    # keep only the maintenance records dated on or before the row date
    all_maint_prior = df[df.index <= row]

    if all_maint_prior.empty:
        return float('NaN')

    # df.index is sorted, so the last filtered row is the most recent one
    most_recent = all_maint_prior.iloc[-1]

    # return the difference in dates
    return row - most_recent.name

times["days_since_maintenance"] = times["months"].apply(lambda row: days_since_x(row, df))
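
Since the question mentions trying searchsorted(), here is a hedged sketch of how a binary search over the sorted maintenance dates could replace the row-wise apply. It is an illustration rather than code from the thread: it uses `numpy.searchsorted` on the underlying datetime arrays, and the sample dates (including one month that precedes every maintenance record) and the names `pos` and `has_prior` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Maintenance dates from the question, sorted ascending.
maint_dates = pd.to_datetime(["2012-07-18", "2014-09-12", "2015-09-18"]).sort_values()

# Sample months; the first one is earlier than any maintenance record.
times = pd.DataFrame(
    {"months": pd.to_datetime(["2012-01-31", "2013-12-31", "2014-09-30"])}
)

# side="right" counts the maintenance dates <= each month; subtracting 1
# gives the position of the most recent one (-1 means there is none).
pos = np.searchsorted(maint_dates.values, times["months"].values, side="right") - 1
has_prior = pos >= 0

# Clip so that indexing is valid everywhere, then blank out rows without a prior record.
last = maint_dates.values[np.clip(pos, 0, None)]
delta = pd.Series(times["months"].values - last, index=times.index)
times["days_since_maintenance"] = delta.where(has_prior)

print(times)
```

The binary search avoids filtering df once per row, which is where the apply version spends most of its time.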

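Oh, I see that you need to actually look up the value. I can modify my answer, but apply may be your friend.

This works. I'm not sure whether it is more or less efficient than MaxU's answer, which also works. Thanks!

It probably depends on the length of the list of dates. My guess is that apply will be more efficient if the list is longer... maybe someone else knows.

This works. I'm not sure whether it is more or less efficient than GMarsh's answer, which also works. Thanks!!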
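
The efficiency question raised in the comments can be checked empirically. Below is a rough timing sketch, not a benchmark from the thread: the synthetic data sizes, the random seed, and the function names `with_loop` and `with_apply` are all assumptions, and the two functions are simplified stand-ins for the loop-over-dates and apply approaches discussed above. Results will vary with the number of maintenance dates and the number of rows in times.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 200 maintenance dates and 2,000 lookup dates.
rng = np.random.default_rng(0)
all_days = pd.date_range("2000-01-01", periods=10_000, freq="D")
maint_dates = all_days[np.sort(rng.choice(10_000, size=200, replace=False))]
months = pd.Series(all_days[np.sort(rng.integers(0, 10_000, size=2_000))])


def with_loop():
    """Loop over the sorted maintenance dates, overwriting later rows each time."""
    out = pd.Series(pd.NaT, index=months.index, dtype="timedelta64[ns]")
    for d in maint_dates:
        mask = months >= d
        out.loc[mask] = months[mask] - d
    return out


def with_apply():
    """Filter the maintenance dates once per row and take the most recent one."""
    def gap(day):
        prior = maint_dates[maint_dates <= day]
        return day - prior[-1] if len(prior) else pd.NaT

    return pd.to_timedelta(months.apply(gap))


# Time a few repetitions of each approach on the same data.
print("loop :", timeit.timeit(with_loop, number=3))
print("apply:", timeit.timeit(with_apply, number=3))
```

The loop does one vectorized pass per maintenance date, while apply does one filter per row of times, so which one wins should depend on the relative sizes of the two inputs.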