Python: comparing and subtracting dates
I'm looking for a way to determine whether a date in a column is within 7 days of another date in the same column. Suppose this is my DataFrame:
import pandas as pd
dic = {'firstname':['Rick','Rick','Rick','John','John','John','David',
'David','David','Steve','Steve','Steve','Jim','Jim',
'Jim'],
'lastname':['Smith','Smith','Smith','Jones','Jones','Jones',
'Wilson','Wilson','Wilson','Johnson','Johnson',
'Johnson','Miller','Miller','Miller'],
'company':['CFA','CFA','CFA','WND','WND','WND','INO','INO','INO',
'CHP','CHP','CHP','MCD','MCD','MCD'],
'faveday':['2020-03-16','2020-03-11','2020-03-25','2020-04-30',
'2020-05-22','2020-05-03','2020-01-31','2020-01-13',
'2020-01-10','2020-10-22','2020-10-28','2020-10-22',
'2020-10-13','2020-10-28','2020-10-20']}
df = pd.DataFrame(dic)
df['faveday'] = pd.to_datetime(df['faveday'])
print(df)
which outputs:
firstname lastname company faveday
0 Rick Smith CFA 2020-03-16
1 Rick Smith CFA 2020-03-11
2 Rick Smith CFA 2020-03-25
3 John Jones WND 2020-04-30
4 John Jones WND 2020-05-22
5 John Jones WND 2020-05-03
6 David Wilson INO 2020-01-31
7 David Wilson INO 2020-01-13
8 David Wilson INO 2020-01-10
9 Steve Johnson CHP 2020-10-22
10 Steve Johnson CHP 2020-10-28
11 Steve Johnson CHP 2020-10-22
12 Jim Miller MCD 2020-10-13
13 Jim Miller MCD 2020-10-28
14 Jim Miller MCD 2020-10-20
Then I sort it with:
df = df.sort_values(['firstname','lastname','company','faveday'])
print(df)
which gives:
firstname lastname company faveday
8 David Wilson INO 2020-01-10
7 David Wilson INO 2020-01-13
6 David Wilson INO 2020-01-31
12 Jim Miller MCD 2020-10-13
14 Jim Miller MCD 2020-10-20
13 Jim Miller MCD 2020-10-28
3 John Jones WND 2020-04-30
5 John Jones WND 2020-05-03
4 John Jones WND 2020-05-22
1 Rick Smith CFA 2020-03-11
0 Rick Smith CFA 2020-03-16
2 Rick Smith CFA 2020-03-25
9 Steve Johnson CHP 2020-10-22
11 Steve Johnson CHP 2020-10-22
10 Steve Johnson CHP 2020-10-28
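Say I want to know, in this current order (index 8, then 7, 6, 12 and so on), whether each date is within 7 days of another date. (So indexes 8 and 7 would both yield True, but index 6 would not.)
But I also want this grouped by name. (So indexes 12 and 14 in the Jim Miller group would be True and index 13 would not, while indexes 9, 11 and 10 in the Steve Johnson group would all be True.)
Is there a way to subtract the dates within each group and then create a column that is True or False depending on whether the date is within 7 days of another date in the group?
I'm looking for output like this: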
firstname lastname company faveday seven_days
8 David Wilson INO 2020-01-10 TRUE
7 David Wilson INO 2020-01-13 TRUE
6 David Wilson INO 2020-01-31 FALSE
12 Jim Miller MCD 2020-10-13 TRUE
14 Jim Miller MCD 2020-10-20 TRUE
13 Jim Miller MCD 2020-10-28 FALSE
3 John Jones WND 2020-04-30 TRUE
5 John Jones WND 2020-05-03 TRUE
4 John Jones WND 2020-05-22 FALSE
1 Rick Smith CFA 2020-03-11 TRUE
0 Rick Smith CFA 2020-03-16 TRUE
2 Rick Smith CFA 2020-03-25 FALSE
9 Steve Johnson CHP 2020-10-22 TRUE
11 Steve Johnson CHP 2020-10-22 TRUE
10 Steve Johnson CHP 2020-10-28 TRUE
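Let's try a custom function that uses numpy broadcasting:
import numpy as np
def sefd(x):
    # Pairwise gaps in days between all dates of the group; a row is True
    # when at least two dates (itself plus one other) lie within 7 days
    return np.sum((np.abs(x.values-x.values[:,None])/np.timedelta64(1, 'D'))<=7,axis=1)>=2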
s=df.groupby(['firstname', 'lastname', 'company'])['faveday'].transform(sefd)
Out[301]:
0 True
1 True
2 False
3 True
4 False
5 True
6 False
7 True
8 True
9 True
10 True
11 True
12 True
13 False
14 True
Name: faveday, dtype: bool
df['seven_days']=s
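To make the broadcasting step in `sefd` easier to follow, here is a minimal sketch on a single group (David Wilson's three dates from the question). The variable names `dates`, `vals`, `gap_days` and `within` are only illustrative and do not come from the answer.

```python
import numpy as np
import pandas as pd

# One group's dates (David Wilson's rows from the question).
dates = pd.Series(pd.to_datetime(['2020-01-10', '2020-01-13', '2020-01-31']))

vals = dates.values  # 1-D datetime64 array, shape (3,)
# vals[:, None] has shape (3, 1); the subtraction broadcasts to a (3, 3)
# matrix whose entry [i, j] is the gap between date j and date i.
gap_days = np.abs(vals - vals[:, None]) / np.timedelta64(1, 'D')

# Each row counts the dates within 7 days, including the row's own date
# (gap 0), so a count of 2 or more means some other date is close enough.
within = (gap_days <= 7).sum(axis=1) >= 2

print(gap_days)
print(within)
```

Because every date is compared with every other date in its group, this works regardless of row order, at the cost of a matrix that grows quadratically with group size.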
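You can try this:
from datetime import timedelta
# Gap to the previous date within each group; bfill gives the first row
# of a group the gap of the row that follows it
m = (df.groupby(['firstname','lastname']).
     apply(lambda x: x['faveday'].sub(x['faveday'].shift()).bfill()).
     reset_index(level=[0,1],drop=True))
df['seven_days'] = m.le(timedelta(days=7))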
firstname lastname company faveday seven_days
8 David Wilson INO 2020-01-10 True
7 David Wilson INO 2020-01-13 True
6 David Wilson INO 2020-01-31 False
12 Jim Miller MCD 2020-10-13 True
14 Jim Miller MCD 2020-10-20 True
13 Jim Miller MCD 2020-10-28 False
3 John Jones WND 2020-04-30 True
5 John Jones WND 2020-05-03 True
4 John Jones WND 2020-05-22 False
1 Rick Smith CFA 2020-03-11 True
0 Rick Smith CFA 2020-03-16 True
2 Rick Smith CFA 2020-03-25 False
9 Steve Johnson CHP 2020-10-22 True
11 Steve Johnson CHP 2020-10-22 True
10 Steve Johnson CHP 2020-10-28 True
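One thing to keep in mind: apart from the first row of each group, the approach above only compares each date with the previous one, so a date that is far from its predecessor but close to its successor could be marked False. The following is a hedged sketch of a variant that checks both neighbours. It is not part of the original answer, the small frame and the `name` column are made up for illustration, and it assumes the rows are already sorted by date within each group.

```python
import pandas as pd

# Hypothetical data: group A has a date (2020-01-20) that is far from the
# previous date but close to the next one.
df = pd.DataFrame({
    'name': ['A', 'A', 'A', 'B', 'B'],
    'faveday': pd.to_datetime(['2020-01-01', '2020-01-20', '2020-01-22',
                               '2020-03-01', '2020-03-30']),
}).sort_values(['name', 'faveday'])

limit = pd.Timedelta(days=7)
grouped = df.groupby('name')['faveday']

gap_prev = grouped.diff()          # gap to the previous date (NaT on a group's first row)
gap_next = grouped.diff(-1).abs()  # gap to the next date (NaT on a group's last row)

# Comparisons against NaT are False, so a row with no neighbour on one
# side is decided by the other side alone.
df['seven_days'] = gap_prev.le(limit) | gap_next.le(limit)
print(df)
```

With sorted dates, the nearest other date in a group is always an adjacent row, which is why checking the two neighbours is enough here.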
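This is great, thank you! Now I want to drop the rows that contain False in the new seven_days column. What would be the most efficient way to do that?
@NicholasDelaTorre df=df[s]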
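The reply relies on boolean indexing: a boolean Series used as a key keeps only the rows where it is True. Below is a minimal sketch on a made-up three-row frame. The name `kept` is illustrative, and filtering on the stored `seven_days` column is assumed to be equivalent to using `s` once that column has been assigned.

```python
import pandas as pd

# Small illustrative frame with the flag column already computed.
df = pd.DataFrame({
    'firstname': ['David', 'David', 'David'],
    'faveday': pd.to_datetime(['2020-01-10', '2020-01-13', '2020-01-31']),
    'seven_days': [True, True, False],
})

# Boolean indexing keeps only the rows whose mask value is True.
kept = df[df['seven_days']]
print(kept)
```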