Python 3.x: using a condition inside a function when generating values for a DataFrame
Tags: python-3.x, pandas, numpy
I have to create a DataFrame with Start_Date and End_Date columns, using a function that randomly generates date values, such that End_Date > Start_Date.
I tried something like this:
    Project = pd.DataFrame({'Name': np.random.choice(['Starbucks','Macdonalds', 'KFC', 'Maruti',
                                                      'Honda','Mercedes', 'BMW', 'Reebok','Nike','Lee'],10),
                            'Start_Date':Project.apply(lambda row: gen_datetime(), axis = 1),
                            'End_Date': Project.apply(lambda row: gen_datetime() where('End_Date' > 'Start_Date' ), axis = 1)})
I am not sure how to write the conditional statement. This is the generator function:
    def gen_datetime(min_year=2017, max_year=datetime.now().year):
        start = date(min_year, 10, 28)
        years = max_year - min_year + 1
        end = start + timedelta(days=365 * years)
        for i in range(10):
            random_date = start + (end - start) * random.random()
        return random_date
The idea is to generate a random end time from the start time by adding a random timedelta:
    from datetime import date, datetime, timedelta
    import numpy as np
    import pandas as pd
    N = 10
    shift_end_date = 20
    def gen_datetime(min_year=2017, max_year=datetime.now().year):
        start = date(min_year, 10, 28)
        years = max_year - min_year + 1
        end = start + timedelta(days=365 * years)
        # leave room at the end of the range for the random shift added below
        dates = pd.date_range(start, end - timedelta(shift_end_date))
        return np.random.choice(dates, N)
    names = ['Starbucks','Macdonalds', 'KFC', 'Maruti',
             'Honda','Mercedes', 'BMW', 'Reebok','Nike','Lee']
    Project = pd.DataFrame({'Name': np.random.choice(names,N),
                            'Start_Date':gen_datetime()})
    # offsets are drawn from 1 to shift_end_date - 1 days, so End_Date is always later
    days = pd.to_timedelta(np.random.randint(1, shift_end_date, size=N), unit='d')
    Project['End_Date'] = Project['Start_Date'] + days
    print(Project)
            Name Start_Date   End_Date
    0     Maruti 2018-07-31 2018-08-13
    1        KFC 2017-11-20 2017-11-21
    2     Maruti 2018-07-22 2018-07-23
    3     Reebok 2018-05-13 2018-05-15
    4        KFC 2018-08-16 2018-08-29
    5  Starbucks 2018-03-18 2018-03-23
    6     Reebok 2018-02-13 2018-03-03
    7        Lee 2018-04-26 2018-05-10
    8     Reebok 2018-09-11 2018-09-15
    9      Honda 2018-05-15 2018-05-19
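The reason this approach satisfies End_Date > Start_Date is that np.random.randint(1, shift_end_date) never returns 0, so every row is shifted by at least one day. The following is only a minimal sketch of that property, not the answer's code: the fixed seed, the 365-day start window, and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

N = 10
shift_end_date = 20

# Fixed seed (an assumption) so that the sketch is repeatable
rng = np.random.RandomState(0)

# Random start dates within one year of 2017-10-28 (illustrative window)
start_offsets = rng.randint(0, 365, size=N)
start = pd.Series(pd.Timestamp(2017, 10, 28) + pd.to_timedelta(start_offsets, unit='D'))

# randint draws from [1, shift_end_date), so each offset is at least one day
offsets = rng.randint(1, shift_end_date, size=N)
end = start + pd.to_timedelta(offsets, unit='D')

# Element-wise check of the constraint from the question
print((end > start).all())
```

Because the lower bound of the offset is 1, no rejection or row-wise conditional is needed; the constraint holds by construction.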
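Comments:
What exactly is gen_datetime()? Please post the full code.
@JoeIddon: It is posted in the main post.
The code is perfect. Just convert the start date to datetime before adding the days.
Improved solution: the function returns arrays of start and end days and uses the origin parameter of pd.to_datetime (this needs a pandas version in which to_datetime accepts origin):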
    from datetime import datetime
    import numpy as np
    import pandas as pd
    N = 10
    def gen_datetime(min_year=2017, max_year=datetime.now().year):
        start = pd.Timestamp(min_year, 10, 28)
        years = max_year - min_year + 1
        end = 365 * years
        # get random 2d array of day offsets from the start date, sorted per column
        d = np.sort(np.random.randint(end, size=[2,N]), axis=0)
        # convert to datetime with origin parameter
        a = pd.to_datetime(d[0], unit='D',
                           origin=start)
        b = pd.to_datetime(d[1], unit='D',
                           origin=start)
        # return both arrays together
        return a,b
    # extract output to 2 variables
    start, end = gen_datetime()
    names = ['Starbucks','Macdonalds', 'KFC', 'Maruti',
             'Honda','Mercedes', 'BMW', 'Reebok','Nike','Lee']
    Project = pd.DataFrame({'Name': np.random.choice(names,N),
                            'Start_Date':start,
                            'End_Date':end}, columns=['Name','Start_Date','End_Date'])
    print(Project)
             Name Start_Date   End_Date
    0      Reebok 2017-11-20 2018-06-28
    1        Nike 2018-06-12 2018-07-23
    2      Reebok 2018-04-26 2018-07-06
    3         BMW 2018-02-20 2018-07-14
    4   Starbucks 2018-04-02 2018-09-10
    5   Starbucks 2017-12-14 2018-03-29
    6         Lee 2018-05-17 2018-09-13
    7  Macdonalds 2017-11-01 2018-08-20
    8      Reebok 2018-04-09 2018-06-27
    9  Macdonalds 2018-02-21 2018-10-07
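One caveat about the sorted-pair approach: sorting along axis 0 only guarantees that the first offset is less than or equal to the second, so a row can end up with End_Date equal to Start_Date when both draws coincide. If the inequality must be strict, one possible adjustment is to shift the later offset by one day. The sketch below is a variation on the answer, not its original code; the function name gen_start_end and its parameters (n, origin, span_days, seed) are hypothetical.

```python
import numpy as np
import pandas as pd

def gen_start_end(n, origin=pd.Timestamp(2017, 10, 28), span_days=730, seed=None):
    """Return (start, end) DatetimeIndex pairs with end strictly after start."""
    rng = np.random.RandomState(seed)
    # Two rows of day offsets, sorted per column so that d[0] <= d[1]
    d = np.sort(rng.randint(span_days, size=(2, n)), axis=0)
    # Shift the later row by one day so that ties become a one-day gap
    d[1] += 1
    start = pd.to_datetime(d[0], unit='D', origin=origin)
    end = pd.to_datetime(d[1], unit='D', origin=origin)
    return start, end

start, end = gen_start_end(10, seed=0)
strictly_later = bool((end > start).all())
print(strictly_later)
```

The trade-off is that the latest possible End_Date moves one day past the original range, which is usually acceptable for generated test data.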