Python: calculating the age of living and deceased people

Edit: As someone suggested, I have provided a verifiable example. If you take pandas out of it and simply put in raw values instead of DataFrame values, it works perfectly. If you put pandas back in, as I do below, the program runs and prints 0 for print(true_age).

I have designed this as nested if statements, but I have gone wrong somewhere. If the person is alive, everything works: when I print the age, it outputs the correct age. However, if the person is deceased, the printed age is always zero. Here are the nested if statements along with the relevant print statement:
# Here are the nested if statements:
if died_year is None:
    if bmonth > now_month:
        if bday > now_day:
            true_age = age_raw - 1
        elif bday < now_day:
            true_age = age_raw
    elif bmonth < now_month:
        true_age = age_raw
elif died_year is not None:
    died_year = int(died_year)
    died_month = int(died_month)
    died_day = int(died_day)
    age_raw = died_year - byear
    if bmonth > died_month:
        if bday > died_day:
            true_age = age_raw - 1
        elif bday < died_day:
            true_age = age_raw
    elif bmonth < died_month:
        true_age = age_raw
# And now the print statement:
print("DOB: "+str(bmonth)+"/"+str(bday)+"/"+str(byear)+" ("+str(true_age)+" years old)")
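Note that I do not convert the variables died_year, died_month and died_day to integers until the appropriate condition is met; doing so outside the if statement would raise an error, because null values cannot be passed to int(). I feel like I am missing something very obvious, but maybe not. Also, if anyone has a better way of doing all this, I am always open to learning how to be more efficient.

It is much easier to convert these values to datetime objects first and then do the if/elif filtering: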
import datetime

# `storage` is the asker's DataFrame; read the date parts from its first row
bmonth = int(storage.iloc[0]['birthMonth'])
bday = int(storage.iloc[0]['birthDay'])
byear = int(storage.iloc[0]['birthYear'])
# datetime() needs integers, so this only works when the death fields are populated
died_year = int(storage.iloc[0]['deathYear'])
died_month = int(storage.iloc[0]['deathMonth'])
died_day = int(storage.iloc[0]['deathDay'])

start = datetime.datetime(month=bmonth, day=bday, year=byear)
end = datetime.datetime(month=died_month, day=died_day, year=died_year)
(end - start).days  # number of days between birth and death
You can also bring datetime.now() into the calculation for people who are still alive.
Hope this helps; it should make your process simpler.
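As a rough sketch of the datetime.now() idea above: the helper below falls back to the current time when no death date is given. The function name `age_in_days` and its `now` parameter are hypothetical and only there to make the example reproducible; they are not part of the original answer.

```python
from datetime import datetime


def age_in_days(birth, death=None, now=None):
    """Return the whole days between birth and death, or birth and now."""
    # Hypothetical helper: a living person (death is None) is measured
    # against `now`, which defaults to the current time.
    if death is not None:
        end = death
    else:
        end = now if now is not None else datetime.now()
    return (end - birth).days


birth = datetime(year=1932, month=8, day=17)
death = datetime(year=1980, month=3, day=22)

print(age_in_days(birth, death))                     # deceased: days lived
print(age_in_days(birth, now=datetime(2018, 9, 1)))  # alive on a fixed date
```

Subtracting in the order end - start keeps the result positive; the reverse order gives the same magnitude with a negative sign.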
Edited for compatibility with the minimal example.
You can define a function that calculates a person's age:
from datetime import date

def calc_age(row):
    bm = row['bornMonth']
    bd = row['bornDay']
    by = row['bornYear']
    dm = row['diedMonth']
    dd = row['diedDay']
    dy = row['diedYear']
    # assumes that none of the birth fields is missing
    birth_date = date(*[int(i) for i in (by, bm, bd)])
    try:
        end_date = date(*[int(i) for i in (dy, dm, dd)])
    except (TypeError, ValueError):  # death date is missing: person is alive
        end_date = date.today()
    # True (== 1) if the birthday has not yet been reached in the end year, else False (== 0)
    is_next_year = ((end_date.month, end_date.day) < (birth_date.month, birth_date.day))
    age = end_date.year - birth_date.year - is_next_year
    return age
If no birth data is missing, applying it returns a pd.Series with every person's age. You can attach it to the DataFrame:
df['personsAge'] = df.apply(calc_age, axis=1)
Then add another column with the status and print the results:
def is_dead(row):
    dm = row['diedMonth']
    dd = row['diedDay']
    dy = row['diedYear']
    try:
        # a valid death date can be built only if all three fields are set
        date(*[int(i) for i in (dy, dm, dd)])
        return True
    except (TypeError, ValueError):  # None, NaN or an empty string
        return False

df['is_dead'] = df.apply(is_dead, axis=1)

def print_status(row):
    bm = row['bornMonth']
    bd = row['bornDay']
    by = row['bornYear']
    dm = row['diedMonth']
    dd = row['diedDay']
    dy = row['diedYear']
    age = row['personsAge']
    print("DOB: "+str(bm)+"/"+str(bd)+"/"+str(by)+" ("+str(age)+" years old)")
    if row['is_dead']:
        print("*DECEASED: "+str(dm)+"/"+str(dd)+"/"+str(dy))

df.apply(print_status, axis=1)
stdout:
DOB: 8/17/1932 (47 years old)
*DECEASED: 3/22/1980
DOB: 4/12/1950 (68 years old)
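For readers who want to run the age calculation end to end, here is a minimal sketch. The demo rows are assumptions reconstructed from the output above, and `calc_age_on` with its `today` parameter is a hypothetical variant of calc_age that takes a fixed reference day, so that the result for the living person does not change over time.

```python
from datetime import date

import pandas as pd

# Demo rows are assumptions chosen to match the printed output above.
df = pd.DataFrame({
    'bornMonth': [8, 4], 'bornDay': [17, 12], 'bornYear': [1932, 1950],
    'diedMonth': [3, None], 'diedDay': [22, None], 'diedYear': [1980, None],
})


def calc_age_on(row, today):
    """Age in whole years at death, or on `today` if no death date is set."""
    birth = date(int(row['bornYear']), int(row['bornMonth']), int(row['bornDay']))
    try:
        end = date(int(row['diedYear']), int(row['diedMonth']), int(row['diedDay']))
    except (TypeError, ValueError):  # None or NaN: treat the person as alive
        end = today
    # Subtract one year if the birthday has not been reached in the end year.
    before_birthday = (end.month, end.day) < (birth.month, birth.day)
    return end.year - birth.year - before_birthday


reference_day = date(2018, 9, 1)  # hypothetical fixed "today"
df['personsAge'] = df.apply(calc_age_on, axis=1, today=reference_day)
print(df['personsAge'].tolist())
```

With this reference day the two ages come out as 47 and 68, matching the output shown above. Extra keyword arguments given to DataFrame.apply are forwarded to the applied function, which is how `today` reaches `calc_age_on`.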
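If you do not like the copy-pasted date selection, replace it with the datetime approach from the other solution.

Pandas has excellent support for time series, so it is a good idea to use the proper tools. Once the columns are converted into a single datetime column, we can do time arithmetic on it: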
import pandas as pd

# demo dataframe
df = pd.DataFrame({
    'birthMonth': [5, 2],
    'birthDay': [4, 24],
    'birthYear': [1924, 1997],
    'deathMonth': [3, None],
    'deathDay': [1, None],
    'deathYear': [2008, None]
})

# convert birth dates to datetimes
birth = pd.to_datetime(df[['birthMonth', 'birthDay', 'birthYear']]
                       .rename(columns={'birthMonth': 'month', 'birthDay': 'day', 'birthYear': 'year'}))
# convert death dates to datetimes
death = pd.to_datetime(df[['deathMonth', 'deathDay', 'deathYear']]
                       .rename(columns={'deathMonth': 'month', 'deathDay': 'day', 'deathYear': 'year'}))
# calculate age in days, normalizing 'now' to midnight of today
age = (pd.Timestamp.now().normalize() - birth).where(death.isnull(), other=death - birth)
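Edit: see @ALollz's comment below about normalizing the timestamp.

Do you think you could produce a minimal, verifiable example? It seems you could use DataFrame.loc for this and avoid all the loops. Converting the dates to datetime would make the subtraction very simple.

I agree with ALollz, and would add that it may be convenient to represent whether the person is deceased with a boolean.

To expand on what @Alessi42 said, I suggest you check whether died_year is what you expect it to be. Try printing died_year and make sure it is falsy.

ValueError: invalid literal for int() with base 10: '', which is expected.

Nice solution, +1. You might consider normalizing the pd.Timestamp to midnight, since there is no time-of-day information about the deceased.

This looks like a solid solution; I will test it and see what happens.

This works great. Using your solution, I take "age", use its dt.days attribute, divide by 365.2422, and output the result as an integer. Problem solved. Thanks!

@ALollz Good suggestion, I had not thought of that. Without it, the code above would produce different output at different times of day.

@adrysdale Another way to convert to years is age.astype('timedelta64[Y]')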
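The conversion to years described in the comments (take the days, divide by 365.2422, truncate to an integer) could look roughly like this. The demo dates and the fixed `now` are assumptions made so the example is reproducible; a live calculation would use pd.Timestamp.now().normalize() instead.

```python
import pandas as pd

# Hypothetical demo data: one deceased person and one living person.
birth = pd.Series(pd.to_datetime(['1924-05-04', '1997-02-24']))
death = pd.Series(pd.to_datetime(['2008-03-01', None]))

# Fixed reference date (an assumption) so the result does not change daily.
now = pd.Timestamp('2018-09-01')

# Age as a Timedelta: up to `now` where no death date exists, else up to death.
age = (now - birth).where(death.isna(), other=death - birth)

# Approximate whole years using the mean length of a year in days.
age_years = (age.dt.days / 365.2422).astype(int)
print(age_years.tolist())
```

Dividing by an average year length is an approximation, so it can be off by one on dates very close to a birthday; comparing (month, day) tuples, as in the calc_age function above, avoids that.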