Efficient date differencing in R data.table

I have a list of people with their dates of birth and dates of death.

I want to create a dataset that has one record for every year each person was alive.

My code is as follows:

library(lubridate)
library(data.table)
deadPerson<-c("Albert Einstein","Erwin Schrodinger","Paul Dirac")
dateOfBirth<-c("1879-03-14","1887-08-12","1902-08-08")
dateOfDeath<-c("1955-04-18","1961-01-04","1984-10-20")

df<-data.frame(cbind(deadPerson,dateOfBirth,dateOfDeath))

df$dateOfBirth<-as.POSIXct(df$dateOfBirth)
df$dateOfDeath<-as.POSIXct(df$dateOfDeath)

for(i in 1:dim(df)[1])
{
  birth_day<-df$dateOfBirth[i]
  death_day<-df$dateOfDeath[i]
  numDays<-as.numeric(death_day-birth_day)
  numYears<-floor(numDays/365) # ignore leap years!
  dates <- data.table(index=as.POSIXct(birth_day) + (0:numYears)*years(1))
  dates$Person<-df$deadPerson[i]
  if(i==1){output<-dates}
  else{output<-rbind(output,dates)}
}
output$index<-year(output$index)
colnames(output)<-c("Year.Alive","Person")
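The loop above estimates each lifespan as floor(numDays/365) and, as its comment says, ignores leap years. A minimal sketch, assuming lubridate is available (the question already loads it), that counts completed years on the calendar instead by integer-dividing a lubridate interval by a one-year period. The names birth, death and completed_years are illustrative only:

```r
library(lubridate)

birth <- as.Date(c("1879-03-14", "1887-08-12", "1902-08-08"))
death <- as.Date(c("1955-04-18", "1961-01-04", "1984-10-20"))

# Integer-dividing an Interval by a Period counts how many whole
# one-year periods fit between the two dates, so leap days are
# handled by the calendar rather than by a fixed 365-day year.
completed_years <- interval(birth, death) %/% years(1)
print(completed_years)
```

A value computed this way could stand in for numYears in the loop if exact anniversaries matter.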

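You can use data.table's by-group summary syntax and compute the vector of years in the j position; the grouping variable is then recycled automatically to match its length: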

library(data.table)
setDT(df)  # df from the question is a data.frame; convert it to a data.table in place
df[, .(Year.Alive = seq(year(dateOfBirth), year(dateOfDeath))), by = .(Person = deadPerson)]

#             Person Year.Alive
# 1: Albert Einstein       1879
# 2: Albert Einstein       1880
# 3: Albert Einstein       1881
# 4: Albert Einstein       1882
# 5: Albert Einstein       1883
# ---                           
# 231:      Paul Dirac       1980
# 232:      Paul Dirac       1981
# 233:      Paul Dirac       1982
# 234:      Paul Dirac       1983
# 235:      Paul Dirac       1984
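For a self-contained check of the grouped j expression above, here is a minimal sketch that builds the three rows directly as a data.table. Storing the dates as data.table's IDate type instead of POSIXct is an assumption made here for brevity, and the names dt, alive and counts are illustrative:

```r
library(data.table)

# The same three people, created directly as a data.table so that
# `[` uses data.table semantics without needing setDT().
dt <- data.table(
  deadPerson  = c("Albert Einstein", "Erwin Schrodinger", "Paul Dirac"),
  dateOfBirth = as.IDate(c("1879-03-14", "1887-08-12", "1902-08-08")),
  dateOfDeath = as.IDate(c("1955-04-18", "1961-01-04", "1984-10-20"))
)

# Each group is a single row, so seq() returns that person's years and
# the grouping column Person is recycled to the same length.
alive <- dt[, .(Year.Alive = seq(year(dateOfBirth), year(dateOfDeath))),
            by = .(Person = deadPerson)]

# Rows per person, in order of first appearance.
counts <- alive[, .N, by = Person]
print(counts)
```

Both the birth year and the death year are included, so each person contributes (death year - birth year + 1) rows: 77, 75 and 83 here, 235 in total.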

Here is a tidyr/dplyr version:

library(dplyr)
library(tidyr)
library(lubridate)  # provides year()

df %>%
  gather(date, event, dateOfBirth, dateOfDeath) %>%
  mutate(year_event = year(event)) %>%
  select(deadPerson, year_event) %>%
  group_by(deadPerson) %>%
  complete(year_event = full_seq(year_event, period = 1L))
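gather() still works but is superseded in current tidyr. A minimal sketch of the same pipeline with pivot_longer(), assuming tidyr 1.0.0 or later and Date columns instead of the question's POSIXct; the names people and alive are illustrative:

```r
library(dplyr)
library(tidyr)
library(lubridate)  # provides year()

people <- tibble(
  deadPerson  = c("Albert Einstein", "Erwin Schrodinger", "Paul Dirac"),
  dateOfBirth = as.Date(c("1879-03-14", "1887-08-12", "1902-08-08")),
  dateOfDeath = as.Date(c("1955-04-18", "1961-01-04", "1984-10-20"))
)

alive <- people %>%
  # One row per (person, event) with the date in a single column.
  pivot_longer(c(dateOfBirth, dateOfDeath),
               names_to = "date", values_to = "event") %>%
  mutate(year_event = year(event)) %>%
  select(deadPerson, year_event) %>%
  group_by(deadPerson) %>%
  # Fill in every year between the birth year and the death year.
  complete(year_event = full_seq(year_event, period = 1)) %>%
  ungroup()

print(count(alive, deadPerson))
```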

@DavidArenburg It was meant to import the year function. But I think you're right, data.table ships one as well.
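Because lubridate and data.table both export a function named year(), whichever package is attached last masks the other. A small sketch showing how namespace-qualified calls sidestep the load order (the variable names d, y_dt and y_lub are illustrative):

```r
library(lubridate)
library(data.table)  # attached last, so a bare year() now refers to data.table's

d <- as.Date("1879-03-14")

# Namespace-qualified calls pick a specific implementation regardless
# of which package was attached most recently.
y_dt  <- data.table::year(d)
y_lub <- lubridate::year(d)

print(c(y_dt, y_lub))
```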