Header names as dates in R
I'm trying to compute the "death" of users, meaning I want to determine the duration between when a user signs up for a program and when they are no longer active in it. I have two files, which I read with read.csv("filename", header=TRUE). File 2 looks like this:
> df2
       names X04.16.2013 X04.17.2013 X04.18.2014 X04.19.2013
2001 Allison           5           5           0           0
2002  Andrew           0           0           0           0
2003    Carl           8           8          11          10
2004    Dora           6           4           9           3
2005  Hilary           2           0           0           0
2006   Louis          18          10           8           3
2007    Mary           4           7           7           0
2008  Mickey           9           5           0           0
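What I want to do is convert the header names of df2 to dates, so that I can create a new data frame containing the user name, the start date, and the "days to death", i.e. the point at which the user's entry in df2 becomes 0: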
     name start.date days.to.death
1 Allison 2013-03-16            33
2  Andrew 2013-03-16             0
3    Carl 2013-03-16            NA
4    Dora 2013-03-17            NA
5  Hilary 2013-03-17            31
6   Louis 2013-03-19            NA
7    Mary 2013-03-20            30
8  Mickey 2013-03-20            28
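Note that Andrew was never "alive", and that Carl, Dora and Louis have not "died" yet. I'm still fairly new to R, so any input is greatly appreciated.

A simple as.Date with the correct format will convert the column names to dates. First, the data in a reproducible form: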
df<-structure(list(name = structure(1:8, .Label = c("Allison", "Andrew",
"Carl", "Dora", "Hilary", "Louis", "Mary", "Mickey"), class = "factor"),
start.date = structure(c(15780, 15780, 15780, 15781, 15781,
15783, 15784, 15784), class = "Date")), .Names = c("name",
"start.date"), row.names = c("1", "2", "3", "4", "5", "6", "7",
"8"), class = "data.frame")
df2<-structure(list(names = structure(1:8, .Label = c("Allison", "Andrew",
"Carl", "Dora", "Hilary", "Louis", "Mary", "Mickey"), class = "factor"),
X04.16.2013 = c(5L, 0L, 8L, 6L, 2L, 18L, 4L, 9L), X04.17.2013 = c(5L,
0L, 8L, 4L, 0L, 10L, 7L, 5L), X04.18.2014 = c(0L, 0L, 11L,
9L, 0L, 8L, 7L, 0L), X04.19.2013 = c(0L, 0L, 10L, 3L, 0L,
3L, 0L, 0L)), .Names = c("names", "X04.16.2013", "X04.17.2013",
"X04.18.2014", "X04.19.2013"), class = "data.frame", row.names = c("2001",
"2002", "2003", "2004", "2005", "2006", "2007", "2008"))
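Assuming start.date is also of the proper Date class, taking the difference between the date of each user's first zero entry and start.date gives the following: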
     name start.date days.to.death
1 Allison 2013-03-16      398 days
2  Andrew 2013-03-16       31 days
3    Carl 2013-03-16       NA days
4    Dora 2013-03-17       NA days
5  Hilary 2013-03-17       31 days
6   Louis 2013-03-19       NA days
7    Mary 2013-03-20       30 days
8  Mickey 2013-03-20      394 days
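As a minimal sketch of how a result like the one above could be computed (not necessarily the code that produced it), the header names can be parsed with as.Date and the date of each user's first zero entry compared with start.date. The data frames are rebuilt inline so the block runs on its own. The helper names header_dates, is_zero, first_zero and result are illustrative assumptions, and the header X04.18.2014 is deliberately left uncorrected.

```r
# Rebuild the two data frames from the question (same values as above).
df <- data.frame(
  name = c("Allison", "Andrew", "Carl", "Dora", "Hilary", "Louis", "Mary", "Mickey"),
  start.date = as.Date(c("2013-03-16", "2013-03-16", "2013-03-16", "2013-03-17",
                         "2013-03-17", "2013-03-19", "2013-03-20", "2013-03-20"))
)
df2 <- data.frame(
  names = df$name,
  X04.16.2013 = c(5, 0, 8, 6, 2, 18, 4, 9),
  X04.17.2013 = c(5, 0, 8, 4, 0, 10, 7, 5),
  X04.18.2014 = c(0, 0, 11, 9, 0, 8, 7, 0),  # header typo kept as in the question
  X04.19.2013 = c(0, 0, 10, 3, 0, 3, 0, 0)
)

# Parse the header names (all but the first column) as dates.
# read.csv prefixes an "X", so it is part of the format string.
header_dates <- as.Date(names(df2)[-1], format = "X%m.%d.%Y")

# For each user, the position of the first column holding 0 (NA if there is none).
is_zero <- as.matrix(df2[, -1]) == 0
first_zero <- unname(apply(is_zero, 1, function(z) which(z)[1]))

# Gap between the first zero date and the start date; NA for users still active.
result <- data.frame(
  name = df$name,
  start.date = df$start.date,
  days.to.death = header_dates[first_zero] - df$start.date
)
print(result)
```

Because the third header still reads 2014, Allison and Mickey come out at 398 and 394 days, as in the output above.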
Assuming there is a typo in df2's column headers, the following solution using dplyr and tidyr gets you most of the way there:
library(tidyr)
library(dplyr)
names(df2)[4] <- "X04.18.2013" # assumed fix for the header typo (2014 -> 2013)
colnames(df) <- c("names", "start") # to join the data frames, the first column header must match df2
df$start <- as.Date(df$start, format = "%m/%d/%Y") # make sure start is a Date
df2 %>%
  filter(X04.16.2013 != 0) %>% # removes Andrew, who has 0 in the first date column
  gather(key, value, 2:5) %>% # long form: one row per user and date column
  mutate(date = as.Date(key, format = "X%m.%d.%Y")) %>% # parse header names as dates
  left_join(df, by = "names") %>%
  filter(value == 0) %>%
  group_by(names) %>%
  filter(date == nth(date, 1)) %>% # keep each user's first zero date
  select(names, start, date) %>%
  mutate(daydiff = difftime(date, start, units = "days"))
This gives:
    names      start       date daydiff
1  Hilary 2013-03-17 2013-04-17 31 days
2 Allison 2013-03-16 2013-04-18 33 days
3  Mickey 2013-03-20 2013-04-18 29 days
4    Mary 2013-03-20 2013-04-19 30 days
Putting the NAs and the users who were never alive back in should be easy. Maybe this helps a bit?

The following simple code may be useful:
names(df2)[1] <- "name" # match the key column name used in df
dfm <- merge(df, df2) # join the two data frames on name
# the day counts are hard-coded for each date column
dfm$days.to.death <- ifelse(dfm[, 3] == 0, 0, ifelse(dfm[, 4] == 0, 31, ifelse(dfm[, 5] == 0, 33, ifelse(dfm[, 6] == 0, 28, NA))))
dfm[, c(1, 2, 7)]
     name start.date days.to.death
1 Allison 2013-03-16            33
2  Andrew 2013-03-16             0
3    Carl 2013-03-16            NA
4    Dora 2013-03-17            NA
5  Hilary 2013-03-17            31
6   Louis 2013-03-19            NA
7    Mary 2013-03-20            28
8  Mickey 2013-03-20            33

Is that a typo in df2's column headers? Should they all be 2013?
I apologize. Yes, all the dates in df2 should be 2013.
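With every header corrected to 2013, as the asker confirms, the day counts in the nested ifelse do not have to be hard-coded, since they can be derived from the parsed header dates and each user's own start date. The following is a minimal sketch under that assumption, not any answerer's code. The names date_cols, header_dates and first_zero are illustrative, and giving never-active users a value of 0 follows the convention in the question's desired output.

```r
# Rebuild the data from the question so the block runs on its own.
df <- data.frame(
  name = c("Allison", "Andrew", "Carl", "Dora", "Hilary", "Louis", "Mary", "Mickey"),
  start.date = as.Date(c("2013-03-16", "2013-03-16", "2013-03-16", "2013-03-17",
                         "2013-03-17", "2013-03-19", "2013-03-20", "2013-03-20"))
)
df2 <- data.frame(
  names = df$name,
  X04.16.2013 = c(5, 0, 8, 6, 2, 18, 4, 9),
  X04.17.2013 = c(5, 0, 8, 4, 0, 10, 7, 5),
  X04.18.2014 = c(0, 0, 11, 9, 0, 8, 7, 0),
  X04.19.2013 = c(0, 0, 10, 3, 0, 3, 0, 0)
)

# Correct the header typo (2014 -> 2013) and align the key column name with df.
names(df2) <- sub("X04.18.2014", "X04.18.2013", names(df2), fixed = TRUE)
names(df2)[1] <- "name"

# Join on name, then locate the date columns by their "X" prefix.
dfm <- merge(df, df2, by = "name")
date_cols <- grep("^X", names(dfm))
header_dates <- as.Date(names(dfm)[date_cols], format = "X%m.%d.%Y")

# Position of each user's first zero entry (NA if the user is still active).
first_zero <- unname(apply(as.matrix(dfm[, date_cols]) == 0, 1,
                           function(z) which(z)[1]))

# Days from start to the first zero date, computed per user rather than per column.
dfm$days.to.death <- as.numeric(header_dates[first_zero] - dfm$start.date)

# Users with a zero in the very first date column were never active: report 0.
dfm$days.to.death[which(first_zero == 1)] <- 0

print(dfm[, c("name", "start.date", "days.to.death")])
```

Computing the gap per user gives 30 days for Mary and 29 for Mickey, because their start dates differ from Allison's and Hilary's, which a single constant per column cannot reflect.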