
R: How do I sort groups of columns row by row based on an attribute value (date)?


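I have a problem that I hope someone can help me with. It is basically a data manipulation task. I have a large dataset with 10 columns: "id" plus 3 sets of similar variables, "type", "startdate" and "enddate". An example is shown below.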

  id type1 startdate1   enddate1 type2 startdate2   enddate2 type3 startdate3   enddate3
1  1     A 2006-08-20 2006-12-06     W 2006-08-01 2007-08-29     P 2007-08-18 2007-09-27
2  2     A 2006-01-05 2007-07-02    NA         NA         NA     Q 2008-01-15 2008-02-07

I would like to obtain the following cleaned and sorted dataset:

  id type1 startdate1   enddate1 type2 startdate2   enddate2 type3 startdate3   enddate3
1  1     W 2006-08-01 2007-08-29     A 2006-08-20 2006-12-06     P 2007-08-18 2007-09-27
2  2     A 2006-01-05 2007-07-02     Q 2008-01-15 2008-02-07    NA         NA         NA

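I want the sets within each row/observation sorted in ascending order of "startdate". So for row 1, because the start date of the second set of variables (2006-08-01) is earlier than the start date of the first set (2006-08-20), the second set is moved into first position.

For row 2, I want all NAs pushed to the end.

Any suggestions on how to do this efficiently?

Should I convert "startdate" and "enddate" to numeric? If so, how should I handle the NAs?

Would it be sensible to apply paste() to (type, startdate, enddate) for each of the 3 sets?


Thanks in advance for your help!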

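We can use rbind.fill from the plyr package. That function is smart enough to combine data frames by column name, which is not what we want here. To push each row's observations forward, we remove the NAs and then apply the names of the original data frame to the new, shorter row.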

library(plyr)

df <- data.frame("obs" = seq(3),
                 type1 = c(2,2,NA),date1 = c("date11","date21",NA), 
                 type2 = c(3,NA,5),date2 = c("date12",NA,"date31"),
                 type3 = c(4,3,1), date3 = c("date13","date22","date32"),
                 type4 = c(4,4,NA),date4 = c("date14","date23",NA))
df
#    obs type1  date1 type2  date2 type3  date3 type4  date4
#    1   1     2 date11     3 date12     4 date13     4 date14
#    2   2     2 date21    NA   <NA>     3 date22     4 date23
#    3   3    NA   <NA>     5 date31     1 date32    NA   <NA>

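## lapply always returns a list of one-row data frames, which is what
## rbind.fill expects; sapply may simplify to a matrix when every row
## has the same number of non-NA values
newdf <- lapply(1:nrow(df), function(i){
    newrow <- (df[i,!is.na(df[i,])])              ## Remove NA's
    names(newrow) <- names(df)[1:length(newrow)]  ## Apply names

    newrow                                        ## Output
})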

rbind.fill(newdf)
#    obs type1  date1 type2  date2 type3  date3 type4  date4
#    1   1     2 date11     3 date12     4 date13     4 date14
#    2   2     2 date21     3 date22     4 date23    NA   <NA>
#    3   3     5 date31     1 date32    NA   <NA>    NA   <NA>
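
Pushing NAs forward does not by itself order the sets by date. As a minimal base R sketch of the per-row sort, assuming the columns are character vectors named type1, startdate1, enddate1 and so on as in the question (the helper name `sort_sets_by_start` and its `n_sets` argument are made up for illustration):

```r
# Example data from the question, read as character columns (assumption).
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id type1 startdate1   enddate1 type2 startdate2   enddate2 type3 startdate3   enddate3
 1     A 2006-08-20 2006-12-06     W 2006-08-01 2007-08-29     P 2007-08-18 2007-09-27
 2     A 2006-01-05 2007-07-02    NA         NA         NA     Q 2008-01-15 2008-02-07
")

# Hypothetical helper: within each row, reorder the (type, startdate, enddate)
# sets by startdate. ISO dates sort correctly as strings, and order() puts
# NA last by default, so empty sets end up on the right.
sort_sets_by_start <- function(df, n_sets = 3) {
  out <- df
  for (i in seq_len(nrow(df))) {
    starts <- sapply(seq_len(n_sets), function(k) {
      as.character(df[[paste0("startdate", k)]][i])
    })
    ord <- order(starts)
    for (v in c("type", "startdate", "enddate")) {
      cols <- paste0(v, seq_len(n_sets))
      out[i, cols] <- df[i, cols[ord]]   # assigned by position
    }
  }
  out
}

res <- sort_sets_by_start(df)
res
```

Because NAs sort last, this handles both the date ordering and the NA push in one pass. The explicit row loop is easy to follow but may be slow on a very large dataset.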

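Here is a solution using dplyr and tidyr. It relies on converting the dataset to long format, reordering as needed, and then converting back to wide format. Converting to long format coerces the values to character, so the column types need to be reapplied afterwards.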

library(tidyr)
library(dplyr)

df <- read.table(header = TRUE, text = "
id type1 startdate1   enddate1 type2 startdate2   enddate2 type3 startdate3   enddate3
 1     A 2006-08-20 2006-12-06     W 2006-08-01 2007-08-29     P 2007-08-18 2007-09-27
 2     A 2006-01-05 2007-07-02    NA         NA         NA     Q 2008-01-15 2008-02-07
")

df %>%
    gather(key, value, -id) %>%  # convert to long format
    extract(key, c("var", "seq"), "(.*)(\\d)") %>%  # extract sequence number
    spread(var, value) %>%  # spread to wide format by id and sequence
    group_by(id) %>%
    arrange(startdate) %>%  # sort seq by startdate in id groups
    mutate(seq = 1:n()) %>%  # calculate new sequence order
    gather(key, value, -id, -seq) %>%  # convert to long format
    transmute(var = paste0(key, seq), value) %>%  # generate wide format names
    spread(var, value) %>%  # spread to back to wide format
    select(one_of(names(df))) %>%  # restore original column order
    mutate_each("as.Date", one_of(grep("date", names(df), value = TRUE)))
        # reapply date type to original date variables

#     Source: local data frame [2 x 10]
#     Groups: id [2]
#     
#          id type1 startdate1   enddate1 type2 startdate2   enddate2 type3 startdate3   enddate3
#       (int) (chr)     (date)     (date) (chr)     (date)     (date) (chr)     (date)     (date)
#     1     1     W 2006-08-01 2007-08-29     A 2006-08-20 2006-12-06     P 2007-08-18 2007-09-27
#     2     2     A 2006-01-05 2007-07-02     Q 2008-01-15 2008-02-07    NA       <NA>       <NA>
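
The same long-sort-wide idea can also be written with the newer tidyr reshaping verbs. This is only a sketch, assuming tidyr 1.0 or later and dplyr 1.0 or later are installed; the result name `sorted` is arbitrary.

```r
library(dplyr)
library(tidyr)

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id type1 startdate1   enddate1 type2 startdate2   enddate2 type3 startdate3   enddate3
 1     A 2006-08-20 2006-12-06     W 2006-08-01 2007-08-29     P 2007-08-18 2007-09-27
 2     A 2006-01-05 2007-07-02    NA         NA         NA     Q 2008-01-15 2008-02-07
")

sorted <- df %>%
  # one row per (id, set); ".value" keeps type/startdate/enddate as columns
  pivot_longer(-id, names_to = c(".value", "seq"),
               names_pattern = "(.*)(\\d)") %>%
  group_by(id) %>%
  arrange(startdate, .by_group = TRUE) %>%   # NA start dates sort last
  mutate(seq = row_number()) %>%             # renumber sets within each id
  ungroup() %>%
  pivot_wider(names_from = seq,
              values_from = c(type, startdate, enddate),
              names_sep = "") %>%
  select(all_of(names(df))) %>%              # restore original column order
  mutate(across(contains("date"), as.Date))  # reapply the Date type

sorted
```

Since the ".value" sentinel keeps each variable in its own column, the values are not all coerced into a single character column during the reshape.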

Same approach as Mikko Marttila's, but without using non-standard libraries:

> ## use vectors of class Date
> df[c(3,4,6,7,9,10)] <- lapply(df[c(3,4,6,7,9,10)], as.Date)

> ## reshape to long format
> df.1 <- reshape(df, idvar=1,
+                 varying=list(c(2,5,8), c(3,6,9), c(4,7,10)),
+                 v.names=c('type', 'startdate', 'enddate'),
+                 times=c(1,2,3), timevar='group', direction='long')
> df.1
#     id group type  startdate    enddate
# 1.1  1     1    A 2006-08-20 2006-12-06
# 2.1  2     1    A 2006-01-05 2007-07-02
# 1.2  1     2    W 2006-08-01 2007-08-29
# 2.2  2     2 <NA>       <NA>       <NA>
# 1.3  1     3    P 2007-08-18 2007-09-27
# 2.3  2     3    Q 2008-01-15 2008-02-07

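> ## reset group variable according to startdate
> ## (order(order(d)) gives each date's rank within its id, NA ranked last;
> ##  a single order() only coincides with the rank for some permutations)
> df.1$group <- with(df.1, unsplit(lapply(split(startdate, id), function(d) order(order(d))), id))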
> df.1
#     id group type  startdate    enddate
# 1.1  1     2    A 2006-08-20 2006-12-06
# 2.1  2     1    A 2006-01-05 2007-07-02
# 1.2  1     1    W 2006-08-01 2007-08-29
# 2.2  2     3 <NA>       <NA>       <NA>
# 1.3  1     3    P 2007-08-18 2007-09-27
# 2.3  2     2    Q 2008-01-15 2008-02-07

> ## back to wide format
> df.2 <- reshape(df.1[order(df.1$group), ], idvar=1,
+                 v.names=c('type', 'startdate', 'enddate'), timevar='group',
+                 direction='wide')

> ## sort by id
> df.2 <- df.2[order(df.2$id), ]

> df.2
#     id type.1 startdate.1  enddate.1 type.2 startdate.2  enddate.2 type.3
# 1.2  1      W  2006-08-01 2007-08-29      A  2006-08-20 2006-12-06      P
# 2.1  2      A  2006-01-05 2007-07-02      Q  2008-01-15 2008-02-07   <NA>
#     startdate.3  enddate.3
# 1.2  2007-08-18 2007-09-27
# 2.1        <NA>       <NA>
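
The renumbering step needs each date's rank within its id, that is, the slot it should move to. That is not always what a single order() call returns, although the two coincide for the sample data. A small sketch of the difference, using made-up dates (`d`, `id`, `pos` and `rnk` are illustrative names only):

```r
# Three made-up start dates for one id, chosen so that order and rank differ.
d <- as.Date(c("2007-03-01", "2008-01-01", "2006-05-01"))

pos <- order(d)          # 3 1 2: which element comes 1st, 2nd, 3rd
rnk <- order(order(d))   # 2 3 1: the slot each element should move to

# Per-id ranks in one call, with NA ranked last within its id.
id <- c(1, 1, 1, 2, 2)
start <- as.Date(c("2007-03-01", "2008-01-01", "2006-05-01", NA, "2006-01-05"))
grp <- ave(as.numeric(start), id, FUN = function(x) order(order(x)))
grp                      # 2 3 1 2 1
```

ave() applies the function within each id and returns the results in the original row order, so it can stand in for the split/lapply/unsplit combination.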

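I just noticed that you want the push to depend on the date. I think you are basically asking two questions: 1) how to push and 2) how to sort. I only answered the first one.

Thank you very much! This looks very useful, because my dataset is very sparse and I really do need to push the NAs to the right.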