R: Expand ranges defined by "from" and "to" columns
I have a data frame containing the names of US presidents and the years in which their terms started and ended (the "from" and "to" columns). Here is a sample:
name from to
Bill Clinton 1993 2001
George W. Bush 2001 2009
Barack Obama 2009 2012
...and the output from dput:
dput(tail(presidents, 3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
"from", "to"), row.names = 42:44, class = "data.frame")
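I want to create a data frame with two columns ("name" and "year"), with one row for each year a president was in office. So I need to create a regular sequence from each "from" to the matching "to". Here is my expected output: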
name year
Bill Clinton 1993
Bill Clinton 1994
...
Bill Clinton 2000
Bill Clinton 2001
George W. Bush 2001
George W. Bush 2002
...
George W. Bush 2008
George W. Bush 2009
Barack Obama 2009
Barack Obama 2010
Barack Obama 2011
Barack Obama 2012
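I know I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand things for a single president, but I can't figure out how to iterate over every president.
How do I do this? I feel I should know this, but I'm drawing a blank.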
Update 1
OK, I've tried both solutions, and I'm getting an error:
foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1
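The error arises because "Grover Cleveland" appears twice in foo, so grouping by name passes seq() a from vector of length 2. Below is a minimal sketch, assuming only base R, that expands each row rather than each name; the variable names dup, years and expanded are illustrative and not part of the original post.

```r
# Same data as foo above, rebuilt with data.frame() for readability
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Names that occur more than once break any "one group per name" approach
dup <- unique(foo$name[duplicated(foo$name)])

# Build one sequence per row, so repeated names do not matter
years <- Map(seq, foo$from, foo$to)

# Repeat each name once per year in its own row's sequence
expanded <- data.frame(
  name = rep(foo$name, lengths(years)),
  year = unlist(years)
)
```

Because the expansion is keyed on rows, both of Cleveland's terms are kept as separate runs of years.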
You can use the plyr package:
library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
# name year
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# [...]
If it is important that the data be sorted by year, you can use the arrange function:
df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, df$year)
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# 3 Bill Clinton 1995
# [...]
# 21 Barack Obama 2011
# 22 Barack Obama 2012
Here's a data.table solution. It has the nice (if minor) feature of leaving the presidents in the order in which they were supplied:
library(data.table)
dt <- data.table(presidents)
dt[, list(year = seq(from, to)), by = name]
# name year
# 1: Bill Clinton 1993
# 2: Bill Clinton 1994
# ...
# ...
# 21: Barack Obama 2011
# 22: Barack Obama 2012
Here's a quick base-R solution, where Df is your data.frame:
do.call(rbind, apply(Df, 1, function(x) {
data.frame(name=x[1], year=seq(x[2], x[3]))}))
It gives some warnings about row names, but appears to return the correct data.frame.
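One thing to keep in mind with this approach is that apply() first converts the data frame to a character matrix, so each x holds strings. A hedged alternative sketch, again base R only, loops over row indices so the numeric columns keep their type; the names pieces and result are illustrative.

```r
# Sample data matching the question (Df is the name used in this answer)
Df <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Build one small data frame per row, indexing columns directly
# so that from/to stay numeric instead of being coerced to character
pieces <- lapply(seq_len(nrow(Df)), function(i) {
  data.frame(name = Df$name[i], year = seq(Df$from[i], Df$to[i]))
})

# Stack the per-row pieces into a single data frame
result <- do.call(rbind, pieces)
```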
Here's a dplyr solution:
library(dplyr)
# the data
presidents <-
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
"from", "to"), row.names = 42:44, class = "data.frame")
# the expansion of the table
presidents %>%
rowwise() %>%
do(data.frame(name = .$name, year = seq(.$from, .$to, by = 1)))
# the output
Source: local data frame [22 x 2]
Groups: <by row>
name year
(chr) (dbl)
1 Bill Clinton 1993
2 Bill Clinton 1994
3 Bill Clinton 1995
4 Bill Clinton 1996
5 Bill Clinton 1997
6 Bill Clinton 1998
7 Bill Clinton 1999
8 Bill Clinton 2000
9 Bill Clinton 2001
10 George W. Bush 2001
.. ... ...
Another base solution, where d is the data frame:
l <- mapply(`:`, d$from, d$to)
data.frame(name = d$name[rep(1:nrow(d), lengths(l))], year = unlist(l))
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# ...snip
# 8 Bill Clinton 2000
# 9 Bill Clinton 2001
# 10 George W. Bush 2001
# 11 George W. Bush 2002
# ...snip
# 17 George W. Bush 2008
# 18 George W. Bush 2009
# 19 Barack Obama 2009
# 20 Barack Obama 2010
# 21 Barack Obama 2011
# 22 Barack Obama 2012
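A caveat worth noting for the mapply approach: when every range happens to have the same length, mapply() simplifies its result to a matrix, and lengths() then no longer reports one length per row. The sketch below, using a small made-up data frame d, shows the effect and one possible guard (SIMPLIFY = FALSE); it is an illustration, not part of the original answer.

```r
# Hypothetical data where both ranges span exactly three years
d <- data.frame(
  name = c("A", "B"),
  from = c(2001, 2005),
  to   = c(2003, 2007)
)

# With the default SIMPLIFY = TRUE the result collapses to a 3 x 2 matrix
m <- mapply(`:`, d$from, d$to)

# SIMPLIFY = FALSE always returns a list with one element per row
l <- mapply(`:`, d$from, d$to, SIMPLIFY = FALSE)

# The same expansion as above now works regardless of range lengths
out <- data.frame(
  name = d$name[rep(seq_len(nrow(d)), lengths(l))],
  year = unlist(l)
)
```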
Another option using tidyverse could be to gather the data into long format, group_by name, and create a sequence between the from and to dates:
library(tidyverse)
presidents %>%
gather(key, date, -name) %>%
group_by(name) %>%
complete(date = seq(date[1], date[2]))%>%
select(-key)
# A tibble: 22 x 2
# Groups: name [3]
# name date
# <chr> <dbl>
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# 7 Bill Clinton 1995
# 8 Bill Clinton 1996
# 9 Bill Clinton 1997
#10 Bill Clinton 1998
# … with 12 more rows
Another tidyverse approach, using unnest and map2:
library(tidyverse)
presidents %>%
unnest(year = map2(from, to, seq)) %>%
select(-from, -to)
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# ...
# 21 Barack Obama 2011
# 22 Barack Obama 2012
Edit: As of tidyr v1.0.0, new variables can no longer be created as part of unnest().
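For tidyr 1.0.0 and later, one way to adapt the pipeline is to create the list column with mutate() first and then unnest it. This is a sketch under the assumption that dplyr, tidyr and purrr are installed; the name expanded is illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data from the question
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

expanded <- presidents %>%
  # Create a list column holding one year sequence per row
  mutate(year = map2(from, to, seq)) %>%
  # Expand the list column into one row per year
  unnest(year) %>%
  # Drop the range columns that are no longer needed
  select(-from, -to)
```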
Use by to create a by list L of data.frames, one data.frame per president, and then rbind them together. No packages are used.
L <- by(presidents, presidents$name, with, data.frame(name, year = from:to))
do.call("rbind", setNames(L, NULL))
Another solution using dplyr and tidyr:
library(magrittr) # for pipes
df <- data.frame(tata = c('toto1', 'toto2'), from = c(2000, 2004), to = c(2001, 2009))
# tata from to
# 1 toto1 2000 2001
# 2 toto2 2004 2009
df %>%
dplyr::as_tibble() %>%  # as.tbl() is deprecated in recent dplyr versions
dplyr::rowwise() %>%
dplyr::mutate(combined = list(seq(from, to))) %>%
dplyr::select(-from, -to) %>%
tidyr::unnest(combined)
# tata combined
# <fct> <int>
# 1 toto1 2000
# 2 toto1 2001
# 3 toto2 2004
# 4 toto2 2005
# 5 toto2 2006
# 6 toto2 2007
# 7 toto2 2008
# 8 toto2 2009