R: creating a new column from row values conditional on multiple data subsets


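I have a data frame that looks roughly like this (the original has 12 years of data):

where the numbers in the Age columns give the number of individuals in each age group in a given quarter of a given year. Note that a year does not always have data for every quarter (for example, 2007 has no data for the third quarter). Each row represents one sampling event. Although it is not shown in this example, the original data set always has multiple sampling events per quarter within a year. For example, for the first quarter of 2005 I have 47 sampling events, and therefore 47 rows.

What I want now is a data frame structured as follows: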

       Year   Quarter   Age_1   Age_2   Age_3   Age_4    Cohort
       2005      1       158     120     665     32        158
       2005      2       257     145     121     14        257
       2005      3       68       69     336     65         68
       2005      4       112     458     370     101       112
       2006      1       75      457     741     26        457 
       2006      2       365     134     223     45        134
       2006      3       257     121     654     341       121
       2006      4       175     124     454     12        124
       2007      1       697     554     217     47         47
       2007      2       954     987     118     54         54
       2007      4       498     235     112     65         65

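In this example, I want to create a new column (Cohort) in the original data set that follows my cohort through the data. In other words, for the first year of data (2005, all quarters) I take the row values of Age_1 and paste them into the new column. When I move to the next year (2006), I take all row values of Age_2 and paste them into the new column, and so on.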
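The rule described above can be sketched in base R as follows. This is only an illustration, not the asker's code: it rebuilds the example data from the table above (the names df1, yearclass, age_index and ages are assumptions), derives for each row which age column to read from the offset between Year and the first year, and picks one value per row by matrix indexing.

```r
# Example data rebuilt from the table in the question (without the Cohort column).
df1 <- data.frame(
  Year    = c(2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2007, 2007, 2007),
  Quarter = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 4),
  Age_1   = c(158, 257, 68, 112, 75, 365, 257, 175, 697, 954, 498),
  Age_2   = c(120, 145, 69, 458, 457, 134, 121, 124, 554, 987, 235),
  Age_3   = c(665, 121, 336, 370, 741, 223, 654, 454, 217, 118, 112),
  Age_4   = c(32, 14, 65, 101, 26, 45, 341, 12, 47, 54, 65)
)

yearclass <- 2005  # assumed first year of the cohort

# Age index per row: 1 in the first year, 2 in the next, and so on.
age_index <- df1$Year - yearclass + 1

# Matrix of the age columns that are needed, in the order Age_1, Age_2, ...
age_cols <- paste0("Age_", seq_len(max(age_index)))
ages <- as.matrix(df1[age_cols])

# A two-column (row, column) index picks exactly one value per row.
df1$Cohort <- ages[cbind(seq_len(nrow(df1)), age_index)]
```

For the 2007 rows this reads from Age_3 and yields 217, 118 and 112. Because every row is handled on its own, several sampling events in the same year and quarter need no special treatment, and a year without any rows does not shift later years onto the wrong age column.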

I tried the following function, but for some reason it only works for the first few years:

extract_cohort_quarter <- function(d, yearclass = 2005, quarterclass = 1) {

  ny <- 1:nlevels(d$Year)  # no. of Year levels in the dataset
  nq <- 1:nlevels(d$Quarter)
  age0 <- paste("age", ny, sep = "_")
  year0 <- as.character(yearclass + ny - 1)

  quarter <- as.character(rep(1:4, length(age0)))
  age <- rep(age0, each = 4)
  year <- rep(year0, each = 4)

  df <- data.frame(year, age, quarter, stringsAsFactors = FALSE)

  n <- nrow(df)
  dnew <- NULL
  for (i in 1:n) {
    tmp <- subset(d, Year == df$year[i] & Quarter == df$quarter[i])
    tmp$Cohort <- tmp[[age[i]]]
    dnew <- rbind(dnew, tmp)
  }
  levels(dnew$Year) <- paste("Yearclass_", yearclass, ":",
                             year, ":", quarter, ":", age, sep = "")
  dnew
}

Here is an option using the tidyverse:
library(dplyr)
library(tidyr)
# df1 is the example data frame from the question (Year, Quarter, Age_1 ... Age_4)
df1 %>%
    gather(key, Cohort, -Year, -Quarter) %>%       # long format: one row per age column
    separate(key, into = c('key1', 'key2')) %>%    # "Age_1" -> key1 = "Age", key2 = "1"
    mutate(ind = match(Year, unique(Year))) %>%    # 1 for the first year, 2 for the second, ...
    filter(key2 == ind) %>%                        # keep the age group whose number matches the year index
    mutate(newcol = paste(Year, Quarter, paste(key1, ind, sep="_"), sep=":")) %>%
    select(Cohort, newcol) %>%
    bind_cols(df1, .)
#   Year Quarter Age_1 Age_2 Age_3 Age_4 Cohort       newcol
#1  2005       1   158   120   665    32    158 2005:1:Age_1
#2  2005       2   257   145   121    14    257 2005:2:Age_1
#3  2005       3    68    69   336    65     68 2005:3:Age_1
#4  2005       4   112   458   370   101    112 2005:4:Age_1
#5  2006       1    75   457   741    26    457 2006:1:Age_2
#6  2006       2   365   134   223    45    134 2006:2:Age_2
#7  2006       3   257   121   654   341    121 2006:3:Age_2
#8  2006       4   175   124   454    12    124 2006:4:Age_2
#9  2007       1   697   554   217    47    217 2007:1:Age_3
#10 2007       2   954   987   118    54    118 2007:2:Age_3
#11 2007       4   498   235   112    65    112 2007:4:Age_3

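I have a simple solution, but it requires a little knowledge of the data.table library. I think you can easily adapt it to your further needs.
Here is the data: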
library(data.table)
DT <- as.data.table(list(Year    = c(2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2007, 2007, 2007),
                         Quarter = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 4),
                         Age_1   = c(158, 257, 68, 112, 75, 365, 257, 175, 697, 954, 498),
                         Age_2   = c(120, 145, 69, 458, 457, 134, 121, 124, 554, 987, 235),
                         Age_3   = c(665, 121, 336, 370, 741, 223, 654, 454, 217, 118, 112),
                         Age_4   = c(32, 14, 65, 101, 26, 45, 341, 12, 47, 54, 65)))

And the output:
> DT
    Year Quarter Age_1 Age_2 Age_3 Age_4 index cohort
 1: 2005       1   158   120   665    32     1    158
 2: 2005       2   257   145   121    14     1    257
 3: 2005       3    68    69   336    65     1     68
 4: 2005       4   112   458   370   101     1    112
 5: 2006       1    75   457   741    26     2    457
 6: 2006       2   365   134   223    45     2    134
 7: 2006       3   257   121   654   341     2    121
 8: 2006       4   175   124   454    12     2    124
 9: 2007       1   697   554   217    47     3    217
10: 2007       2   954   987   118    54     3    118
11: 2007       4   498   235   112    65     3    112

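Here is what it does:
DT[,index := .GRP, by = Year]

This creates an index for each distinct year in the table (by = Year operates on each year group, and .GRP numbers the groups in the order in which they appear).
I then use that index to fetch the column whose name is Age_ followed by the index number:
DT[,cohort := get(paste0("Age_",index)),by = Year]

You can even do everything in one line: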
DT[,cohort := get(paste0("Age_",.GRP)),by = Year]
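One thing to keep in mind: .GRP numbers the groups by order of appearance, so it matches the age index only while every year is present and the rows are ordered by year. The sketch below uses made-up data with a missing year (2007) to show the difference, together with an alternative that derives the index from the offset to the first year. The data, the variable first_year and the column names cohort_grp and cohort_offset are illustrative assumptions, not part of the answer.

```r
library(data.table)

# Hypothetical data with a gap: there are no rows for 2007.
DT <- data.table(
  Year  = c(2005, 2006, 2008),
  Age_1 = c(10, 20, 30),
  Age_2 = c(11, 21, 31),
  Age_3 = c(12, 22, 32),
  Age_4 = c(13, 23, 33)
)

first_year <- min(DT$Year)

# .GRP counts groups as they appear, so 2008 is group 3 and reads Age_3.
DT[, cohort_grp := get(paste0("Age_", .GRP)), by = Year]

# The offset from the first year makes 2008 index 4, so it reads Age_4.
# Inside j, the grouping variable Year has length 1 for each group.
DT[, cohort_offset := get(paste0("Age_", Year - first_year + 1)), by = Year]
```

When no year is missing, as in the example data of the question, both variants give the same result.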

I hope this helps.
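Why is the Cohort value for 2007, Quarter 1, 47 and not 217?

@hermestrismegistus I admit I made a typo. I pasted the Age_4 values instead of the Age_3 values, so you are right. It should be 217 and not 47.

Thank you very much. Your suggestion works very well! Do you also know how I can create a new Year column in this data set that would return a label like Yearclass_2005:2007:Q1:age_3? In my previous function I could get this by running the following line of code: levels(dnew$Year)

DT[, newcol := paste0(Year, ":", Quarter, "age_", .GRP), by = Year]

Thanks... it is very simple! One more question: is there a simple solution that returns a new column as in the following line of code (shown in my function): levels(dnew$Year)

@Marie christenerufener added the new column.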
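A label of the form asked about in the comments (Yearclass_2005:2007:Q1:age_3) can be assembled with a single paste0 call. The sketch below is one possible reading of that format, written in base R; the column name label and the variables yearclass and age_index are assumptions, and only the Year and Quarter columns of the example data are needed.

```r
# Year and Quarter columns from the example data.
df1 <- data.frame(
  Year    = c(2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2007, 2007, 2007),
  Quarter = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 4)
)

yearclass <- 2005                      # assumed first year of the cohort
age_index <- df1$Year - yearclass + 1  # 1, 2, 3 for 2005, 2006, 2007

# Yearclass_<first year>:<year>:Q<quarter>:age_<index>
df1$label <- paste0("Yearclass_", yearclass, ":", df1$Year,
                    ":Q", df1$Quarter, ":age_", age_index)
```

The same paste0 expression can be used on the right-hand side of := in data.table, with the index column from the answer in place of age_index.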