Combining cases into one case in R


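I have a very basic question. I am using the Aid Worker Security Database, which records incidents of violence against aid workers from 1997 to the present. Each incident is recorded as its own row in the dataset. I would like to combine all incidents that occurred in a given country in a given year, sum the values of the other variables, and create a simple time series in which every country has the same number of years, 1997-2013. Any idea how to do that?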

df
#   year  country totalnationals internationalskilled
# 1 1997   Rwanda              0                    3
# 2 1997 Cambodia              1                    0
# 3 1997  Somalia              0                    1
# 4 1997   Rwanda              1                    0
# 5 1997 DR Congo             10                    0
# 6 1997  Somalia              1                    0
# 7 1997   Rwanda              1                    0
# 8 1998   Angola              5                    0

where df is defined as:

df <- structure(list(year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 
  1997L, 1998L), country = c("Rwanda", "Cambodia", "Somalia", "Rwanda", 
  "DR Congo", "Somalia", "Rwanda", "Angola"), totalnationals = c(0L, 
  1L, 0L, 1L, 10L, 1L, 1L, 5L), internationalskilled = c(3L, 0L, 
  1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("year", "country", "totalnationals", 
  "internationalskilled"), class = "data.frame", row.names = c(NA, -8L))

Sorry for asking such a basic question, but so far I have not been able to figure out how to do it. Thanks!

Updated after the OP's comment:

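# Keep only incidents from 1997 through 2013
df <- subset(df, year >= 1997 & year <= 2013)
# Make sure the count columns are integers before summing
df$totalnationals <- as.integer(df$totalnationals)
df$internationalskilled <- as.integer(df$internationalskilled)
# Sum both count columns within each year/country pair
df2 <- aggregate(cbind(totalnationals, internationalskilled) ~ year + country,
                 data = df, FUN = sum)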
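As a quick sanity check, here is a minimal sketch that applies the same kind of aggregate() call to the eight sample rows from the question. The data frame is rebuilt with data.frame() only so that the snippet runs on its own. On the full database the result would of course differ.

```r
# Sample rows from the question, rebuilt so this snippet is self-contained
df <- data.frame(
  year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1998L),
  country = c("Rwanda", "Cambodia", "Somalia", "Rwanda",
              "DR Congo", "Somalia", "Rwanda", "Angola"),
  totalnationals = c(0L, 1L, 0L, 1L, 10L, 1L, 1L, 5L),
  internationalskilled = c(3L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# One row per year/country pair, with both count columns summed
df2 <- aggregate(cbind(totalnationals, internationalskilled) ~ year + country,
                 data = df, FUN = sum)

# Rwanda's three 1997 incidents collapse into a single row
# (totalnationals = 2, internationalskilled = 3)
rwanda <- df2[df2$country == "Rwanda", ]
print(rwanda)
```

The eight incident rows reduce to five year/country rows, which is the "one case per country and year" shape the question asks for.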

To add 0s for the years that have no records, do the following:

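# Every year/country combination; unique() keeps repeated rows out of the grid
df3 <- expand.grid(year = unique(df$year), country = unique(df$country))
df3 <- merge(df3, df2, all.x = TRUE, by = c("year", "country"))
# Combinations with no recorded incidents come back as NA; set them to 0
df3[is.na(df3)] <- 0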
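If some year in 1997-2013 has no incidents anywhere, unique(df$year) will not contain it, and the series would come out shorter than 17 years. The sketch below is one possible variation that builds the grid from the fixed range 1997:2013 taken from the question. The small df2 here is a hypothetical stand-in for the aggregated table, and the names grid, panel, and count_cols are illustrative only.

```r
# Hypothetical aggregated table standing in for df2
df2 <- data.frame(
  year = c(1997L, 1998L, 1997L),
  country = c("Rwanda", "Angola", "Somalia"),
  totalnationals = c(2L, 5L, 1L),
  internationalskilled = c(3L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Full grid: every country paired with every year in the fixed range
grid <- expand.grid(year = 1997:2013,
                    country = unique(df2$country),
                    stringsAsFactors = FALSE)

# Left join keeps all grid rows; missing combinations get NA counts
panel <- merge(grid, df2, by = c("year", "country"), all.x = TRUE)

# Replace NA with 0 in the count columns only, keeping them integer
count_cols <- c("totalnationals", "internationalskilled")
for (col in count_cols) {
  panel[[col]][is.na(panel[[col]])] <- 0L
}

# Sort so each country's series reads chronologically
panel <- panel[order(panel$country, panel$year), ]

# Each country should now have exactly 17 rows
print(table(panel$country))
```

With 70 countries this layout would give 70 * 17 = 1190 rows, which matches the row count the OP expects.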

On large datasets, using data.table can also be faster:

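library(data.table)
# Key the table by year and country, then sum the counts within each pair
dt   <- data.table(df, key = "year,country")
smry <- dt[, list(totalnationals       = sum(totalnationals),
                  internationalskilled = sum(internationalskilled)),
           by = "year,country"]
# Template holding every year/country combination for 1997-2013
countries   <- unique(dt$country)
template    <- data.table(year    = rep(1997:2013, each = length(countries)),
                          country = countries,
                          key     = "year,country")
# Join the summary onto the template; unmatched combinations are NA, then set to 0
time.series <- smry[template, on = c("year", "country")]
time.series[is.na(time.series)] <- 0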

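Please read, then edit your question accordingly.

Thanks for your answer, but the result did not turn out as I expected. It either shows "Error in eval(expr, envir, enclos) : object 'year' not found" or, if I include 'df Updated', it gives me an empty dataset.

Also, one aspect I may have missed: do you need a 0 for the years in which no one was killed? Is that required?

I have also added the part that inserts the zeros.

The code for the zeros works, but I get duplicate values for the same country and year. After the merge I end up with about 1.5 million cases, when I should have about 1,190 cases for 70 countries over the 17 years 1997-2013. Is there a way to get rid of the repeated cases? Thanks.

Sorry, my bad. expand.grid needed a unique() inside. Can you try now?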
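The duplicate rows described in the comments can be reproduced on the sample data. This is a minimal sketch, assuming the blow-up came from passing the raw columns to expand.grid(); the names dup_grid, ok_grid, and deduped are illustrative only.

```r
# Sample rows from the question (only the two key columns are needed here)
df <- data.frame(
  year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1998L),
  country = c("Rwanda", "Cambodia", "Somalia", "Rwanda",
              "DR Congo", "Somalia", "Rwanda", "Angola"),
  stringsAsFactors = FALSE
)

# Without unique(): every row's year is crossed with every row's country,
# so n input rows produce n * n grid rows with many repeats
dup_grid <- expand.grid(year = df$year, country = df$country,
                        stringsAsFactors = FALSE)

# With unique(): one row per distinct year/country combination
ok_grid <- expand.grid(year = unique(df$year), country = unique(df$country),
                       stringsAsFactors = FALSE)

print(nrow(dup_grid))  # 8 * 8 = 64
print(nrow(ok_grid))   # 2 years * 5 countries = 10

# If a grid with repeats already exists, unique() drops the repeated rows
deduped <- unique(dup_grid)
print(nrow(deduped))   # 10
```

Because the row count grows with the square of the number of incidents, even a dataset with a little over a thousand rows would produce a grid on the order of a million rows.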
