Complex data reshaping in R (R, Reshape2, Tidyr) - Fatal编程技术网

I have a data frame with 3 columns (excerpt below):

df

Try this.

Update: for the edited data, you can gather and then unite and spread:

library(tidyr)  # gather(), unite(), spread(); tidyr also re-exports %>%

gather(df, Var, Val, A:B) %>%
  unite(YearVar, Var, Year) %>%
  spread(YearVar, Val)
#   id A_2007 A_2008 A_2009 B_2007 B_2008 B_2009
#1  1      5      2      3     10      0     50
#2  2      7      5      6     13     17     17
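To make the gather, unite and spread pipeline less opaque, here is a minimal base R sketch of the same three steps. It assumes the edited data has columns id, Year, A and B; the data frame below is reconstructed from the output above, not copied from the question.

```r
# Hypothetical reconstruction of the edited data, inferred from the output above
df <- data.frame(
  id   = rep(1:2, each = 3),
  Year = rep(2007:2009, times = 2),
  A    = c(5, 2, 3, 7, 5, 6),
  B    = c(10, 0, 50, 13, 17, 17)
)

# Step 1 ("gather"): stack A and B into one Var/Val pair of columns
long <- data.frame(
  id   = rep(df$id, times = 2),
  Year = rep(df$Year, times = 2),
  Var  = rep(c("A", "B"), each = nrow(df)),
  Val  = c(df$A, df$B)
)

# Step 2 ("unite"): paste Var and Year into a single key such as "A_2007"
long$YearVar <- paste(long$Var, long$Year, sep = "_")

# Step 3 ("spread"): one row per id, one column per key.
# Each id/key pair occurs exactly once here, so sum() just returns that value.
wide <- tapply(long$Val, list(long$id, long$YearVar), sum)
wide
```

The result is a matrix rather than a data frame, with the ids as row names. If an id/key pair occurred twice, sum() would silently add the values, whereas spread() stops with a "duplicate identifiers" error.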

Here is a possible solution using data.table v >= 1.9.5:

library(data.table)
dcast(setDT(df), . ~ Year, value.var = c("A", "B"))
#    . 2007_A 2008_A 2009_A 2007_B 2008_B 2009_B
# 1: .      5      2      3     10      0     50

Edit: based on the new dataset, just add id to the formula:

dcast(setDT(df), id ~ Year, value.var = c("A", "B"))
#    id 2007_A 2008_A 2009_A 2007_B 2008_B 2009_B
# 1:  1      5      2      3     10      0     50
# 2:  2      7      5      6     13     17     17

Another simple option in base R:

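# Transpose so each year's A and B sit in one column, then read the values out column by column
df_needed <- matrix(as.vector(t(df[, -1])), ncol = nrow(df) * (ncol(df) - 1))
# Column names: variable name plus year, e.g. "A_2007"
colnames(df_needed) <- paste(rep(colnames(df)[-1], nrow(df)), rep(df[, 1], each = ncol(df) - 1), sep = "_")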

df_needed
#      A_2007 B_2007 A_2008 B_2008 A_2009 B_2009
#[1,]      5     10      2      0      3     50
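The one-liner relies on R storing matrices column by column: t() puts each year's A and B values into one column, so as.vector() reads them out in the order A_2007, B_2007, A_2008, and so on. A small sketch of just that step, assuming the three-column data (Year, A, B) reconstructed from the output above:

```r
# Hypothetical three-column data, reconstructed from the output above
df <- data.frame(Year = 2007:2009, A = c(5, 2, 3), B = c(10, 0, 50))

m <- t(as.matrix(df[, -1]))   # rows A and B, one column per year
m
vals <- as.vector(m)          # column-major read-out: A_2007, B_2007, A_2008, ...
names(vals) <- paste(rep(colnames(df)[-1], nrow(df)),
                     rep(df$Year, each = ncol(df) - 1), sep = "_")
vals
```

Because the whole data frame is flattened into one vector, this approach yields a single row; data with several ids would need the same step applied per id.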

Good ol' base::reshape works well here. Just create a dummy id variable first:

df$id <- 1
reshape(df, v.names = c("A", "B"), timevar = "Year", idvar = "id", direction = "wide")
#   id A.2007 B.2007 A.2008 B.2008 A.2009 B.2009
# 1  1      5     10      2      0      3     50
This also works for the edited data (which happens to already have an "id" variable).
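Because the edited data already carries an id column, the dummy-variable step can be skipped. A minimal sketch, assuming the edited data reconstructed from the wide outputs earlier on this page; sep = "_" is passed only to get underscore-separated names instead of the default A.2007:

```r
# Hypothetical reconstruction of the edited data (two ids, three years each)
df <- data.frame(
  id   = rep(1:2, each = 3),
  Year = rep(2007:2009, times = 2),
  A    = c(5, 2, 3, 7, 5, 6),
  B    = c(10, 0, 50, 13, 17, 17)
)

# One row per id; A and B are spread out per Year
wide <- reshape(df, v.names = c("A", "B"), timevar = "Year",
                idvar = "id", direction = "wide", sep = "_")
wide
```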

You can also use reshape2::recast:

library(reshape2)
recast(df, id ~ variable + Year, id.var = 1:2)

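Is there a way to use this solution for data with more than one row in the final dataset? Please see the edit. I currently get this error message:
Error: Duplicate identifiers for rows (1,4), (2,5), (3,6), (7,10), (8,11), (9,12).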
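@psql You need to create an indx variable based on the new dataset. How do you want the groups in the edit to change?
@psql Try
library(dplyr)
library(tidyr)
df %>%
  mutate(indx = cumsum(c(TRUE, diff(df$Year) < 0))) %>%
  gather(Var, Val, A:B) %>%
  unite(YearVar, Var, Year) %>%
  spread(YearVar, Val) %>%
  select(-indx)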
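@psql OK, I edited my answer based on your third edit. Please check whether it works.
@psql, please see my edit. Compared with the other solutions, this one not only has the shortest code but will also be by far the most efficient.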
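The indx variable suggested in the comments numbers each block of rows in which Year runs upward and starts a new block whenever Year drops back. A base R sketch of that idea, assuming a block restarts when diff(Year) is negative and using hypothetical six-row data without an id column; the index is then fed to base reshape() rather than spread():

```r
# Hypothetical data: two blocks of years and no id column
df <- data.frame(Year = rep(2007:2009, times = 2),
                 A    = c(5, 2, 3, 7, 5, 6),
                 B    = c(10, 0, 50, 13, 17, 17))

# TRUE for the first row and for every row where Year falls back;
# cumsum() turns those markers into block numbers
df$indx <- cumsum(c(TRUE, diff(df$Year) < 0))
df$indx
# [1] 1 1 1 2 2 2

# With indx as the row identifier, no Year value repeats within a block,
# so there are no duplicate identifiers left
wide <- reshape(df, v.names = c("A", "B"), timevar = "Year",
                idvar = "indx", direction = "wide", sep = "_")
wide
```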
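Edited base R solution for data with repeated years:

# Split by year, suffix each year's A/B columns with that year, then bind the pieces side by side
df_split <- split(df, df$Year)
df_split <- lapply(df_split, function(d) {
  colnames(d)[-1] <- paste(colnames(d)[-1], unique(d$Year), sep = "_")
  d[, -1]  # drop the Year column
})
df_needed <- do.call("cbind", df_split)
# cbind() prefixes each column with its list name (e.g. "2007.A_2007"); strip that prefix
colnames(df_needed) <- sub("^\\d{4}\\.", "", colnames(df_needed))
df_needed
#  A_2007 B_2007 A_2008 B_2008 A_2009 B_2009
#1      5     10      2      0      3     50
#4      7     13      5     17      6     17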
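For the edited data, which already has an id column (sep = "_" gives the underscore names; the default separator is "."):

reshape(df, timevar = "Year", idvar = "id", direction = "wide", sep = "_")
#    id A_2007 B_2007 A_2008 B_2008 A_2009 B_2009
#  1  1      5     10      2      0      3     50
#  2  2      7     13      5     17      6     17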