R: complex data frame and data transposition
I have a data frame that looks like this:
ID Capital Instal Date1 Date2
2 500 25 a b
2 500 20 a c
2 450 15 a a
2 300 10 a f
2 250 0 a z
4 100 25 b a
4 90 20 b b
4 80 15 b a
4 75 10 b f
4 25 0 b z
From this I would like to create a new data.frame that starts at the row where Date1 == Date2, so that my new data frame B looks like this:
ID Date1 Capital Instal1 Instal2 Instal3 Instal4
2 a 450 15 10 0
4 b 90 20 15 10 0
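I want the new data frame to include only the rows from the point where Date1 and Date2 are equal onward.
A roundabout way. I'm sure there is a faster way to do this, but it gets you the expected output.
Steps: flag the row number where Date1 == Date2 and fill it down over the rows that follow. Filter to those records and select only the columns needed. Create a column that will serve as the headers and spread the Instal data into it. Finally, join a subset of the original data to pick up the correct Capital value.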
library(dplyr)
library(tidyr)
df %>%
group_by(ID) %>%
mutate(rowid = row_number(),
selection = ifelse(Date1 == Date2, rowid, NA)) %>%
fill(selection) %>% # fill rowid over the rows after date1 == date2
filter(!is.na(selection)) %>%
select(ID, Date1, Instal) %>%
mutate(Installation = paste0("Instal", row_number())) %>%
spread(Installation, Instal) %>%
inner_join(df %>% filter(Date1 == Date2) %>% select(ID, Date1, Capital), .)
ID Date1 Capital Instal1 Instal2 Instal3 Instal4
1 2 a 450 15 10 0 NA
2 4 b 90 20 15 10 0
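To see which rows the selection step keeps, the flag can be computed on its own. The following is a minimal base R sketch, assuming df is the data frame from the question (it is rebuilt here so the block runs by itself); the name keep is introduced only for illustration.

```r
# Rebuild the example data from the question
df <- read.table(text = "ID Capital Instal Date1 Date2
2 500 25 a b
2 500 20 a c
2 450 15 a a
2 300 10 a f
2 250 0 a z
4 100 25 b a
4 90 20 b b
4 80 15 b a
4 75 10 b f
4 25 0 b z", header = TRUE, stringsAsFactors = FALSE)

# Within each ID, keep a running count of Date1 == Date2 matches;
# a row is kept once that count is above zero.
match_count <- ave(as.integer(df$Date1 == df$Date2), df$ID, FUN = cumsum)
keep <- match_count > 0

# Rows from the first match onward, per ID
df[keep, c("ID", "Capital", "Instal")]
```

The first kept row of each ID carries the Capital value that ends up in the result (450 for ID 2, 90 for ID 4).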
Here is another tidyverse solution:
library(dplyr)
library(tidyr)
df2 <- df %>%
group_by(ID) %>% #group by ID
mutate(ind=cumsum(Date1==Date2)) %>% #mark elements after first Date1==Date2
filter(ind!=0) %>% #remove previous elements
summarise(Date1=first(Date1),
Capital=first(Capital),
Instal=list(Instal)) %>% #capture values for table
unnest(cols = Instal) %>% #expand Instal, one value per row (tidyr >= 1.0.0 expects cols)
group_by(ID) %>%
mutate(Inst=paste0("Instal",row_number())) %>% #mark names of Instal values
spread(key=Inst,value=Instal) #spread into wide format
df2
ID Date1 Capital Instal1 Instal2 Instal3 Instal4
1 2 a 450 15 10 0 NA
2 4 b 90 20 15 10 0
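spread() still works, but tidyr 1.0.0 and later mark it as superseded by pivot_wider(). The following is a sketch of the same reshaping with pivot_wider(), assuming tidyr >= 1.0.0 and the df from the question (rebuilt here so the block is self-contained); the names wide and name are illustrative only.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
df <- read.table(text = "ID Capital Instal Date1 Date2
2 500 25 a b
2 500 20 a c
2 450 15 a a
2 300 10 a f
2 250 0 a z
4 100 25 b a
4 90 20 b b
4 80 15 b a
4 75 10 b f
4 25 0 b z", header = TRUE, stringsAsFactors = FALSE)

wide <- df %>%
  group_by(ID) %>%
  filter(cumsum(Date1 == Date2) > 0) %>%             # keep rows from the first match onward
  mutate(Capital = first(Capital),                   # Capital at the matching row
         name = paste0("Instal", row_number())) %>%  # Instal1, Instal2, ...
  ungroup() %>%
  select(ID, Date1, Capital, name, Instal) %>%       # drop Date2, fix the column order
  pivot_wider(names_from = name, values_from = Instal)

wide
```

Selecting the columns before reshaping also fixes their order, so ID, Date1 and Capital come first, followed by the Instal columns.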
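Subset on Date1 == Date2, then reshape from there.
Thank you very much, the new data frame comes out, but the columns are not in order. How can I correct that?
You can simply use transmute instead of summarise + unnest.
@N.Fungura the order looks correct to me. ID always comes first, and the last four should be in numeric order. You can change the order of Date1 and Capital in the summarise call.
Here is a tidyverse approach (dplyr + tidyr):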
library(tidyverse)
df2 <- df %>%
group_by(ID) %>%
filter(cumsum(Date1 == Date2) >0) %>%
transmute(Capital=Capital[1],Instal,Date1,colnames = paste0("Instal",seq(n()))) %>%
ungroup %>%
spread(colnames,Instal)
df2[is.na(df2)] <- 0 # omit if you'd rather have NA
# # A tibble: 2 x 7
# ID Capital Date1 Instal1 Instal2 Instal3 Instal4
# * <int> <int> <chr> <int> <int> <int> <int>
# 1 2 450 a 15 10 0 0
# 2 4 90 b 20 15 10 0
And a base R approach:
df_list <-
lapply(split(df,df$ID),function(x) {
x <- subset(x,cumsum(Date1==Date2)>0)
x <- transform(x, Capital=Capital[1], time = seq(nrow(x)))
reshape(x,idvar=c("ID","Capital","Date1"),direction="wide",sep="",drop="Date2")
})
all_names <- names(df_list[[which.max(lengths(df_list))]])
df_list_full <- lapply(df_list,function(x) {x[setdiff(all_names,names(x))] <- NA;x})
do.call(rbind, df_list_full)
# ID Capital Date1 Instal1 Instal2 Instal3 Instal4
# 2 2 450 a 15 10 0 NA
# 4 4 90 b 20 15 10 0
Data:
df <- read.table(text = "ID Capital Instal Date1 Date2
2 500 25 a b
2 500 20 a c
2 450 15 a a
2 300 10 a f
2 250 0 a z
4 100 25 b a
4 90 20 b b
4 80 15 b a
4 75 10 b f
4 25 0 b z", header = TRUE, stringsAsFactors = FALSE)