R: manipulating a data frame where each experiment has multiple columns
I have a lot of sequencing experiments, each with multiple results for a few hundred genes. When the data is exported from another program it is not in a useful format for me, because every experiment and every result is listed across the top, with one row per gene. I have written an example data set below, along with how I currently solve the problem, but I am hoping for a more optimized approach because my data set is very large.
col1<- c("","", "gene1", "gene2", "gene3", "gene4")
col2<- c("Experiment1", "Part 1", "a","b","c","d")
col3<- c("Experiment1", "Part 2", "e", "f", "g", "h")
col4<- c("Experiment2", "Part 1", "i", "j", "k", "l")
col5<- c("Experiment2", "Part 2", "m", "n", "o", "p")
pp<- data.frame(col1,col2,col3,col4,col5)
one<-data.frame(pp$col1, pp$col2)
onetwo<- data.frame(pp$col1,pp$col3)
two<-data.frame(pp$col1, pp$col4)
twotwo<-data.frame(pp$col1,pp$col5)
one$V3[3:6]<-as.character(one[2,2])
one<-one[-2,]
one<-one[-1,]
colnames(one)<- c("gene", "Experiment 1", "part")
onetwo$V3[3:6]<-as.character(onetwo[2,2])
onetwo<-onetwo[-2,]
onetwo<-onetwo[-1,]
colnames(onetwo)<- c("gene", "Experiment 1", "part")
x1<-rbind(one, onetwo)
two$V3[3:6]<-as.character(two[2,2])
two<-two[-2,]
two<-two[-1,]
colnames(two)<- c("gene", "Experiment 2", "part")
twotwo$V3[3:6]<-as.character(twotwo[2,2])
twotwo<-twotwo[-2,]
twotwo<-twotwo[-1,]
colnames(twotwo)<- c("gene", "Experiment 2", "part")
x2<-rbind(two, twotwo)
x3<-merge(x1,x2)
This may be a shorter approach:
pp.new <- as.data.frame(t(pp)[-1,], row.names = 1)
names(pp.new) <- c("experiment", "part", "gene1", "gene2", "gene3", "gene4")
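However, it is better to convert it to long format with the reshape2 package (the result is pp.long in the output further down).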
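The call that produces pp.long does not appear in this copy of the answer. Judging from the pp.long output shown further down (columns experiment, part, variable, value), a plausible reconstruction is reshape2's melt with experiment and part as id variables. This is an assumption, not the answer's confirmed code. The sketch below rebuilds the example data so that it runs on its own, and it uses rownames(pp.new) <- NULL instead of the row.names = 1 argument:

```r
library(reshape2)

# Example data from the question
pp <- data.frame(
  col1 = c("", "", "gene1", "gene2", "gene3", "gene4"),
  col2 = c("Experiment1", "Part 1", "a", "b", "c", "d"),
  col3 = c("Experiment1", "Part 2", "e", "f", "g", "h"),
  col4 = c("Experiment2", "Part 1", "i", "j", "k", "l"),
  col5 = c("Experiment2", "Part 2", "m", "n", "o", "p"),
  stringsAsFactors = FALSE
)

# Transpose, drop the row holding the gene labels, and name the columns
pp.new <- as.data.frame(t(pp)[-1, ], stringsAsFactors = FALSE)
rownames(pp.new) <- NULL
names(pp.new) <- c("experiment", "part", "gene1", "gene2", "gene3", "gene4")

# Assumed reconstruction: one row per experiment / part / gene
pp.long <- melt(pp.new, id.vars = c("experiment", "part"))
```

With id.vars given, melt treats every remaining column (gene1 to gene4) as a measured variable, which is why the gene names end up in the variable column.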
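If you want output comparable to x3, you can use the recast function (also from the reshape2 package); the call is shown with its output at the end.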
The results are:
> pp.new
   experiment   part gene1 gene2 gene3 gene4
1 Experiment1 Part 1     a     b     c     d
2 Experiment1 Part 2     e     f     g     h
3 Experiment2 Part 1     i     j     k     l
4 Experiment2 Part 2     m     n     o     p
> pp.long
    experiment   part variable value
1  Experiment1 Part 1    gene1     a
2  Experiment1 Part 2    gene1     e
3  Experiment2 Part 1    gene1     i
4  Experiment2 Part 2    gene1     m
5  Experiment1 Part 1    gene2     b
6  Experiment1 Part 2    gene2     f
7  Experiment2 Part 1    gene2     j
8  Experiment2 Part 2    gene2     n
9  Experiment1 Part 1    gene3     c
10 Experiment1 Part 2    gene3     g
11 Experiment2 Part 1    gene3     k
12 Experiment2 Part 2    gene3     o
13 Experiment1 Part 1    gene4     d
14 Experiment1 Part 2    gene4     h
15 Experiment2 Part 1    gene4     l
16 Experiment2 Part 2    gene4     p
recast(pp.new, part + variable ~ experiment, id.var=c("experiment","part"), value.var = "value")
    part variable Experiment1 Experiment2
1 Part 1    gene1           a           i
2 Part 1    gene2           b           j
3 Part 1    gene3           c           k
4 Part 1    gene4           d           l
5 Part 2    gene1           e           m
6 Part 2    gene2           f           n
7 Part 2    gene3           g           o
8 Part 2    gene4           h           p
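Because the question mentions a very large data set, the same reshaping can also be sketched in base R, working directly from the two header rows rather than splitting the table column by column. The helper name header_rows_to_long is hypothetical and not part of the original answer. It assumes the first column holds gene names and the first two rows hold the experiment and part labels, as in the example:

```r
# Hypothetical helper (not from the original answer): turn a table whose
# first two rows are experiment/part labels into long format.
header_rows_to_long <- function(df) {
  m <- as.matrix(df)                      # everything becomes character
  experiment <- unname(m[1, -1])          # first header row, minus gene column
  part       <- unname(m[2, -1])          # second header row
  genes      <- unname(m[-(1:2), 1])      # gene names from the first column
  values     <- m[-(1:2), -1, drop = FALSE]
  data.frame(
    experiment = rep(experiment, each = length(genes)),
    part       = rep(part, each = length(genes)),
    gene       = rep(genes, times = ncol(values)),
    value      = as.vector(values),       # column-major, matches rep() order
    stringsAsFactors = FALSE
  )
}

# Example data from the question
pp <- data.frame(
  col1 = c("", "", "gene1", "gene2", "gene3", "gene4"),
  col2 = c("Experiment1", "Part 1", "a", "b", "c", "d"),
  col3 = c("Experiment1", "Part 2", "e", "f", "g", "h"),
  col4 = c("Experiment2", "Part 1", "i", "j", "k", "l"),
  col5 = c("Experiment2", "Part 2", "m", "n", "o", "p"),
  stringsAsFactors = FALSE
)

long <- header_rows_to_long(pp)
```

The function never refers to a column by name or position beyond the first, so it should apply unchanged to any number of experiment/part columns, provided the layout assumptions above hold.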