Volcano plot in R: adding details: coloring only the common genes
I have a problem coloring some genes in two data sets (whole_colon / volcano) to mark the genes they have in common. The code below works well. However, I want to add a detail that is rather tricky: I want to apply a different color (red would be best) to the common genes, but only when the following condition is met: (whole_colon$genes == volcano$genes). I tried to split the groups into (specified increased / specified decreased), but unfortunately without success. My code is attached below. Many thanks.
#volcano plot using ggplot2
library(data.table)
# Adding group to decipher if the gene is significant or not:
whole_colon <- data.frame(whole_colon)
whole_colon["group"] <- "NotSignificant"
whole_colon[which(whole_colon['FDR'] < 0.05 & whole_colon['logFC'] > 1.5),"group"] <- "Increased"
whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] > 1.5),"group"] <- "colon_Increased_specialized"
whole_colon[which(volcano['FDR'] < 0.05 & volcano['logFC'] < -1.5),"group"] <- "colon_Decreased_specialized"
with(subset(whole_colon , FDR<0.05), points(logFC, -log10(FDR), pch=20,col="red"), whole_colon$genes==volcano$genes)
library(ggplot2)
ggplot(whole_colon, aes(x = logFC, y = -log10(FDR), color = group))+
scale_colour_manual(values = cols) +
ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
geom_point(size = 2.5, alpha = 1, na.rm = T) +
theme_bw(base_size = 14) +
theme(legend.position = "right") +
xlab(expression(log[2]("logFC"))) +
ylab(expression(-log[10]("FDR"))) +
geom_hline(yintercept = 1.30102, colour="#990000", linetype="dashed") +
geom_vline(xintercept = 1.5849, colour="#990000", linetype="dashed") +
geom_vline(xintercept = -1.5849, colour="#990000", linetype="dashed")+
scale_y_continuous(trans = "log1p")
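The dashed guide lines above use hard-coded positions: 1.30102 is a truncation of -log10(0.05), and the vertical lines sit at ±1.5849 while the groups are cut at logFC ±1.5. As a result, points with 1.5 < |logFC| < 1.5849 get a "significant" color but fall inside the dashed lines. A minimal sketch that derives all guide positions from one pair of cutoffs; `fdr_cut`, `lfc_cut`, `y_cut` and `x_cuts` are hypothetical names, not from the original post:

```r
# Hypothetical cutoff variables (not in the original code): keep the
# dashed guide lines in sync with the thresholds used to build `group`.
fdr_cut <- 0.05
lfc_cut <- 1.5

# Position of the horizontal guide on the -log10(FDR) axis
y_cut <- -log10(fdr_cut)
print(round(y_cut, 5))  # [1] 1.30103

# Vertical guides at -lfc_cut and +lfc_cut
x_cuts <- c(-lfc_cut, lfc_cut)
print(x_cuts)
```

In the plot these would be used as geom_hline(yintercept = y_cut, ...) and geom_vline(xintercept = x_cuts, ...); geom_vline() accepts a vector of intercepts, so a single layer can draw both vertical lines.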
volcano:
genes logFC FDR group
1 INHBA 6.271879 2.070000e-30 Increased
2 COL10A1 7.634386 1.820000e-23 Increased
3 WNT2 9.485133 6.470000e-20 Increased
4 COL8A1 3.974965 6.470000e-20 Increased
5 THBS2 4.104176 2.510000e-19 Increased
6 BGN 3.524484 5.930000e-18 Increased
7 COMP 11.916956 2.740000e-17 Increased
9 SULF1 3.540374 1.290000e-15 Increased
10 CTHRC1 3.937028 4.620000e-14 Increased
11 TRIM29 3.827088 1.460000e-11 Increased
12 SLC6A20 5.060538 5.820000e-11 Increased
13 SFRP4 5.924330 8.010000e-11 Increased
14 CDH3 5.330732 8.940000e-11 Increased
15 ESM1 6.491496 3.380000e-10 Increased
614 TDP2 -1.801368 0.002722461 NotSignificant
615 EPHX2 -1.721039 0.002722461 NotSignificant
616 RAVER2 -1.581812 0.002749728 NotSignificant
617 BMP6 -2.702780 0.002775460 Increased
619 SCNN1G -4.012111 0.002870500 Increased
620 SLC52A3 -1.868920 0.002931197 NotSignificant
621 VIPR1 -1.556238 0.002945578 NotSignificant
622 SUCLG2 -1.720993 0.003059717 NotSignificant
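The post shows volcano only as a print-out, not as a dput(), so the code in the answer cannot be run against it directly. A sketch that rebuilds the rows shown above as a data frame (the print-out itself is partial: row 8 and rows 16 to 613 and 618 are not shown). Only genes is actually needed for the %in% test; the group column is left out here because the answer recomputes groups for whole_colon only:

```r
# Hand-copied from the printed `volcano` rows above; row names are dropped
# and only the rows that appear in the post are included.
volcano <- data.frame(
  genes = c("INHBA", "COL10A1", "WNT2", "COL8A1", "THBS2", "BGN", "COMP",
            "SULF1", "CTHRC1", "TRIM29", "SLC6A20", "SFRP4", "CDH3", "ESM1",
            "TDP2", "EPHX2", "RAVER2", "BMP6", "SCNN1G", "SLC52A3", "VIPR1",
            "SUCLG2"),
  logFC = c(6.271879, 7.634386, 9.485133, 3.974965, 4.104176, 3.524484,
            11.916956, 3.540374, 3.937028, 3.827088, 5.060538, 5.924330,
            5.330732, 6.491496, -1.801368, -1.721039, -1.581812, -2.702780,
            -4.012111, -1.868920, -1.556238, -1.720993),
  FDR = c(2.07e-30, 1.82e-23, 6.47e-20, 6.47e-20, 2.51e-19, 5.93e-18,
          2.74e-17, 1.29e-15, 4.62e-14, 1.46e-11, 5.82e-11, 8.01e-11,
          8.94e-11, 3.38e-10, 0.002722461, 0.002722461, 0.002749728,
          0.002775460, 0.002870500, 0.002931197, 0.002945578, 0.003059717),
  stringsAsFactors = FALSE
)
str(volcano)
```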
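The example data set you provided is incomplete: there is no overlap, so it is hard to color-code based on it. Try the following. The key point is that you cannot use == ; instead, use %in%, which returns a logical vector telling whether each gene in whole_colon is found in volcano: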
whole_colon=structure(list(genes = structure(c(5L, 11L, 3L,
7L, 10L, 1L,
2L, 9L, 6L, 8L, 12L, 4L, 13L, 14L), .Label = c("BEST4", "COL10A1",
"COL11A1", "COL8A1", "CST1", "GUCA2B", "INHBA", "KRT6B", "MMP11",
"MMP7", "OTOP2", "WNT2", "ABC", "DEF"), class = "factor"), logFC = c(9.554742,
-9.408177, 6.825363, 6.271879, 7.594926, -7.756451, 7.634386,
4.767644, -6.346156, 11.80155, 9.485133, 3.974965, 0.5, -0.5),
FDR = c(5.64e-45, 5.76e-32, 1e-31, 2.07e-30, 2.07e-30, 8.3e-30,
1.82e-23, 2.7e-23, 2.17e-21, 5.37e-20, 6.47e-20, 6.47e-20,
1, 1), group = c("Increased", "Decreased", "Increased", "specific_Increased",
"Increased", "Decreased", "specific_Increased", "Increased",
"Decreased", "Increased", "specific_Increased", "specific_Increased",
"NotSignificant", "NotSignificant")), row.names = c("1",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
"2"), class = "data.frame")
Set the groups:
#set the decreased and increased like you did:
whole_colon$group <- "NotSignificant"
whole_colon$group[which(whole_colon$FDR < 0.05 & whole_colon$logFC > 1.5)] <- "Increased"
whole_colon$group[which(whole_colon$FDR < 0.05 & whole_colon$logFC < -1.5)] <- "Decreased"
# significant genes that also appear in volcano get the "specific_" labels
whole_colon$group[which(whole_colon$FDR < 0.05 & whole_colon$logFC > 1.5 & whole_colon$genes %in% volcano$genes)] <- "specific_Increased"
whole_colon$group[which(whole_colon$FDR < 0.05 & whole_colon$logFC < -1.5 & whole_colon$genes %in% volcano$genes)] <- "specific_Decreased"
#
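The difference between == and %in% that the answer relies on can be seen on toy vectors. The gene names and the logFC/FDR values below are made up for illustration, and the second half mirrors the answer's labelling rules in vectorized form as a sketch, not as the answer's exact code:

```r
# Toy vectors (made-up example) to contrast == and %in%
a <- c("INHBA", "MMP7", "COL8A1")
b <- c("COL8A1", "INHBA", "WNT2")

# == compares position by position, so a gene that is present in b
# at a different position is not detected
eq_result <- a == b
print(eq_result)   # FALSE FALSE FALSE

# %in% asks, for every element of a: "is it anywhere in b?"
in_result <- a %in% b
print(in_result)   # TRUE FALSE TRUE

# Toy statistics for the genes in `a`
logFC <- c(6.3, -2.0, 3.9)
FDR   <- c(1e-30, 0.01, 1e-20)

# Significance call first, then override it for genes shared with `b`
group <- ifelse(FDR < 0.05 & logFC > 1.5, "Increased",
         ifelse(FDR < 0.05 & logFC < -1.5, "Decreased", "NotSignificant"))
group[in_result & group == "Increased"] <- "specific_Increased"
group[in_result & group == "Decreased"] <- "specific_Decreased"
print(group)
```

MMP7 is significant but not in b, so it keeps the plain "Decreased" label; the two shared genes become "specific_Increased".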
I think I solved the problem; adding just one more line fixed it.
After adapting @StupidWolf's suggestion and slightly redefining cols, I got the image I wanted:
cols<- c(red="red", orange="orange", NotSignificant="darkgrey", Increased= "#00B2FF" ,Decreased="#00B2FF", specific_Increased="#ff4d00", specific_Decreased="#ff4d00" )
head(cols)
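With scale_colour_manual(values = cols), a named values vector is matched to the data by name, and data values without a matching name are drawn with the scale's na.value. A quick base-R check, sketched here with the five labels the answer's code can produce, confirms every group label has a color (the red and orange entries are not used by any label):

```r
# The color vector from the post above; names must match the group labels
cols <- c(red = "red", orange = "orange", NotSignificant = "darkgrey",
          Increased = "#00B2FF", Decreased = "#00B2FF",
          specific_Increased = "#ff4d00", specific_Decreased = "#ff4d00")

# All labels the answer's grouping code can assign
groups <- c("NotSignificant", "Increased", "Decreased",
            "specific_Increased", "specific_Decreased")

# Labels without a color would fall back to na.value in the plot
missing_colours <- setdiff(groups, names(cols))
print(missing_colours)   # character(0)
```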
I doubt this will solve your problem, but the two data sets do share one entry, "COL8A1" (if you like, I can change this sample data to include more common genes). To clarify, I only need the genes column to match, not the entire data row. Unfortunately, this command did not work either: it gave me a complete plot of the whole_colon data [this is progress!!], but the color marking for the genes shared with the other data set is still missing.
You want to color-code the significant groups and the ones found in the other data set, right? That is, 5 different groups: significant up/down, not significant, and significant up/down found in volcano. Did I get that right? You need to specify 5 colors in the vector cols for your data set.
Thanks for the help. Yes, I would like to color them differently, but not all 5 sectors need different colors. The colors above are enough; we only need to add one more color (red) for "specific_Increased" and "specific_Decreased".
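Before worrying about colors, it can help to check which genes the two data sets actually share. A sketch using the gene names posted in this thread (the answer's whole_colon and the question's volcano print-out); with the real objects, as.character(whole_colon$genes) would take the place of the hand-typed vector, since genes is a factor there:

```r
# Gene names copied from the two example data sets in this thread
whole_genes   <- c("CST1", "OTOP2", "COL11A1", "INHBA", "MMP7", "BEST4",
                   "COL10A1", "MMP11", "GUCA2B", "KRT6B", "WNT2", "COL8A1",
                   "ABC", "DEF")
volcano_genes <- c("INHBA", "COL10A1", "WNT2", "COL8A1", "THBS2", "BGN",
                   "COMP", "SULF1", "CTHRC1", "TRIM29", "SLC6A20", "SFRP4",
                   "CDH3", "ESM1", "TDP2", "EPHX2", "RAVER2", "BMP6",
                   "SCNN1G", "SLC52A3", "VIPR1", "SUCLG2")

# Genes of whole_colon that also occur in volcano (order follows whole_genes)
shared <- intersect(whole_genes, volcano_genes)
print(shared)   # INHBA, COL10A1, WNT2, COL8A1
```

These four are exactly the rows labelled "specific_Increased" in the answer's example data, so COL8A1 is not the only shared gene there.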