
R: Labeling outlier data points in a 3D scatter plot

I have a tab-delimited dataset that looks like this:

Labels  t1  t2  t3
gene1   0.000000E+00    0.000000E+00    1.138501E-01
gene2   0.000000E+00    0.000000E+00    9.550272E-02
gene3   0.000000E+00    1.851936E-02    1.019907E-01
gene4   8.212816E-02    0.000000E+00    6.570984E+00
gene5   1.282434E-01    0.000000E+00    6.240799E+00
gene6   2.918929E-01    8.453281E-01    3.387610E+00
gene7   0.000000E+00    1.923038E-01    0.000000E+00
gene8   1.135057E+00    0.000000E+00    2.491100E+00
gene9   7.935625E-01    1.070320E-01    2.439292E+00
gene10  5.046790E+00    0.000000E+00    2.459273E+00
gene11  3.293614E-01    0.000000E+00    2.380152E+00
gene12  0.000000E+00    0.000000E+00    1.474757E-01
gene13  0.000000E+00    0.000000E+00    1.521591E-01
gene14  0.000000E+00    9.968809E-02    8.387166E-01
gene15  0.000000E+00    1.065761E-01    0.000000E+00

What I want: a 3D scatter plot in which the outliers are labeled.

What I did, in R:

Actually, I read each column separately, as follows:

library("scatterplot3d")
temp <- read.table("tempdata.txt", header=TRUE)
scatterplot3d(temp$t1, temp$t2, temp$t3)

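Printing 250 labels into the plot is not a good choice, because it makes the plot unreadable. If you want to label outliers in the plot, they should lie far away from the other data points so that each one can be identified easily and unambiguously. You can, however, save the 250 largest zz values and their corresponding labels in a matrix for further analysis. I would do it like this: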

# Create some random data
library("scatterplot3d")
temp1 <- as.data.frame(matrix(rnorm(900), ncol=3))
temp1$labels <- c("gen1", "gen2", "gen3")  # recycled over all 300 rows
colnames(temp1) <- c("t1", "t2", "t3", "labels")

# get the outliers: the five largest t3 values
zz.outlier <- sort(temp1$t3, TRUE)[1:5]
ix <- which(temp1$t3 %in% zz.outlier)
outlier.matrix <- temp1[ix, ]

# create the plot and mark the points (z must be t3, not t2)
sd3 <- scatterplot3d(temp1$t1, temp1$t2, temp1$t3)
sd3$points3d(temp1$t1[ix], temp1$t2[ix], temp1$t3[ix], col="red")
text(sd3$xyz.convert(temp1$t1[ix], temp1$t2[ix], temp1$t3[ix]),
     labels=temp1$labels[ix])

In matplotlib:

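import numpy as np
from matplotlib import pyplot, cm
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

# Read the three numeric columns; skip the label column and the header row
# (the file shown in the question starts with "Labels t1 t2 t3").
data = np.genfromtxt('genes.txt', usecols=range(1, 4), skip_header=1)
N = len(data)
nout = N // 4   # top 25% in magnitude; integer division so it can be used in a slice
# indices of the nout points farthest from the origin
outliers = np.argsort(np.sqrt(np.sum(data**2, 1)))[-nout:]
outlies = np.zeros(N)
outlies[outliers] = 1   # now an array of 0 or 1, depending on whether an outlier

fig = pyplot.figure()
ax = fig.add_subplot(111, projection='3d')

ax.scatter(*data.T, c=cm.jet(outlies)) # color by whether outlies.
pyplot.show()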
Here, red points are far from the origin and blue points are close to it.
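
The coloring above does not yet put the gene names on the plot, which is what the question asks for. A minimal sketch of one way to do that follows, assuming that "outlier" means "farthest from the origin" as in the snippet above. The names `LABELS`, `DATA`, `outlier_indices` and `plot_labeled_outliers` are made up for this illustration, and the 15 rows are typed in from the question so the sketch does not depend on a file. It is written for Python 3.

```python
import numpy as np
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older matplotlib)

# The 15 rows shown in the question, typed in by hand (illustrative names).
LABELS = ["gene%d" % i for i in range(1, 16)]
DATA = np.array([
    [0.000000E+00, 0.000000E+00, 1.138501E-01],
    [0.000000E+00, 0.000000E+00, 9.550272E-02],
    [0.000000E+00, 1.851936E-02, 1.019907E-01],
    [8.212816E-02, 0.000000E+00, 6.570984E+00],
    [1.282434E-01, 0.000000E+00, 6.240799E+00],
    [2.918929E-01, 8.453281E-01, 3.387610E+00],
    [0.000000E+00, 1.923038E-01, 0.000000E+00],
    [1.135057E+00, 0.000000E+00, 2.491100E+00],
    [7.935625E-01, 1.070320E-01, 2.439292E+00],
    [5.046790E+00, 0.000000E+00, 2.459273E+00],
    [3.293614E-01, 0.000000E+00, 2.380152E+00],
    [0.000000E+00, 0.000000E+00, 1.474757E-01],
    [0.000000E+00, 0.000000E+00, 1.521591E-01],
    [0.000000E+00, 9.968809E-02, 8.387166E-01],
    [0.000000E+00, 1.065761E-01, 0.000000E+00],
])


def outlier_indices(data, n_out):
    """Return the row indices of the n_out points farthest from the origin."""
    if n_out <= 0:
        # data[-0:] would select every row, so handle this case explicitly.
        return np.array([], dtype=int)
    dist = np.sqrt(np.sum(data ** 2, axis=1))
    return np.argsort(dist)[-n_out:]


def plot_labeled_outliers(data, labels, n_out):
    """Draw a 3D scatter plot with the outliers in red and labeled by name."""
    idx = outlier_indices(data, n_out)
    is_outlier = np.zeros(len(data), dtype=bool)
    is_outlier[idx] = True

    fig = pyplot.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(data[~is_outlier, 0], data[~is_outlier, 1],
               data[~is_outlier, 2], c="blue")
    ax.scatter(data[is_outlier, 0], data[is_outlier, 1],
               data[is_outlier, 2], c="red")
    # Write each outlier's label at its own coordinates.
    for i in idx:
        ax.text(data[i, 0], data[i, 1], data[i, 2], labels[i])
    ax.set_xlabel("t1")
    ax.set_ylabel("t2")
    ax.set_zlabel("t3")
    return fig, ax, idx
```

Calling `plot_labeled_outliers(DATA, LABELS, 3)` and then `pyplot.show()` should draw the plot with gene4, gene5 and gene10 marked in red and labeled, since those three rows are the farthest from the origin in the sample data. Labeling only a handful of points keeps the plot readable, in line with the advice in the R answer.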

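How do you classify outliers? The 250 largest zz/xx/yy values? Or by Euclidean distance from the origin, the mean, or some other point? You can find the largest values of a vector with sort(temp1$t1, TRUE)[1:250].

What about the labels? I know how to get the data, but I need to filter it by the values in the last column. For example, if my value is twice that of gene13, it should sort by the values in the last column and give the output.

I don't quite understand what you mean. Do you want the outliers of the t3 column together with their corresponding labels?

Thanks. Could you also add a line to get a list of these outliers?

@Angelo outliers already is the list of outliers. You can add a line that prints outliers if you like. Your dataset counts from gene1, but outliers counts from 0, so you would actually want print outliers + 1.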
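
The index offset mentioned in the last comment can be avoided by keeping the label column next to the numbers and looking names up by row index. Below is a minimal sketch of that idea, assuming the "largest values in one column" criterion from the R answer. `SAMPLE`, `load_labeled` and `top_by_column` are hypothetical names, and `SAMPLE` holds only five of the question's rows as a stand-in for the real file. It is written for Python 3.

```python
import numpy as np

# Hypothetical stand-in for the tab-delimited file (five rows from the question).
SAMPLE = (
    "Labels\tt1\tt2\tt3\n"
    "gene1\t0.000000E+00\t0.000000E+00\t1.138501E-01\n"
    "gene4\t8.212816E-02\t0.000000E+00\t6.570984E+00\n"
    "gene7\t0.000000E+00\t1.923038E-01\t0.000000E+00\n"
    "gene10\t5.046790E+00\t0.000000E+00\t2.459273E+00\n"
    "gene14\t0.000000E+00\t9.968809E-02\t8.387166E-01\n"
)


def load_labeled(lines):
    """Split a header plus whitespace-delimited rows into labels and a float array."""
    rows = [line.split() for line in lines if line.strip()]
    body = rows[1:]  # rows[0] is the header: Labels t1 t2 t3
    labels = [row[0] for row in body]
    data = np.array([[float(value) for value in row[1:4]] for row in body])
    return labels, data


def top_by_column(labels, data, column, k):
    """Return (label, value) pairs for the k largest values in one column, largest first."""
    order = np.argsort(data[:, column])[::-1][:k]
    return [(labels[i], float(data[i, column])) for i in order]


labels, data = load_labeled(SAMPLE.splitlines())
top_two_t3 = top_by_column(labels, data, column=2, k=2)
```

With the sample rows, `top_two_t3` holds the entries for gene4 and gene10, the two largest t3 values. For a real file, passing an open file object to `load_labeled` should work the same way, because iterating over it yields lines.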