R ggplot2箱线图中多个异常值时的抖动

R ggplot2箱线图中多个异常值时的抖动,r,ggplot2,boxplot,R,Ggplot2,Boxplot,我试图找到一个合适的展示,以说明学校班级内和班级之间的各种属性。每个班级只有15-30个数据点(学生) 现在我倾向于一个没有胡须的方框图,只显示1,2。三,。四分位数+大于样本中位数的数据点,例如1个总体SD+/-样本中位数 这是我能做到的 然而,我需要向一些老师展示这个图表,以衡量他们最喜欢什么。我想将我的图表与普通箱线图进行比较。但是,如果只有一个异常值,或者在相同的值上有5个异常值,则正常箱线图看起来是相同的。在这种情况下,这将是一个交易破坏者 e、 g test这能满足您的需求吗?抖动开

我试图找到一个合适的展示,以说明学校班级内和班级之间的各种属性。每个班级只有15-30个数据点(学生)

现在我倾向于一个没有胡须的方框图,只显示1,2。三,。四分位数+大于样本中位数的数据点,例如1个总体SD+/-样本中位数

这是我能做到的

然而,我需要向一些老师展示这个图表,以衡量他们最喜欢什么。我想将我的图表与普通箱线图进行比较。但是,如果只有一个异常值,或者在相同的值上有5个异常值,则正常箱线图看起来是相同的。在这种情况下,这将是一个交易破坏者

e、 g


test这能满足您的需求吗?抖动开始的位置限制不是自动的,但它是一个开始

g = ggplot(test, aes(x = places,y = value))

g + geom_boxplot(outlier.colour = rgb(0,0,0,0)) + geom_point(data = test[test$value > 8,], position = position_jitter(width = .4))

你可以重新定义这个函数

GeomBoxplot$draw<-function (., data, ..., outlier.colour = "black", outlier.shape = 16, 
    outlier.size = 2, outlier.jitter=0) 
{
    defaults <- with(data, data.frame(x = x, xmin = xmin, xmax = xmax, 
        colour = colour, size = size, linetype = 1, group = 1, 
        alpha = 1, fill = alpha(fill, alpha), stringsAsFactors = FALSE))
    defaults2 <- defaults[c(1, 1), ]
        if (!is.null(data$outliers) && length(data$outliers[[1]] >= 
        1)) {
            pp<-position_jitter(width=outlier.jitter,height=0)
            p<-pp$adjust(data.frame(x=data$x[rep(1, length(data$outliers[[1]]))], y=data$outliers[[1]]),.scale)
        outliers_grob <- GeomPoint$draw(data.frame(x=p$x, y = p$y, colour = I(outlier.colour), 
            shape = outlier.shape, alpha = 1, size = outlier.size, 
            fill = NA), ...)
    }
    else {
        outliers_grob <- NULL
    }
    with(data, ggname(.$my_name(), grobTree(outliers_grob, GeomPath$draw(data.frame(y = c(upper, 
        ymax), defaults2), ...), GeomPath$draw(data.frame(y = c(lower, 
        ymin), defaults2), ...), GeomRect$draw(data.frame(ymax = upper, 
        ymin = lower, defaults), ...), GeomRect$draw(data.frame(ymax = middle, 
        ymin = middle, defaults), ...))))
}

ggplot(test, aes(x=places,y=value))+geom_boxplot(outlier.jitter=0.05)

鉴于数据点数量较少,您希望绘制所有点,而不仅仅是异常值。这将有助于找出箱线图中点的分布

您可以使用geom_jitter实现这一点,但请注意,box_plot已经为异常值绘制点,因此为了不显示它们两次,您需要使用
geom_box plot(outlier.shape=NA)
关闭box plot的异常值显示

库(“ggplot2”)

测试由于ggplot2已被更新,似乎已接受的答案不再有效。 在网上搜了很多遍后,我在网上发现了以下内容:-看张文斯顿的回复-

他使用ddply分别计算异常值,然后使用

geom_dotplot()
已禁用geom_箱线图()上的异常值输出:

以下是上述URL的完整代码:

# This returns a data frame with the outliers only
find_outliers <- function(y, coef = 1.5) {
   qs <- c(0, 0.25, 0.5, 0.75, 1)
   stats <- as.numeric(quantile(y, qs))
   iqr <- diff(stats[c(2, 4)])

   outliers <- y < (stats[2] - coef * iqr) | y > (stats[4] + coef * iqr)

   return(y[outliers])
}


library(MASS)  # Use the birthwt data set from MASS

# Find the outliers for each level of 'smoke'
library(plyr)
outlier_data <- ddply(birthwt, .(smoke), summarise, lwt = find_outliers(lwt))


# This draws an ordinary box plot
ggplot(birthwt, aes(x = factor(smoke), y = lwt)) + geom_boxplot()


# This draws the outliers using geom_dotplot
ggplot(birthwt, aes(x = factor(smoke), y = lwt)) +
   geom_boxplot(outlier.colour = NA) +
#also consider:
#  geom_jitter(alpha = 0.5, size = 2)+
   geom_dotplot(data = outlier_data, binaxis = "y",
                stackdir = "center", binwidth = 4)
#这将返回仅包含异常值的数据帧

查找异常值代码库不再工作。对于当前版本的ggplot2,我使用了以下类:

DrawGeomBoxplotJitterOutlier <- function(data, panel_params, coord, ...,
                                         outlier.jitter.width=NULL, 
                                         outlier.jitter.height=0,
                                         outlier.colour = NULL, 
                                         outlier.fill = NULL,
                                         outlier.shape = 19, 
                                         outlier.size = 1.5, 
                                         outlier.stroke = 0.5,
                                         outlier.alpha = NULL) {
  boxplot_grob <- ggplot2::GeomBoxplot$draw_group(data, panel_params, coord, ...)
  point_grob <- grep("geom_point.*", names(boxplot_grob$children))
  if (length(point_grob) == 0)
    return(boxplot_grob)

  ifnotnull <- function(x, y) ifelse(is.null(x), y, x)

  if (is.null(outlier.jitter.width)) {
    outlier.jitter.width <- (data$xmax - data$xmin) / 2
  }

  x <- data$x[1]
  y <- data$outliers[[1]]
  if (outlier.jitter.width > 0 & length(y) > 1) {
    x <- jitter(rep(x, length(y)), amount=outlier.jitter.width)
  }

  if (outlier.jitter.height > 0 & length(y) > 1) {
    y <- jitter(y, amount=outlier.jitter.height)
  }

  outliers <- data.frame(
    x = x, y = y,
    colour = ifnotnull(outlier.colour, data$colour[1]),
    fill = ifnotnull(outlier.fill, data$fill[1]),
    shape = ifnotnull(outlier.shape, data$shape[1]),
    size = ifnotnull(outlier.size, data$size[1]),
    stroke = ifnotnull(outlier.stroke, data$stroke[1]),
    fill = NA,
    alpha = ifnotnull(outlier.alpha, data$alpha[1]),
    stringsAsFactors = FALSE
  )
  boxplot_grob$children[[point_grob]] <- ggplot2::GeomPoint$draw_panel(outliers, panel_params, coord)



  return(boxplot_grob)
}

GeomBoxplotJitterOutlier <- ggplot2::ggproto("GeomBoxplotJitterOutlier", 
                                             ggplot2::GeomBoxplot, 
                                             draw_group = DrawGeomBoxplotJitterOutlier)

geom_boxplot_jitter_outlier <- function(mapping = NULL, data = NULL, 
                                        stat = "boxplot", position = "dodge",
                                        ..., outlier.jitter.width=0, 
                                        outlier.jitter.height=NULL,
                                        na.rm = FALSE, show.legend = NA, 
                                        inherit.aes = TRUE) {
  ggplot2::layer(
    geom = GeomBoxplotJitterOutlier, mapping = mapping, data = data,
    stat = stat, position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm,
      outlier.jitter.width=outlier.jitter.width,
      outlier.jitter.height=outlier.jitter.height, ...))
}

DrawGeomBoxplotJitterOutlier谢谢你greg。但我认为您的解决方案将导致一个数据点对多个数据点。两个实际点+异常值的箱线图点。如果我能告诉ggplot不要绘制异常值,这将是一个解决方案(结合stat_boxplot和一些魔法,我猜),但是geom_boxplot(outlier.color=NULL)不起作用。对于我上面添加的outlier.color参数,我认为它会起作用。因为-四个零会产生很大的差异,而不仅仅是三个。有了位置抖动(w=0.1,h=0)),看起来还行,但很明显,ggplot仍然为异常点留下了空间。它不漂亮,但我肯定有用:-)这看起来不错非常感谢。我如何创建一个子类?我在Hadley的书中找不到这方面的参考——我在OOP中也不太擅长:)
build\u访问器
现在必须被
new
替换(从2011年6月开始),并且
GeomBoxplot
需要使用
ggplot2::
来访问,但除此之外仍然可以正常工作。谢谢!事实上,没有——似乎你需要以类似于中的方式定义
geom\u boxplot\u jitter\u outlier
。我认为这是最好的方法。特别是如果你看看其他答案中的代码数量。这也是一个在可视化数据之前处理数据的好例子,而不是将这两个步骤结合起来。
library("ggplot2")

test <-structure(list(value = c(3, 5, 3, 3, 6, 4, 5, 4, 6, 4, 6, 4, 4, 6, 5, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 5, 6, 6, 4, 3, 5\
, 4, 6, 5, 6, 4, 5, 5, 3, 4, 4, 6, 4, 4, 5, 5, 3, 4, 5, 8, 8, 8, 8, 9, 6, 6, 7, 6, 9), places = structure(c(1L, 2L, 1L, 1L, 1L\
, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, \
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("a", "b"), class =\
 "factor")), .Names = c("value", "places"), row.names = c(NA, -60L), class = "data.frame")

# adding a level that you will use latter for giving colors
l <- rep(c(10,20,30,40,50,60), 10)
test$levels<-l

# [1]
# original plot
ggplot(test, aes(x=places,y=value))+geom_boxplot()

# [2]
# plot with outlier from boxplot and the points jittered to see
# distribution (outliers and the same point from position jitter would be
# counted twice for each different height)
dev.new()
ggplot(data=test, aes(x=places, y=value)) + geom_boxplot() +  geom_jitter(position=position_jitter(width=0.1, height=0))

# [3]
# make wider the jitter to avoid overplotting because there are a lot
# of points with the same value, also remove the outliers from boxplot
# (they are plotted with the geom_jitter anyway)
dev.new()
ggplot(data=test, aes(x=places, y=value)) + geom_boxplot(outlier.shape = NA) +
  geom_jitter(position=position_jitter(width=0.3, height=0))

# [4]
# adding colors to the points to see if there is a sub-pattern in the distribution
dev.new()
ggplot(data=test, aes(x=places, y=value)) + geom_boxplot(outlier.shape = NA) +
  geom_jitter(position=position_jitter(width=0.3, height=0), aes(colour=levels))

# [5]
# adding a bit of vertical jittering
# jittering (a good option for a less discrete datasets)
dev.new()
ggplot(data=test, aes(x=places, y=value)) + geom_boxplot(outlier.shape = NA) +
  geom_jitter(position=position_jitter(width=0.3, height=0.05), aes(colour=levels))

# [6]
# finally remember that position_jitter makes a jittering of a 40% of
# the resolution of the data, so if you forget the height=0 you will
# have a total different picture
dev.new()
ggplot(data=test, aes(x=places, y=value)) + geom_boxplot(outlier.shape = NA) +
  geom_jitter(position=position_jitter(width=0.2))
geom_dotplot()
 geom_boxplot(outlier.colour = NA) 
# This returns a data frame with the outliers only
find_outliers <- function(y, coef = 1.5) {
   qs <- c(0, 0.25, 0.5, 0.75, 1)
   stats <- as.numeric(quantile(y, qs))
   iqr <- diff(stats[c(2, 4)])

   outliers <- y < (stats[2] - coef * iqr) | y > (stats[4] + coef * iqr)

   return(y[outliers])
}


library(MASS)  # Use the birthwt data set from MASS

# Find the outliers for each level of 'smoke'
library(plyr)
outlier_data <- ddply(birthwt, .(smoke), summarise, lwt = find_outliers(lwt))


# This draws an ordinary box plot
ggplot(birthwt, aes(x = factor(smoke), y = lwt)) + geom_boxplot()


# This draws the outliers using geom_dotplot
ggplot(birthwt, aes(x = factor(smoke), y = lwt)) +
   geom_boxplot(outlier.colour = NA) +
#also consider:
#  geom_jitter(alpha = 0.5, size = 2)+
   geom_dotplot(data = outlier_data, binaxis = "y",
                stackdir = "center", binwidth = 4)
DrawGeomBoxplotJitterOutlier <- function(data, panel_params, coord, ...,
                                         outlier.jitter.width=NULL, 
                                         outlier.jitter.height=0,
                                         outlier.colour = NULL, 
                                         outlier.fill = NULL,
                                         outlier.shape = 19, 
                                         outlier.size = 1.5, 
                                         outlier.stroke = 0.5,
                                         outlier.alpha = NULL) {
  boxplot_grob <- ggplot2::GeomBoxplot$draw_group(data, panel_params, coord, ...)
  point_grob <- grep("geom_point.*", names(boxplot_grob$children))
  if (length(point_grob) == 0)
    return(boxplot_grob)

  ifnotnull <- function(x, y) ifelse(is.null(x), y, x)

  if (is.null(outlier.jitter.width)) {
    outlier.jitter.width <- (data$xmax - data$xmin) / 2
  }

  x <- data$x[1]
  y <- data$outliers[[1]]
  if (outlier.jitter.width > 0 & length(y) > 1) {
    x <- jitter(rep(x, length(y)), amount=outlier.jitter.width)
  }

  if (outlier.jitter.height > 0 & length(y) > 1) {
    y <- jitter(y, amount=outlier.jitter.height)
  }

  outliers <- data.frame(
    x = x, y = y,
    colour = ifnotnull(outlier.colour, data$colour[1]),
    fill = ifnotnull(outlier.fill, data$fill[1]),
    shape = ifnotnull(outlier.shape, data$shape[1]),
    size = ifnotnull(outlier.size, data$size[1]),
    stroke = ifnotnull(outlier.stroke, data$stroke[1]),
    fill = NA,
    alpha = ifnotnull(outlier.alpha, data$alpha[1]),
    stringsAsFactors = FALSE
  )
  boxplot_grob$children[[point_grob]] <- ggplot2::GeomPoint$draw_panel(outliers, panel_params, coord)



  return(boxplot_grob)
}

GeomBoxplotJitterOutlier <- ggplot2::ggproto("GeomBoxplotJitterOutlier", 
                                             ggplot2::GeomBoxplot, 
                                             draw_group = DrawGeomBoxplotJitterOutlier)

geom_boxplot_jitter_outlier <- function(mapping = NULL, data = NULL, 
                                        stat = "boxplot", position = "dodge",
                                        ..., outlier.jitter.width=0, 
                                        outlier.jitter.height=NULL,
                                        na.rm = FALSE, show.legend = NA, 
                                        inherit.aes = TRUE) {
  ggplot2::layer(
    geom = GeomBoxplotJitterOutlier, mapping = mapping, data = data,
    stat = stat, position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm,
      outlier.jitter.width=outlier.jitter.width,
      outlier.jitter.height=outlier.jitter.height, ...))
}