geom_freqpoly with predefined counts


I can plot geom_freqpoly from the number of observations without any problem:

ggplot(data=demo) +
 geom_freqpoly(mapping=aes(x = value))
But I want to use observation counts that are already precomputed in the data.

I tried stat = "identity", but that evidently does not work:

ggplot(data=demo) +
 geom_freqpoly(mapping=aes(x = value, y = cnt), stat = "identity")
Here is my sample data:

demo  <- tribble(
 ~value,    ~cnt,
 .25, 20,
 .25, 30,
 .1, 40
)
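TL;DR: You are not getting the plot you want because the precomputed count data you pass to ggplot is entirely different from the data that geom_freqpoly generates for its plot.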

Since you did not include the code that generated figure 1 from the original demo data, I will hazard a guess:

demo.orig <- data.frame(value = c(0.25, 0.25, 0.1))

p <- ggplot(demo.orig, aes(x = value)) +
  geom_freqpoly()
p # show plot to verify its appearance, which matches the graph in the question
layer_data(p) # look at the calculated data used by geom_freqpoly

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
   y count          x       xmin       xmax       width   density ncount ndensity PANEL group colour size linetype alpha
1  0     0 0.09310345 0.09051724 0.09568966 0.005172414   0.00000    0.0      0.0     1    -1  black  0.5        1    NA
2  1     1 0.09827586 0.09568966 0.10086207 0.005172414  64.44444    0.5      0.5     1    -1  black  0.5        1    NA
3  0     0 0.10344828 0.10086207 0.10603448 0.005172414   0.00000    0.0      0.0     1    -1  black  0.5        1    NA
... (omitted to conserve space)
30 0     0 0.24310345 0.24051724 0.24568966 0.005172414   0.00000    0.0      0.0     1    -1  black  0.5        1    NA
31 2     2 0.24827586 0.24568966 0.25086207 0.005172414 128.88889    1.0      1.0     1    -1  black  0.5        1    NA
32 0     0 0.25344828 0.25086207 0.25603448 0.005172414   0.00000    0.0      0.0     1    -1  black  0.5        1    NA
A quick check, printing geom_freqpoly to the console, shows that its underlying geom is simply GeomPath, which draws the x/y pairs in order.
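The same check can be done programmatically. This is a minimal sketch, assuming the layer object returned by geom_freqpoly() exposes its geom and stat as fields (as it does in ggplot2 3.x); the variable names are only illustrative.

```r
library(ggplot2)

# Build the layer without adding it to a plot, then inspect its parts
layer <- geom_freqpoly()

geom_classes <- class(layer$geom)  # the drawing part of the layer
stat_classes <- class(layer$stat)  # the part that bins the raw data

print(geom_classes)
print(stat_classes)
```

If the geom is GeomPath and the stat is StatBin, then all the binning happens in the stat; with stat = "identity" the path simply connects whatever x/y pairs it is given.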

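In other words, if you want the peaks from figure 1, you need to supply a similar dataset, with rows that indicate where y should drop to 0. This could be computed by digging into the code of StatBin$compute_group, but I think it is simpler to expand the precomputed count data and let ggplot do its normal work: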

demo %>%
  tidyr::uncount(cnt) %>%
  ggplot(aes(x = value)) + 
  geom_freqpoly() +
  theme_minimal()
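To make the expansion step concrete, here is a small sketch of what tidyr::uncount does to the three-row sample from the question (rebuilt here as a plain data.frame so that the block is self-contained):

```r
library(tidyr)

# Same values as the question's sample data
demo <- data.frame(value = c(0.25, 0.25, 0.1),
                   cnt   = c(20, 30, 40))

# uncount() repeats each row `cnt` times and drops the cnt column
expanded <- uncount(demo, cnt)

nrow(expanded)         # one row per original observation
table(expanded$value)  # how often each value now occurs
```

The expanded frame has one row per observation, which is why this approach becomes expensive when the counts are in the millions.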
Edit: a solution that does not fully expand the data frame of aggregated counts

Sample dataset with two groups:

demo <- data.frame(value = c(0.25, 0.5, 0.1, 0.25, 0.75, 0.1),
                   cnt = c(5, 2, 4, 3, 8, 7) * 10e8,
                   group = rep(c("a", "b"), each = 3))
Code (the listing appears after the comments below):


Based on the solution to a similar question, it seems to be as simple as using the weight parameter in the aesthetics.

Using the sample data from the other answer, the solution is:

demo <- data.frame(value = c(0.25, 0.5, 0.1, 0.25, 0.75, 0.1),
                   cnt = c(5, 2, 4, 3, 8, 7) * 10e8,
                   group = rep(c("a", "b"), each = 3))


ggplot(demo, aes(value, weight = cnt, color = group)) + geom_freqpoly()  
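One way to check that the weight aesthetic really sums the precomputed counts per bin is to inspect the computed layer data. The sketch below uses the three-row sample from the question and sets bins = 30 explicitly (the default) to silence the binning message; the variable names are illustrative.

```r
library(ggplot2)

demo <- data.frame(value = c(0.25, 0.25, 0.1),
                   cnt   = c(20, 30, 40))

p <- ggplot(demo, aes(x = value, weight = cnt)) +
  geom_freqpoly(bins = 30)

# layer_data() returns the binned data that the path is drawn from
binned <- layer_data(p)

total <- sum(binned$count)  # should equal sum(demo$cnt)
peak  <- max(binned$count)  # both 0.25 rows fall into the same bin
```

No rows are expanded here: each input row contributes its weight to the bin it falls into, so memory use does not depend on the size of the counts.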

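This is what I stumbled on myself while waiting on the question... It gives the expected result, but essentially by converting the aggregated data back to the single-observation level, which is what I wanted to avoid. In any case, this answer deserves an upvote, not least for mentioning layer_data, which I did not know about and would not have known how to find. Would you mind adding some information about compute_group so that I can accept the answer?

Please re-run the example with counts of 40M instead of 40 to see what I mean.

Sorry for the late response; somehow I was not notified of the update. Thank you very much for the detailed explanation of the sample code. I think mentioning ggplot2:::bin_vector was crucial because of its weight parameter; that led me to the solution. Would you mind reviewing my alternative answer? If it is complete, I can accept it.

@dd\u菜鸟 Is this the solution to your question?
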
library(ggplot2)
library(dplyr)

demo %>%
  rename(x = value, y = cnt) %>% # rename here so approach below can be easily applied
                                 # to other datasets with different column names
  tidyr::nest(data = c(x, y)) %>% # nest to apply same approach for each group

  mutate(data = purrr::map(
    data,
    function(d) ggplot2:::bin_vector( # cut x's range into appropriate bins
      x = d$x,
      bins = ggplot2:::bin_breaks_bins(
        x_range = range(d$x),
        bins = 30), # default bin count is 30; change if desired
      pad = TRUE) %>%
      select(x, xmin, xmax) %>%

      # place y counts into the corresponding x bins (this is probably similar
      # to interval join, but I don't have that package installed on my machine)
      tidyr::crossing(d %>% rename(x2 = x)) %>%
      mutate(y = ifelse(x2 >= xmin & x2 < xmax, y, 0)) %>%
      select(-x2) %>%
      group_by(x) %>%
      filter(y == max(y)) %>%
      ungroup() %>%
      unique())) %>%

  tidyr::unnest(cols = c(data)) %>% # unnest to get one flat dataframe back

  ggplot(aes(x = x, y = y, colour = group)) + # plot as per normal
  geom_path() +
  theme_bw()

# package versions used: dplyr 1.0.0, ggplot2 3.3.1, tidyr 1.1.0, purrr 0.3.4
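
The same idea (sum the precomputed counts per bin, then pad with zeros so the path drops back to 0) can also be sketched in base R without relying on ggplot2's unexported helpers. bin_counts below is a hypothetical helper, not part of any package; it assumes at least two distinct x values, and its bin boundaries are not guaranteed to match ggplot2's defaults.

```r
# Hypothetical helper: weighted bin counts with one zero-count bin on each side
bin_counts <- function(value, cnt, bins = 30) {
  rng    <- range(value)
  breaks <- seq(rng[1], rng[2], length.out = bins + 1)
  width  <- breaks[2] - breaks[1]

  # bin index per value; include.lowest keeps the minimum in bin 1
  idx <- cut(value, breaks, include.lowest = TRUE, labels = FALSE)

  # sum the precomputed counts per occupied bin; empty bins stay at 0
  y    <- numeric(bins)
  sums <- tapply(cnt, idx, sum)
  y[as.integer(names(sums))] <- as.vector(sums)

  mids <- (breaks[-1] + breaks[-length(breaks)]) / 2
  data.frame(x = c(mids[1] - width, mids, mids[bins] + width),
             y = c(0, y, 0))
}

res <- bin_counts(c(0.25, 0.25, 0.1), c(20, 30, 40))
```

The result can be drawn with ggplot(res, aes(x, y)) + geom_path(); for grouped data the helper would have to be applied once per group.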