R ggplot2: log transformation of data and scales

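This is a follow-up to my previous question, which I answered myself yesterday. My current problem is that, in the reproducible example below, the lines that are supposed to plot the component distributions of the data's mixture distribution neither appear where expected nor have the expected shape (see the red line at y = 0 in the second plot).

Complete reproducible example: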

library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)

NUM_COMPONENTS <- 2

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- diamonds$price

# extract 'k' components from mixed distribution 'data'
mix.info <- normalmixEM(myData, k = NUM_COMPONENTS,
                        maxit = 100, epsilon = 0.01)
summary(mix.info)

numComponents <- length(mix.info$sigma)
message("Extracted number of component distributions: ",
        numComponents)

calc.components <- function(x, mix, comp.number) {

  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

g <- ggplot(data.frame(x = myData)) +
  scale_fill_continuous("Count", low="#56B1F7", high="#132B43") + 
  scale_x_log10("Diamond Price [log10]",
                breaks = trans_breaks("log10", function(x) 10^x),
                labels = prettyNum) +
  scale_y_continuous("Count") +
  geom_histogram(aes(x = myData, fill = 0.01 * ..density..),
                 binwidth = 0.01)
print(g)

# we could select needed number of colors randomly:
#DISTRIB_COLORS <- sample(colors(), numComponents)

# or, better, use a palette with more color differentiation:
DISTRIB_COLORS <- brewer.pal(numComponents, "Set1")

distComps <- lapply(seq(numComponents), function(i)
  stat_function(fun = calc.components,
                arg = list(mix = mix.info, comp.number = i),
                geom = "line", # use alpha=.5 for "polygon"
                size = 1,
                color = "red")) # DISTRIB_COLORS[i]
print(g + distComps)

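I have finally figured out the issues, removed my previous answer, and am providing my latest solution below. The only thing I have not solved is the legend panel for the components: for some reason it does not appear. For EDA, where the goal is to demonstrate the presence of a mixture distribution, I think it is good enough. The complete reproducible solution follows. Thanks to everyone who helped, directly or indirectly.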

library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)

NUM_COMPONENTS <- 2

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- diamonds$price


calc.components <- function(x, mix, comp.number) {

  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}


overlayHistDensity <- function(data, calc.comp.fun) {

  # extract 'k' components from mixed distribution 'data'
  mix.info <- normalmixEM(data, k = NUM_COMPONENTS,
                          maxit = 100, epsilon = 0.01)
  summary(mix.info)

  numComponents <- length(mix.info$sigma)
  message("Extracted number of component distributions: ",
          numComponents)

  DISTRIB_COLORS <- 
    suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))

  # create (plot) histogram and ...
  g <- ggplot(as.data.frame(data), aes(x = data)) +
    geom_histogram(aes(y = ..density..),
                   binwidth = 0.01, alpha = 0.5) +
    theme(legend.position = 'top', legend.direction = 'horizontal')

  comp.labels <- lapply(seq(numComponents),
                        function (i) paste("Component", i))

  # ... fitted densities of components
  distComps <- lapply(seq(numComponents), function (i)
    stat_function(fun = calc.comp.fun,
                  args = list(mix = mix.info, comp.number = i),
                  size = 2, color = DISTRIB_COLORS[i]))

  legend <- list(scale_colour_manual(name = "Legend:",
                                     values = DISTRIB_COLORS,
                                     labels = unlist(comp.labels)))

  return (g + distComps + legend)
}

overlayPlot <- overlayHistDensity(log10(myData), 'calc.components')
print(overlayPlot)
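
On the missing legend: ggplot2 draws legends only for aesthetics mapped inside aes(), so a colour passed as a constant to stat_function() gives scale_colour_manual() nothing to label. The following is a minimal sketch of one possible fix, not the author's code. It assumes a recent ggplot2 (3.4 or later, for after_stat() and linewidth), and the mixture parameters in `mix`, the simulated data and the two colour values are made up for illustration rather than fitted with normalmixEM().

```r
library(ggplot2)

# Hypothetical two-component mixture on the log10 scale (assumed values,
# not the result of normalmixEM() on the diamonds data)
mix <- list(lambda = c(0.4, 0.6), mu = c(2.9, 3.7), sigma = c(0.15, 0.25))

# Simulate data from that mixture so the sketch is self-contained
set.seed(1)
n <- 2000
comp <- sample(1:2, n, replace = TRUE, prob = mix$lambda)
df <- data.frame(x = rnorm(n, mean = mix$mu[comp], sd = mix$sigma[comp]))

# Same helper as in the answer: weighted normal density of one component
calc.components <- function(x, mix, comp.number) {
  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

comp.labels <- paste("Component", seq_along(mix$mu))
comp.colors <- setNames(c("#E41A1C", "#377EB8"), comp.labels)

# Map colour to the component label inside aes(); `!!` inlines the string
# so each layer keeps its own label
distComps <- lapply(seq_along(mix$mu), function(i) {
  label <- comp.labels[i]
  stat_function(aes(colour = !!label),
                fun = calc.components,
                args = list(mix = mix, comp.number = i),
                linewidth = 1)
})

p <- ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 0.05, alpha = 0.5) +
  distComps +
  scale_colour_manual(name = "Legend:", values = comp.colors) +
  theme(legend.position = "top", legend.direction = "horizontal")
print(p)
```

Because the colour is now a mapped aesthetic, the named vector passed to scale_colour_manual() decides which colour each component label receives.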

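I just realized that, for this question and the previous one, I probably need to multiply the component distribution values by the total number of elements in each component distribution (in our case they are equal) in order to move from a density distribution to a count distribution. If that makes sense, how should I do it with stat_function()? I guess by adding a multiplier as a corresponding parameter of the calc.components function and a corresponding argument in stat_function's arg list.

I downvoted this question because it is too verbose and the bold text reduces readability. Please make your question more to the point. Also, you acknowledge that we need a minimal reproducible example. Please try to create one.

@Roland: I have no problem with a downvote as long as it is substantiated, as you just did. Sorry about the bold text; I was trying to emphasize the important elements and points. I will limit its use in the future and will try to write more compact questions. As for the reproducible example, I have just created one and will update my question with it shortly. Thanks for your help!

@tonytonov: Thanks, I will keep that in mind. Updating my question with a reproducible example in the next 5 minutes...

Thanks for the edit. It is much better now.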
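
The density-to-count conversion raised in the first comment can be sketched in base R. For a histogram with equal-width bins, the expected count in a bin is approximately the density times the total number of observations times the bin width, so the multiplier would be N * binwidth rather than N alone. The mixture parameters, sample size and bin width below are assumptions chosen for illustration, and `comp.density` and `comp.count` are hypothetical helper names.

```r
set.seed(12345)

# Hypothetical two-component mixture on the log10 scale (assumed values)
lambda <- c(0.4, 0.6)
mu     <- c(2.9, 3.7)
sigma  <- c(0.15, 0.25)

n        <- 10000   # total number of observations
binwidth <- 0.05    # histogram bin width

# Simulate data from the mixture
comp <- sample(1:2, n, replace = TRUE, prob = lambda)
x    <- rnorm(n, mean = mu[comp], sd = sigma[comp])

# Weighted density of component i (integrates to lambda[i])
comp.density <- function(x, i) lambda[i] * dnorm(x, mu[i], sigma[i])

# Same curve on the count scale: density * N * binwidth
comp.count <- function(x, i, n, binwidth) comp.density(x, i) * n * binwidth

# Bin the data and evaluate the expected counts at the bin midpoints
breaks   <- seq(min(x) - binwidth, max(x) + binwidth, by = binwidth)
h        <- hist(x, breaks = breaks, plot = FALSE)
expected <- comp.count(h$mids, 1, n, binwidth) +
            comp.count(h$mids, 2, n, binwidth)

# The expected counts should sum to roughly the same total as the observed
cat("observed total:", sum(h$counts), "\n")
cat("expected total:", round(sum(expected)), "\n")
```

With stat_function(), the same idea would mean passing the extra values through its args list, for example args = list(mix = mix.info, comp.number = i, n = length(data), binwidth = 0.01), with calc.components extended to accept them.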