Converting a histogram into a violin plot in R with ggplot
Tags: r, ggplot2, histogram, transformation, violin-plot


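I am currently trying to learn R with the help of Hadley Wickham's excellent resources ("R for Data Science", "ggplot2: Elegant Graphics for Data Analysis"). So far I have been able to find the answers to all of my questions there (thank you very much, Hadley!), but not this time.

At the moment I am working with data from an instrument that estimates particle size from the light scattered by the particles (DLS, Zetasizer Nano, Malvern Instruments). The data extracted from the device are some summary statistics (e.g. mean particle size) and histogram data: x = size (split into bins), y = intensity [%]. Here is one of my measurements: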

   # A tibble: 70 x 3
   sample_name        intensities      bins
   <chr>                    <dbl>     <dbl>
 1 core formulation 1         0       0.4  
 2 core formulation 1         0       0.463
 3 core formulation 1         0       0.536
 4 core formulation 1         0       0.621
 5 core formulation 1         0       0.720
 6 core formulation 1         0       0.833
 7 core formulation 1         0       0.965
 8 core formulation 1         0       1.12 
 9 core formulation 1         0       1.29 
10 core formulation 1         0       1.50 
11 core formulation 1         0       1.74 
12 core formulation 1         0       2.01 
13 core formulation 1         0       2.33 
14 core formulation 1         0       2.70 
15 core formulation 1         0       3.12 
16 core formulation 1         0       3.62 
17 core formulation 1         0       4.19 
18 core formulation 1         0       4.85 
19 core formulation 1         0       5.62 
20 core formulation 1         0       6.50 
21 core formulation 1         0       7.53 
22 core formulation 1         0       8.72 
23 core formulation 1         0      10.1  
24 core formulation 1         0      11.7  
25 core formulation 1         0      13.5  
26 core formulation 1         0      15.7  
27 core formulation 1         0      18.2  
28 core formulation 1         0      21.0  
29 core formulation 1         0      24.4  
30 core formulation 1         0      28.2  
31 core formulation 1         0      32.7  
32 core formulation 1         0      37.8  
33 core formulation 1         0      43.8  
34 core formulation 1         0.2    50.8  
35 core formulation 1         1.4    58.8  
36 core formulation 1         3.7    68.1  
37 core formulation 1         6.9    78.8  
38 core formulation 1        10.2    91.3  
39 core formulation 1        12.9   106.   
40 core formulation 1        14.4   122.   
41 core formulation 1        14.4   142.   
42 core formulation 1        13     164.   
43 core formulation 1        10.3   190.   
44 core formulation 1         7.1   220.   
45 core formulation 1         3.9   255    
46 core formulation 1         1.5   295.   
47 core formulation 1         0.2   342    
48 core formulation 1         0     396.   
49 core formulation 1         0     459.   
50 core formulation 1         0     531.   
51 core formulation 1         0     615.   
52 core formulation 1         0     712.   
53 core formulation 1         0     825    
54 core formulation 1         0     955.   
55 core formulation 1         0    1106    
56 core formulation 1         0    1281    
57 core formulation 1         0    1484    
58 core formulation 1         0    1718    
59 core formulation 1         0    1990    
60 core formulation 1         0    2305    
61 core formulation 1         0    2669    
62 core formulation 1         0    3091    
63 core formulation 1         0    3580    
64 core formulation 1         0    4145    
65 core formulation 1         0    4801    
66 core formulation 1         0    5560    
67 core formulation 1         0    6439    
68 core formulation 1         0    7456    
69 core formulation 1         0    8635    
70 core formulation 1         0   10000

I can produce a histogram from these data without any problem:

library(tidyverse)

# Intensity per size bin, drawn as a line on a log10 x-axis
ggplot(DLS_intensities_core, aes(bins, intensities)) +
  geom_line() +
  scale_x_continuous(trans = "log10")

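To show the overall distribution of my particle sizes, I would like to turn these data into a violin plot and use the summary statistics provided by the device in a second layer of the plot.

So I want to transform these data in such a way that I can create a violin plot from them.

I have already tried feeding them into the violin plot through the stat_density() argument, but without success so far.

Do you have any idea how to create a violin plot from these data?

Many thanks, everyone

Best,

Dominik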

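I will update this (if needed) once you have replied to my second comment. You can get violin plots for bins and intensities with the following:

library(tidyverse)
library(hrbrthemes)

# Reshape to long format: one row per (measure, value) pair
gather(DLS_intensities_core, measure, value, -sample_name) %>% 
  ggplot(aes(measure, value)) +
  geom_violin(scale = "count") +
  scale_y_comma() +
  # One panel per measure, each with its own y range
  facet_wrap(~measure, scales = "free") +
  labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
  ) +
  theme_ipsum_rc(grid = "Y") +
  theme(axis.text.x = element_blank())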

I usually like to layer the points in as well:

gather(DLS_intensities_core, measure, value, -sample_name) %>% 
  ggplot(aes(measure, value)) +
  geom_violin(scale = "count") +
  ggbeeswarm::geom_quasirandom() +
  scale_y_comma() +
  facet_wrap(~measure, scales="free") +
  labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
  ) +
  theme_ipsum_rc(grid="Y") +
  theme(axis.text.x = element_blank())

Based on your comment, this may be a better way to show the distribution of bins and its relationship to intensities:

library(hrbrthemes)
library(tidyverse)

ggplot(DLS_intensities_core, aes(x="", bins)) +
  geom_violin(scale = "count") +
  ggbeeswarm::geom_quasirandom(
    aes(size = intensities, fill = intensities), shape = 21
  ) +
  scale_y_comma(trans="log10") +
  viridis::scale_fill_viridis(direction = -1, trans = "log1p") +
  scale_size_continuous(trans = "log1p", range = c(2, 10)) +
  guides(fill = guide_legend()) +
  labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this"
  ) +
  theme_ipsum_rc(grid="Y")


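You would have to do some other, custom transformation to make the shape of the violin vary with intensity (and, as it stands, this does not really reflect the distribution).

I found a solution to my problem, although it may not be very elegant:
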
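library(tidyverse)

# Turn each intensity (in %, one decimal place) into an integer pseudo-count.
# round() guards against floating-point results being truncated by rep.int().
DLS_intensities_core <- DLS_intensities_core %>% 
  mutate(counts = round(intensities * 10))

# Keep only the bins that actually contribute
vectors <- DLS_intensities_core %>%
  filter(counts > 0)

bins_v <- vectors$bins
count_v <- vectors$counts

# Repeat each bin size `counts` times, so the number of rows encodes the intensity
violin_DLSdata <- tibble(
  sample_name = "core formulation 1",
  value = rep.int(bins_v, count_v)
)

ggplot(violin_DLSdata, aes(sample_name, value)) + 
  geom_violin() + 
  labs(
    x = NULL, y = "size"
  ) +
  scale_y_continuous(trans = "log10", limits = c(1, 1000))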
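
As a possible alternative to replicating rows, geom_violin() also understands a weight aesthetic, so the intensities can be passed to the density estimate directly. The following is only a sketch under a few assumptions: the data frame name dls is made up for illustration, only the non-zero rows of the tibble printed in the question are transcribed, and the default bandwidth is left untouched (it is chosen from the bin positions, not from the weights, so adjust may need tuning).

```r
library(ggplot2)

# Non-zero rows transcribed from the tibble printed in the question
# (`dls` is a hypothetical name used only for this sketch)
dls <- data.frame(
  sample_name = "core formulation 1",
  bins = c(50.8, 58.8, 68.1, 78.8, 91.3, 106, 122,
           142, 164, 190, 220, 255, 295, 342),
  intensities = c(0.2, 1.4, 3.7, 6.9, 10.2, 12.9, 14.4,
                  14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2)
)

# Each bin appears once; `weight` tells the density estimate how much it counts
p <- ggplot(dls, aes(sample_name, bins, weight = intensities)) +
  geom_violin() +
  scale_y_log10(limits = c(1, 1000)) +
  labs(x = NULL, y = "size")

p
```

Whether this or the row-replication approach gives the nicer shape depends on the smoothing; both should put the widest part of the violin where the intensities are highest.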

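Welcome to SO! Could you hop over and look at the advice on how to post data (some of it is in the links at that URL)? I think the SO R contributors could make quick work of this if the data were in a slightly simpler format.

Sorry, I will add it right away! Thanks for pointing that out!

Whoa! Best data dump of the day right here! (And no need to apologize. The SO question entry screen leaves a lot to be desired.) Are you looking for violin plots of DLS_intensities_core$bins and DLS_intensities_core$intensities, or do you need to perform some other transformation on them?

Thanks :-) I think I am just looking for violin plots of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end I want to produce a plot with the following specification: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st layer: violin plot (from the data I provided here); 2nd layer: summary statistics (calculated by the Zetasizer, not shown here for clarity). That means I will definitely need some additional transformation, but I want to try it myself first ;-)

Thank you very much for the plots and the accompanying code, I really like the theme of your plots! Adding points to the violin plot is a great idea :-) The first plot (bins) is very close to what I want: size (or bins) on the y-axis (log10 scale), with the extent in the x direction given by the intensity values (rather than the number of cases). Does that help? Or is this just a wrong idea?

See if the latest update does a bit of what you want.

Yes, that is heading in the right direction! However, I would like the intensity to be depicted by the width of the violin in the x direction (the violin would be widest at roughly 100 on the y-axis, where the large points are drawn).

While showering I had another idea: maybe my tibble format is not suited to this type of plot. Since the stat behind the violin plot counts rows and summarizes them, the most direct way would be to replicate each bin size according to its intensity, creating that number of rows with exactly that bin size. For example, if the size is 100 and the corresponding intensity is 20, I would create 20 new rows with bin = 100. I will try this approach today and let you know what I find :-) Thanks again for your help! :-)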
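
The row-replication idea from the last comment can be tried out on a toy example first. This is a minimal base R sketch; the data frame hist_df and its two bins are made-up values for illustration, not measurements from the instrument.

```r
# Toy histogram: two size bins with their intensities (hypothetical values)
hist_df <- data.frame(bin = c(100, 200), intensity = c(20, 5))

# Repeat each row index `intensity` times, then pick those rows,
# so a bin with intensity 20 ends up as 20 identical rows
row_index <- rep(seq_len(nrow(hist_df)), times = hist_df$intensity)
expanded <- hist_df[row_index, "bin", drop = FALSE]

# Number of rows per bin size after the expansion
table(expanded$bin)
```

With the tidyverse loaded, tidyr::uncount(hist_df, intensity) should produce the same expansion in a single call; either way the intensities need to be whole numbers first.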