Plotting multiple density curves on the same plot: weighting the subset categories in Python 3


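I am trying to recreate this density plot in Python 3: math.stackexchange.com/questions/845424/the-expected-output-of-a-random-game-of-chess

The area under the blue curve equals the combined area under the red, green, and purple curves, because the individual outcomes (draw, Black wins, White wins) are subsets of the total (All).

How do I get Python to account for this and plot the curves accordingly?

Here is the .csv file of results_df after 1000 simulations: pastebin.com/YDVMx2DL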

from matplotlib import pyplot as plt
import seaborn as sns

black = results_df.loc[results_df['outcome'] == 'Black']
white = results_df.loc[results_df['outcome'] == 'White']
draw = results_df.loc[results_df['outcome'] == 'Draw']
win = results_df.loc[results_df['outcome'] != 'Draw']

Total = len(results_df.index)
Wins = len(win.index)

PercentBlack = "Black Wins ≈ %s" %('{0:.2%}'.format(len(black.index)/Total))
PercentWhite = "White Wins ≈ %s" %('{0:.2%}'.format(len(white.index)/Total))
PercentDraw = "Draw ≈ %s" %('{0:.2%}'.format(len(draw.index)/Total))
AllTitle = 'Distribution of Moves by All Outcomes (nSample = %s)' %(workers)

sns.distplot(results_df.moves, hist=False, label = "All")
sns.distplot(black.moves, hist=False, label=PercentBlack)
sns.distplot(white.moves, hist=False, label=PercentWhite)
sns.distplot(draw.moves, hist=False, label=PercentDraw)
plt.title(AllTitle)
plt.ylabel('Density')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()

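The code above produces density curves without any weighting. I need to figure out how to weight the density curves by subset, while keeping my labels in the legend.

I also tried a frequency histogram, which scales the heights of the distributions correctly, but I would rather overlay the 4 curves for a "cleaner" look. I don't like this frequency plot, but it is my current workaround: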

results_df.moves.hist(alpha=0.4, bins=range(0, 700, 10), label="All")
draw.moves.hist(alpha=0.4, bins=range(0, 700, 10), label=PercentDraw)
white.moves.hist(alpha=0.4, bins=range(0, 700, 10), label=PercentWhite)
black.moves.hist(alpha=0.4, bins=range(0, 700, 10), label=PercentBlack)
plt.title(AllTitle)
plt.ylabel('Frequency')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()

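If anyone can write Python 3 code that outputs the first plot with 4 density curves carrying the correct subset weights, and that keeps the custom legend showing the percentages, it would be much appreciated.

Once the density curves are plotted with the correct subset weights, I am also interested in Python 3 code that finds the coordinates of the maximum point of each density curve, which would show the most frequent number of moves once I scale this up to 500,000 iterations.

Thanks
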
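You need to be careful here. The plot you produced is correct: every curve shown in it is a probability density function of the underlying distribution.

In the plot you want to have, only the curve labeled "All" is a probability density function. The other curves are not.

In any case, if you want to scale the curves as shown in the desired plot, you need to compute the kernel density estimate yourself. This can be done using scipy.stats.gaussian_kde, which the code below relies on.

To reproduce the desired plot, I see two options.

The first is to compute the kde for all the cases involved, including the pooled data, and to scale each subset curve by its share of the samples.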
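To spell out why, here is a short sketch in my own notation (none of these symbols appear in the question): let N be the total number of games, n_k the number of games with outcome k, and f_k the density of the number of moves for outcome k alone.

```latex
f_{\mathrm{All}}(x) = \sum_{k} \frac{n_k}{N}\, f_k(x),
\qquad
\int \frac{n_k}{N}\, f_k(x)\,\mathrm{d}x = \frac{n_k}{N} \le 1,
\qquad
\sum_{k} \frac{n_k}{N} = 1 .
```

Each scaled curve therefore has area n_k/N rather than 1, so it is not a density on its own, while the areas of the scaled curves add up to the area under "All".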

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

np.random.seed(0)

# two synthetic samples standing in for two outcome subsets
a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)

# kde of each subset and of the pooled data
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)
both = np.hstack((a, b))
kdeboth = scipy.stats.gaussian_kde(both)
grid = np.arange(500)

# weighted kde curves: scale each subset by its share of all samples
wa = kdea(grid) * (len(a) / float(len(both)))
wb = kdeb(grid) * (len(b) / float(len(both)))

# with unit grid spacing these sums approximate the areas under the curves
print("a.sum ", wa.sum())
print("b.sum ", wb.sum())
print("total.sum ", kdeboth(grid).sum())

fig, ax = plt.subplots()
ax.plot(grid, wa, lw=1, label="weighted a")
ax.plot(grid, wb, lw=1, label="weighted b")
ax.plot(grid, kdeboth(grid), color="crimson", lw=2, label="pdf")
plt.legend()
plt.show()
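As a rough sanity check on this first option, the sketch below compares the areas under the scaled curves with the sample fractions. It is only an illustration under my own assumptions: the data are synthetic, the grid is made deliberately wide, and names such as `dx` and `max_gap` are mine. It also measures how far the sum of the scaled subset curves is from the kde of the pooled data; the two need not coincide, because gaussian_kde picks a separate bandwidth for each data set it is given.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for two outcome subsets (not the chess data).
a = rng.gumbel(80, 25, 1000)
b = rng.gumbel(200, 46, 4000)
both = np.hstack((a, b))

# A grid wide enough to cover practically all of the probability mass.
grid = np.linspace(-200, 1000, 2401)
dx = grid[1] - grid[0]

# Each subset KDE is scaled by the fraction of samples in that subset.
wa = scipy.stats.gaussian_kde(a)(grid) * len(a) / len(both)
wb = scipy.stats.gaussian_kde(b)(grid) * len(b) / len(both)
kde_both = scipy.stats.gaussian_kde(both)(grid)

# Riemann-sum approximations of the areas under the curves.
area_a = wa.sum() * dx
area_b = wb.sum() * dx
area_both = kde_both.sum() * dx

# Largest pointwise gap between the summed subset curves and the pooled KDE.
max_gap = np.abs(wa + wb - kde_both).max()

print("area a   ", area_a, " expected share", len(a) / len(both))
print("area b   ", area_b, " expected share", len(b) / len(both))
print("area both", area_both)
print("max gap  ", max_gap)
```

If the "All" curve has to be exactly the sum of the subset curves, the second option avoids that gap by construction.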


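The second is to compute the kde for each individual case, scale each one by its share of the samples, and add the scaled curves together to obtain the total.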


import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

np.random.seed(0)

a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)

kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)
grid = np.arange(500)

# weighted kde curves
wa = kdea(grid) * (len(a) / float(len(a) + len(b)))
wb = kdeb(grid) * (len(b) / float(len(a) + len(b)))
total = wa + wb

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(grid, wa, lw=1, label="weighted a")
ax.plot(grid, wb, lw=1, label="weighted b")
ax.plot(grid, total, color="crimson", lw=2, label="pdf")
plt.legend()
plt.show()
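Applied to the data in the question, the second option might look roughly like the sketch below. It assumes a DataFrame with the `outcome` and `moves` columns used in the question. The data here are synthetic stand-ins for the .csv file, and `weighted_kdes`, `peak` and `plot_curves` are hypothetical helper names of my own, not part of any library. It also covers the follow-up request by reading the highest point off each sampled curve.

```python
import numpy as np
import pandas as pd
import scipy.stats


def weighted_kdes(df, grid, value_col="moves", group_col="outcome"):
    """Return {group: KDE on grid, scaled by that group's share of all rows}."""
    total = len(df)
    curves = {}
    for name, values in df.groupby(group_col)[value_col]:
        kde = scipy.stats.gaussian_kde(values)
        curves[name] = kde(grid) * len(values) / total
    return curves


def peak(grid, curve):
    """Return the (x, y) coordinates of the highest point of a sampled curve."""
    i = int(np.argmax(curve))
    return grid[i], curve[i]


def plot_curves(grid, curves, all_curve, labels):
    """Draw the total and the weighted subset curves on one Axes."""
    # Imported here so the numeric part runs without a plotting backend.
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.plot(grid, all_curve, lw=2, label="All")
    for name, curve in curves.items():
        ax.plot(grid, curve, lw=1, label=labels[name])
    ax.set_xlabel("Number of Moves")
    ax.set_ylabel("Density")
    ax.legend()
    return fig, ax


# Synthetic stand-in for results_df (assumption: the real one is read from the .csv).
rng = np.random.default_rng(0)
results_df = pd.DataFrame({
    "outcome": ["Black"] * 80 + ["White"] * 70 + ["Draw"] * 850,
    "moves": np.concatenate([
        rng.gumbel(150, 40, 80),
        rng.gumbel(150, 40, 70),
        rng.gumbel(350, 60, 850),
    ]).round(),
})

grid = np.arange(0, 1200)
curves = weighted_kdes(results_df, grid)
all_curve = sum(curves.values())  # the total is the sum of the scaled parts

# Legend labels that keep the percentage of each outcome.
labels = {
    name: "{} ≈ {:.2%}".format(name, (results_df["outcome"] == name).mean())
    for name in curves
}

# Coordinates of the maximum of every curve, including the total.
peaks = {name: peak(grid, curve) for name, curve in curves.items()}
peaks["All"] = peak(grid, all_curve)
print(peaks)
```

Calling `plot_curves(grid, curves, all_curve, labels)` followed by `matplotlib.pyplot.show()` draws the four curves with the percentage labels in the legend. The x value of each entry in `peaks` is the most frequent number of moves for that curve, up to the resolution of `grid`.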