Plotting multiple density curves on the same plot: weighting the subset categories in Python 3

Tags: python-3.x, matplotlib, plot, seaborn, density-plot

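I am trying to recreate this density plot in Python 3: math.stackexchange.com/questions/845424/the-expected-output-of-a-random-game-of-chess

The area under the blue curve equals the combined area under the red, green, and purple curves, because the different outcomes (draw, Black wins, and White wins) are subsets of the total (All).

How can I get Python to recognize this and plot the curves accordingly?

Here is the .csv file of results_df after 1000 simulations: pastebin.com/YDVMx2DL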

from matplotlib import pyplot as plt
import seaborn as sns

black = results_df.loc[results_df['outcome'] == 'Black']
white = results_df.loc[results_df['outcome'] == 'White']
draw = results_df.loc[results_df['outcome'] == 'Draw']
win = results_df.loc[results_df['outcome'] != 'Draw']

Total = len(results_df.index)
Wins = len(win.index)

PercentBlack = "Black Wins ≈ %s" %('{0:.2%}'.format(len(black.index)/Total))
PercentWhite = "White Wins ≈ %s" %('{0:.2%}'.format(len(white.index)/Total))
PercentDraw = "Draw ≈ %s" %('{0:.2%}'.format(len(draw.index)/Total))
AllTitle = 'Distribution of Moves by All Outcomes (nSample = %s)' %(workers)

sns.distplot(results_df.moves, hist=False, label = "All")
sns.distplot(black.moves, hist=False, label=PercentBlack)
sns.distplot(white.moves, hist=False, label=PercentWhite)
sns.distplot(draw.moves, hist=False, label=PercentDraw)
plt.title(AllTitle)
plt.ylabel('Density')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()
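The code above produces density curves without weights. I really need to figure out how to weight the density curves accordingly while keeping my labels in the legend.

I also tried a frequency histogram, which scales the heights of the distributions correctly, but I would rather have the four curves overlaid on one another for a "cleaner" look. I don't like this frequency plot, but it is my current workaround.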

results_df.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = "All")
draw.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentDraw)
white.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentWhite)
black.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentBlack)
plt.title(AllTitle)
plt.ylabel('Frequency')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()
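If anyone can write Python 3 code that outputs the first plot with four density curves carrying the correct subset weights, while preserving the custom legend that shows the percentages, it would be much appreciated.

Once the density curves are plotted with the correct subset weights, I am also interested in Python 3 code that finds the coordinates of the maximum point of each density curve, which would show the most frequent number of moves once I scale this up to 500,000 iterations.

Thanks.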

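You need to be careful here. The plot you produced is correct: every curve shown is a probability density function of its underlying distribution.

In the plot you want, only the curve labelled "All" is a probability density function. The other curves are not.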
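To make that distinction concrete, here is a minimal sketch that checks the areas numerically. The two Gumbel samples are hypothetical stand-ins for two outcome subsets, not the chess data. Each unweighted KDE integrates to 1, whereas a curve scaled by its share of the samples integrates to that share, so it is no longer a density in its own right.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical stand-ins for two outcome subsets (not the chess data)
a = rng.gumbel(80, 25, 1000)
b = rng.gumbel(200, 46, 4000)
n_total = len(a) + len(b)

kde_a = gaussian_kde(a)
kde_b = gaussian_kde(b)

# An unweighted KDE is a proper density: its area over the real line is 1
area_a = kde_a.integrate_box_1d(-np.inf, np.inf)
area_b = kde_b.integrate_box_1d(-np.inf, np.inf)

# Scaling a curve by its sample share scales its area by the same factor
weighted_area_a = area_a * len(a) / n_total
weighted_area_b = area_b * len(b) / n_total

print("unweighted areas:", area_a, area_b)
print("weighted areas:  ", weighted_area_a, weighted_area_b)
print("sum of weighted: ", weighted_area_a + weighted_area_b)
```

The weighted areas add up to the area of a single density, which is the property the desired plot relies on.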

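In any case, if you want to scale them as shown in the desired plot, you need to compute the kernel density estimate yourself. This can be done using scipy.stats.gaussian_kde, which the code below relies on.

To reproduce the desired plot, I see two options.

Compute the KDE for all cases involved and scale each one by its number of samples.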

import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import scipy.stats

a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)

kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)

both = np.hstack((a,b))
kdeboth = scipy.stats.gaussian_kde(both)
grid = np.arange(500)

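# weighted kde curves: scale each density by its share of all samples
wa = kdea(grid)*(len(a)/float(len(both)))
wb = kdeb(grid)*(len(b)/float(len(both)))

# the grid has unit spacing, so each sum approximates the area under the curve
print("a.sum ", wa.sum())
print("b.sum ", wb.sum())
print("total.sum ", kdeboth(grid).sum())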

fig, ax = plt.subplots()
ax.plot(grid, wa, lw=1, label = "weighted a")
ax.plot(grid, wb, lw=1, label = "weighted b")
ax.plot(grid, kdeboth(grid), color="crimson", lw=2, label = "pdf")

plt.legend()
plt.show()
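
The same scaling can be applied to a DataFrame shaped like the question's results_df. The following is only a sketch under assumptions: the helper name weighted_outcome_curves, its parameters, and the synthetic demo_df are all hypothetical, and the column names 'moves' and 'outcome' are taken from the question's code. It builds one curve per outcome, scaled by that outcome's share of the rows, and keys each curve by a legend label that carries the percentage.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

def weighted_outcome_curves(df, grid, value_col="moves", group_col="outcome"):
    """Map legend label -> KDE curve scaled by the group's share of rows.

    Hypothetical helper; column names follow the question's results_df.
    """
    total = len(df)
    # The overall curve stays an ordinary (unweighted) density
    curves = {"All": gaussian_kde(df[value_col].to_numpy())(grid)}
    for name, group in df.groupby(group_col):
        share = len(group) / total
        label = "{} ≈ {:.2%}".format(name, share)
        curves[label] = gaussian_kde(group[value_col].to_numpy())(grid) * share
    return curves

# Synthetic stand-in for results_df (made-up numbers, not the chess results)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "outcome": ["Black"] * 150 + ["White"] * 150 + ["Draw"] * 700,
    "moves": np.concatenate([
        rng.gumbel(100, 30, 150),
        rng.gumbel(100, 30, 150),
        rng.gumbel(300, 60, 700),
    ]),
})

grid = np.arange(0, 700)
curves = weighted_outcome_curves(demo_df, grid)
for label in curves:
    print(label)
```

Each entry can then be drawn with ax.plot(grid, curve, label=label) followed by ax.legend(), exactly as in the plotting code above, so the custom percentage labels end up in the legend.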

Compute the KDE for each individual case and normalize their sum to obtain the total.

import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import scipy.stats

a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)

kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)

grid = np.arange(500)


#weighted kde curves
wa = kdea(grid)*(len(a)/float(len(a)+len(b)))
wb = kdeb(grid)*(len(b)/float(len(a)+len(b)))

total = wa+wb

fig, ax = plt.subplots(figsize=(5,3))
ax.plot(grid, wa, lw=1, label = "weighted a")
ax.plot(grid, wb, lw=1, label = "weighted b")
ax.plot(grid, total, color="crimson", lw=2, label = "pdf")

plt.legend()
plt.show()
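
The question also asks for the coordinates of each curve's maximum, which the code above does not cover. One possible approach, sketched here under assumptions, is to evaluate each weighted curve on a grid and take the highest grid point with np.argmax. The helper name kde_peak and its parameters are hypothetical, and the result is only as precise as the grid spacing.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_peak(samples, weight=1.0, grid=None):
    """Return (x, y) of the highest grid point of the weighted KDE curve.

    Hypothetical helper: the peak is located on the grid only, so its
    precision is limited by the grid spacing.
    """
    samples = np.asarray(samples, dtype=float)
    if grid is None:
        grid = np.linspace(samples.min(), samples.max(), 1000)
    curve = gaussian_kde(samples)(grid) * weight
    i = int(np.argmax(curve))
    return grid[i], curve[i]

# Same kind of synthetic data as in the answer's examples
rng = np.random.default_rng(0)
a = rng.gumbel(80, 25, 1000)
b = rng.gumbel(200, 46, 4000)
n_total = len(a) + len(b)
grid = np.arange(500)

peak_a = kde_peak(a, len(a) / n_total, grid)
peak_b = kde_peak(b, len(b) / n_total, grid)
peak_all = kde_peak(np.hstack((a, b)), 1.0, grid)

print("peak of weighted a:", peak_a)
print("peak of weighted b:", peak_b)
print("peak of total pdf: ", peak_all)
```

Scaling a curve by a constant weight does not move the x-position of its peak, only its height. Evaluating a gaussian_kde costs roughly the number of samples times the number of grid points, so with 500,000 simulations a coarser grid or a subsample may be worth considering.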