Plotting multiple density curves on the same plot: weighting the subset categories in Python 3
I am trying to recreate this density plot in Python 3: math.stackexchange.com/questions/845424/the-expected-output-of-a-random-game-of-chess
The area under the blue curve equals the combined area of the red, green, and purple curves, because the different outcomes (draw, Black wins, and White wins) are subsets of the total (All).
How do I get Python to recognize this and plot it accordingly?
Here is the .csv file of results_df after 1000 simulations: pastebin.com/YDVMx2DL
from matplotlib import pyplot as plt
import seaborn as sns
black = results_df.loc[results_df['outcome'] == 'Black']
white = results_df.loc[results_df['outcome'] == 'White']
draw = results_df.loc[results_df['outcome'] == 'Draw']
win = results_df.loc[results_df['outcome'] != 'Draw']
Total = len(results_df.index)
Wins = len(win.index)
PercentBlack = "Black Wins ≈ %s" %('{0:.2%}'.format(len(black.index)/Total))
PercentWhite = "White Wins ≈ %s" %('{0:.2%}'.format(len(white.index)/Total))
PercentDraw = "Draw ≈ %s" %('{0:.2%}'.format(len(draw.index)/Total))
AllTitle = 'Distribution of Moves by All Outcomes (nSample = %s)' %(workers)
sns.distplot(results_df.moves, hist=False, label = "All")
sns.distplot(black.moves, hist=False, label=PercentBlack)
sns.distplot(white.moves, hist=False, label=PercentWhite)
sns.distplot(draw.moves, hist=False, label=PercentDraw)
plt.title(AllTitle)
plt.ylabel('Density')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()
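The code above produces density curves without weights. I really need to figure out how to weight the density curves accordingly while keeping my labels in the legend.
I also tried frequency histograms, which scale the distribution heights correctly, but I would rather have the 4 curves overlaid on each other for a "cleaner" look. I don't like this frequency plot, but it is my current solution.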
results_df.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = "All")
draw.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentDraw)
white.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentWhite)
black.moves.hist(alpha=0.4, bins=range(0, 700, 10), label = PercentBlack)
plt.title(AllTitle)
plt.ylabel('Frequency')
plt.xlabel('Number of Moves')
plt.legend()
plt.show()
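If anyone can write Python 3 code that outputs the first plot with 4 density curves carrying the correct subset weights, and that keeps the custom legend showing the percentages, it would be much appreciated.
Once the density curves are plotted with the correct subset weights, I am also interested in Python 3 code that finds the coordinates of the maximum point of each density curve, which would show the most frequent number of moves once I scale this up to 500,000 iterations.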
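Thanks.
You need to be careful. The plot you have produced is correct: all of the curves shown are probability density functions of the underlying distributions.
In the plot you want, only the curve labeled "All" is a probability density function. The other curves are not, because each is scaled by its group's share of the samples and so integrates to less than 1.
In any case, if you want to scale the curves as shown in the desired plot, you need to calculate the kernel density estimates yourself. This can be done using scipy.stats.gaussian_kde.
To reproduce the desired plot, I see two options.
Calculate the kde for all involved cases and scale them by the number of samples.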
import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import scipy.stats
a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)
both = np.hstack((a,b))
kdeboth = scipy.stats.gaussian_kde(both)
grid = np.arange(500)
#weighted kde curves
wa = kdea(grid)*(len(a)/float(len(both)))
wb = kdeb(grid)*(len(b)/float(len(both)))
# On a grid with spacing 1, each sum approximates the area under the curve
print("a.sum ", wa.sum())
print("b.sum ", wb.sum())
print("total.sum ", kdeboth(grid).sum())
fig, ax = plt.subplots()
ax.plot(grid, wa, lw=1, label = "weighted a")
ax.plot(grid, wb, lw=1, label = "weighted b")
ax.plot(grid, kdeboth(grid), color="crimson", lw=2, label = "pdf")
plt.legend()
plt.show()
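As a quick check of the point about scaling (a sketch, not part of the original answer): each weighted curve should integrate to its group's share of the samples rather than to 1, which is why only the combined curve is a true pdf. The snippet reuses the same synthetic Gumbel data and relies on `gaussian_kde.integrate_box_1d`, which integrates the estimate between two bounds. The names `n_total`, `area_a`, and `area_b` are introduced here for illustration only.

```python
import numpy as np
import scipy.stats

np.random.seed(0)
a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)
n_total = len(a) + len(b)

kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)

# Each unweighted kde integrates to 1 over the real line, so the area under
# a weighted curve is simply that group's share of all samples.
area_a = kdea.integrate_box_1d(-np.inf, np.inf) * len(a) / n_total
area_b = kdeb.integrate_box_1d(-np.inf, np.inf) * len(b) / n_total

print("area a:", area_a)
print("area b:", area_b)
print("sum:", area_a + area_b)
```

With 1000 and 4000 samples, the two areas come out at roughly 0.2 and 0.8, and together they account for the full area of the "All" curve.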
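Alternatively, calculate the kde for the individual cases only, weight each by its share of the samples, and sum the weighted curves to obtain the total.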
import numpy as np; np.random.seed(0)
import matplotlib.pyplot as plt
import scipy.stats
a = np.random.gumbel(80, 25, 1000).astype(int)
b = np.random.gumbel(200, 46, 4000).astype(int)
kdea = scipy.stats.gaussian_kde(a)
kdeb = scipy.stats.gaussian_kde(b)
grid = np.arange(500)
#weighted kde curves
wa = kdea(grid)*(len(a)/float(len(a)+len(b)))
wb = kdeb(grid)*(len(b)/float(len(a)+len(b)))
total = wa+wb
fig, ax = plt.subplots(figsize=(5,3))
ax.plot(grid, wa, lw=1, label = "weighted a")
ax.plot(grid, wb, lw=1, label = "weighted b")
ax.plot(grid, total, color="crimson", lw=2, label = "pdf")
plt.legend()
plt.show()
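The same weighting could be applied to a DataFrame shaped like the asker's `results_df`, keeping the percentage labels and locating the peak of each curve. This is a rough sketch under stated assumptions: the real CSV is not loaded here, so the frame below is filled with made-up data using the same `outcome` and `moves` column names, and the 0 to 699 grid mirrors the histogram bins from the question. The peak is simply the grid point where each evaluated curve is largest, so its precision is limited by the grid spacing and the kde bandwidth.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats

# Hypothetical stand-in for results_df (made-up data, same column names)
rng = np.random.default_rng(0)
outcome = rng.choice(["Black", "White", "Draw"], size=1000, p=[0.1, 0.1, 0.8])
loc = np.where(outcome == "Draw", 200, 120)
moves = rng.gumbel(loc, 50).clip(min=2).astype(int)
results_df = pd.DataFrame({"outcome": outcome, "moves": moves})

total = len(results_df)
grid = np.arange(700, dtype=float)
names = {"Black": "Black Wins", "White": "White Wins", "Draw": "Draw"}

fig, ax = plt.subplots()
peaks = {}
shares = {}

# "All" is the only true pdf (area 1), so it is plotted unweighted
all_curve = scipy.stats.gaussian_kde(results_df["moves"].to_numpy(dtype=float))(grid)
ax.plot(grid, all_curve, lw=2, label="All")
peaks["All"] = (grid[np.argmax(all_curve)], all_curve.max())

for key, group in results_df.groupby("outcome"):
    share = len(group) / total
    shares[key] = share
    kde = scipy.stats.gaussian_kde(group["moves"].to_numpy(dtype=float))
    curve = kde(grid) * share  # scale the pdf by the subset's share
    ax.plot(grid, curve, lw=1, label="{} ≈ {:.2%}".format(names[key], share))
    i = np.argmax(curve)       # index of the highest point on the grid
    peaks[key] = (grid[i], curve[i])

ax.set_title("Distribution of Moves by All Outcomes (nSample = %s)" % total)
ax.set_xlabel("Number of Moves")
ax.set_ylabel("Density")
ax.legend()
fig.savefig("weighted_kde.png")  # or plt.show() in an interactive session

for key, (x, y) in peaks.items():
    print(key, "peak at moves =", x, "density =", y)
```

To run it on the real data, replace the stand-in frame with the CSV loaded through `pd.read_csv`; the rest only assumes the `outcome` and `moves` columns.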