Python: How to add additional text to matplotlib annotations

Tags: python, pandas, matplotlib, seaborn

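I'm using seaborn's titanic dataset as a proxy for my own, much larger dataset, to build the charts and tables I need from it.

The following code runs without any errors: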

import seaborn as sns
import pandas as pd
import numpy as np
sns.set_theme(style="darkgrid")

# Load the example Titanic dataset
df = sns.load_dataset("titanic")

# split fare into decile groups and order them
df['fare_grp'] = pd.qcut(df['fare'], q=10,labels=None, retbins=False, precision=0).astype(str)
df.groupby(['fare_grp'],dropna=False).size()
df['fare_grp_num'] = pd.qcut(df['fare'], q=10,labels=False, retbins=False, precision=0).astype(str)
df.groupby(['fare_grp_num'],dropna=False).size()
df['fare_ord_grp'] = df['fare_grp_num'] + ' ' +df['fare_grp']
df['fare_ord_grp']

# set variables
target = 'survived'
ydim = 'fare_ord_grp'
xdim = 'embark_town'

#del [result]

non_events = pd.DataFrame(df[df[target]==0].groupby([ydim,xdim],as_index=False, dropna=False)[target].count()).rename(columns={target: 'non_events'})
non_events[xdim]=non_events[xdim].replace(np.nan, 'Missing', regex=True)
non_events[ydim]=non_events[ydim].replace(np.nan, 'Missing', regex=True)
non_events_total = pd.DataFrame(df[df[target]==0].groupby([xdim],dropna=False,as_index=False)[target].count()).rename(columns={target: 'non_events_total_by_xdim'}).replace(np.nan, 'Missing', regex=True)

events = pd.DataFrame(df[df[target]==1].groupby([ydim,xdim],as_index=False, dropna=False)[target].count()).rename(columns={target: 'events'})
events[xdim]=events[xdim].replace(np.nan, 'Missing', regex=True)
events[ydim]=events[ydim].replace(np.nan, 'Missing', regex=True)
events_total = pd.DataFrame(df[df[target]==1].groupby([xdim],dropna=False,as_index=False)[target].count()).rename(columns={target: 'events_total_by_xdim'}).replace(np.nan, 'Missing', regex=True)

grand_total = pd.DataFrame(df.groupby([xdim],dropna=False,as_index=False)[target].count()).rename(columns={target: 'total_by_xdim'}).replace(np.nan, 'Missing', regex=True)

grand_total=grand_total.merge(non_events_total, how='left', on=xdim).merge(events_total, how='left', on=xdim)

result = pd.merge(non_events, events, how="outer",on=[ydim,xdim])

result['total'] = result['non_events'].fillna(0) + result['events'].fillna(0)
result[xdim] = result[xdim].replace(np.nan, 'Missing', regex=True)
result = pd.merge(result, grand_total, how="left",on=[xdim])

result['survival rate %'] = round(result['events']/result['total']*100,2)
result['% event dist by xdim'] = round(result['events']/result['events_total_by_xdim']*100,2)
result['% non-event dist by xdim'] = round(result['non_events']/result['non_events_total_by_xdim']*100,2)
result['% total dist by xdim'] = round(result['total']/result['total_by_xdim']*100,2)

display(result)
value_name1 = "% dist by " + str(xdim)
dfl = pd.melt(result, id_vars=[ydim, xdim],value_vars =['% total dist by xdim'], var_name = 'Type',value_name=value_name1).drop(columns='Type')
dfl2 = dfl.pivot(index=ydim, columns=xdim, values=value_name1)
print(dfl2)
title1 = "% dist by " + str(xdim)
ax=dfl2.T.plot(kind='bar', stacked=True, rot=1, figsize=(8, 8), title=title1)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
ax.legend(bbox_to_anchor=(1.0, 1.0),title = 'Fare Range')
ax.set_ylabel('% Dist')
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy() 
    ax.text(x+width/2, y+height/2,'{:.0f}%'.format(height),horizontalalignment='center', verticalalignment='center')
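It produces the following stacked percentage bar chart, which shows the % of total distribution by embark town.

I also want to show the survival rate alongside the % distribution in each block. For example, for Queenstown, fare range 1 (7.6, 7.9], the % total dist is 56%. I want to display the survival rate of 37.21% next to it, as (56%, 37.21%). I haven't been able to work out how to do this. Any suggestions would be appreciated. Thanks.

Here is the output summary table for reference: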

    fare_ord_grp     embark_town  non_events  events  total  total_by_xdim  non_events_total_by_xdim  events_total_by_xdim  survival rate %  % event dist by xdim  % non-event dist by xdim  % total dist by xdim
0   0 (-0.1, 7.6]    Cherbourg    22          7       29     168            75                        93                    24.14            7.53                  29.33                     17.26
1   0 (-0.1, 7.6]    Queenstown   4           NaN     4      77             47                        30                    NaN              NaN                   8.51                      5.19
2   0 (-0.1, 7.6]    Southampton  53          6       59     644            427                       217                   10.17            2.76                  12.41                     9.16
3   1 (7.6, 7.9]     Queenstown   27          16      43     77             47                        30                    37.21            53.33                 57.45                     55.84
4   1 (7.6, 7.9]     Southampton  34          10      44     644            427                       217                   22.73            4.61                  7.96                      6.83
5   2 (7.9, 8.0]     Cherbourg    4           1       5      168            75                        93                    20               1.08                  5.33                      2.98
6   2 (7.9, 8.0]     Southampton  83          13      96     644            427                       217                   13.54            5.99                  19.44                     14.91
7   3 (8.0, 10.5]    Cherbourg    2           1       3      168            75                        93                    33.33            1.08                  2.67                      1.79
8   3 (8.0, 10.5]    Queenstown   2           NaN     2      77             47                        30                    NaN              NaN                   4.26                      2.6
9   3 (8.0, 10.5]    Southampton  56          17      73     644            427                       217                   23.29            7.83                  13.11                     11.34
10  4 (10.5, 14.5]   Cherbourg    7           8       15     168            75                        93                    53.33            8.6                   9.33                      8.93
11  4 (10.5, 14.5]   Queenstown   1           2       3      77             47                        30                    66.67            6.67                  2.13                      3.9
12  4 (10.5, 14.5]   Southampton  40          26      66     644            427                       217                   39.39            11.98                 9.37                      10.25
13  5 (14.5, 21.7]   Cherbourg    9           10      19     168            75                        93                    52.63            10.75                 12                        11.31
14  5 (14.5, 21.7]   Queenstown   5           3       8      77             47                        30                    37.5             10                    10.64                     10.39
15  5 (14.5, 21.7]   Southampton  37          24      61     644            427                       217                   39.34            11.06                 8.67                      9.47
16  6 (21.7, 27.0]   Cherbourg    1           4       5      168            75                        93                    80               4.3                   1.33                      2.98
17  6 (21.7, 27.0]   Queenstown   2           3       5      77             47                        30                    60               10                    4.26                      6.49
18  6 (21.7, 27.0]   Southampton  40          39      79     644            427                       217                   49.37            17.97                 9.37                      12.27
19  7 (27.0, 39.7]   Cherbourg    14          10      24     168            75                        93                    41.67            10.75                 18.67                     14.29
20  7 (27.0, 39.7]   Queenstown   5           NaN     5      77             47                        30                    NaN              NaN                   10.64                     6.49
21  7 (27.0, 39.7]   Southampton  38          24      62     644            427                       217                   38.71            11.06                 8.9                       9.63
22  8 (39.7, 78.0]   Cherbourg    5           19      24     168            75                        93                    79.17            20.43                 6.67                      14.29
23  8 (39.7, 78.0]   Southampton  37          28      65     644            427                       217                   43.08            12.9                  8.67                      10.09
24  9 (78.0, 512.3]  Cherbourg    11          33      44     168            75                        93                    75               35.48                 14.67                     26.19
25  9 (78.0, 512.3]  Queenstown   1           1       2      77             47                        30                    50               3.33                  2.13                      2.6
26  9 (78.0, 512.3]  Southampton  9           30      39     644            427                       217                   76.92            13.82                 2.11                      6.06
27  2 (7.9, 8.0]     Queenstown   NaN         5       5      77             47                        30                    100              16.67                 NaN                       6.49
28  9 (78.0, 512.3]  Missing      NaN         2       2      2              NaN                       2                     100              100                   NaN                       100
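  • dfl2.T is what is being plotted, but 'survival rate %' lives in result. As such, the indices of the values in dfl2.T do not correspond to those of 'survival rate %'.
  • Because the values in result['% total dist by xdim'] are not all unique, we can't use a dict of matched key-value pairs to look up the survival rate from the bar height.
  • Instead, create a corresponding pivoted DataFrame for 'survival rate %' and then flatten it. All of its values will be in the same order as the '% total dist by xdim' values from dfl2.T, so they can be indexed by position.
  • For dfl2.T, the plot API plots in column order, which means .flatten(order='F') must be used so that the array is flattened in the correct order for indexing.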
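To make the last point concrete, here is a minimal sketch of the difference between row-major and column-major flattening. The frame `toy` and its labels are hypothetical and only stand in for dfl2.T (rows as towns, columns as fare groups).

```python
import pandas as pd

# Hypothetical 2x3 frame: rows play the role of towns, columns of fare groups.
toy = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6]],
    index=["town_a", "town_b"],
    columns=["grp_0", "grp_1", "grp_2"],
)

# order='C' walks the array row by row (NumPy's default).
row_major = toy.to_numpy().flatten(order="C")

# order='F' walks the array column by column, i.e. all towns for grp_0,
# then all towns for grp_1, and so on.
col_major = toy.to_numpy().flatten(order="F")

print(row_major)  # [1 2 3 4 5 6]
print(col_major)  # [1 4 2 5 3 6]
```

Only the column-major result lists every row of one column before moving to the next column, which is the order the bullet above relies on.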
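# create the corresponding pivoted DataFrame for 'survival rate %'
dfl3 = pd.melt(result, id_vars=[ydim, xdim], value_vars=['survival rate %'], var_name='Type', value_name=value_name1).drop(columns='Type')
dfl4 = dfl3.pivot(index=ydim, columns=xdim, values=value_name1)

# flatten dfl4.T in column order
dfl4_flatten = dfl4.T.to_numpy().flatten(order='F')

for i, p in enumerate(ax.patches):
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    # only print values when height is not 0
    if height != 0:
        # create the text string
        text = f'{height:.0f}%, {dfl4_flatten[i]:.0f}%'
        # annotate the bar segments
        ax.text(x+width/2, y+height/2, text, horizontalalignment='center', verticalalignment='center')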
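The indexing above only works if ax.patches really is ordered the same way as the column-major flattened array. The sketch below is one way to check that on a small, hypothetical frame (`toy` is not part of the original code); it assumes a pandas stacked bar plot like the one in the question and uses the non-interactive Agg backend so it can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window needed

import numpy as np
import pandas as pd

# Hypothetical stand-in for dfl2.T: rows are towns, columns are fare groups.
toy = pd.DataFrame(
    [[10.0, 20.0, 30.0],
     [40.0, 50.0, 60.0]],
    index=["town_a", "town_b"],
    columns=["grp_0", "grp_1", "grp_2"],
)

ax = toy.plot(kind="bar", stacked=True)

# Heights of the bar segments, in the order matplotlib stored them.
patch_heights = np.array([p.get_height() for p in ax.patches])

# The same values flattened column by column.
expected = toy.to_numpy().flatten(order="F")

# If the two orders agree, position i in ax.patches and position i in the
# flattened array refer to the same (town, fare group) cell.
print(np.allclose(patch_heights, expected))
```

Running the same comparison against dfl2.T before annotating is a cheap guard against labels landing on the wrong segment, though NaN cells would need separate handling since np.allclose treats NaN as unequal by default.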

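Notes
  • Here we can see dfl2.T and dfl4.T
# dfl2.T
fare_ord_grp  0 (-0.1, 7.6]    1 (7.6, 7.9]     2 (7.9, 8.0]     3 (8.0, 10.5]    4 (10.5, 14.5]   5 (14.5, 21.7]   6 (21.7, 27.0]   7 (27.0, 39.7]   8 (39.7, 78.0]   9 (78.0, 512.3]
embark_town
Cherbourg     17.26            NaN              2.98             1.79             8.93             11.31            2.98             14.29            14.29            26.19
Missing       NaN              NaN              NaN              NaN              NaN              NaN              NaN              NaN              NaN              100.00
Queenstown    5.19             55.84            6.49             2.60             3.90             10.39            6.49             6.49             NaN              2.60
Southampton   9.16             6.83             14.91            11.34            10.25            9.47             12.27            9.63             10.09            6.06
# dfl4.T
fare_ord_grp  0 (-0.1, 7.6]    1 (7.6, 7.9]     2 (7.9, 8.0]     3 (8.0, 10.5]    4 (10.5, 14.5]   5 (14.5, 21.7]   6 (21.7, 27.0]   7 (27.0, 39.7]   8 (39.7, 78.0]   9 (78.0, 512.3]
embark_town
Cherbourg     24.14            NaN              20.00            33.33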