Python: using pandas and pyplot to group on multiple columns, get value counts, and plot this information


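I'm analyzing some data from runs of an agent-based model that (TL;DR) simulates a species' life cycle in order to predict survival under particular input parameters. I'm struggling with how to do this with pandas and pyplot and would appreciate some advice. I have a CSV that looks like this: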

"run","day","Lifestate","Lat","Long","habitat_sample"
1, 1.0,"adult",0.0,0.0,0
1, 1.0,"adult",0.0,0.0,0
1, 1.0,"larva",0.0,0.0,0
1, 2.0,"adult",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
3, 1.0,"nymph",0.0,0.0,0
3, 1.0,"nymph",0.0,0.0,0
3, 2.0,"larva",0.0,0.0,0
3, 2.0,"larva",0.0,0.0,0
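For reference, here is a minimal sketch of loading the sample above and counting individuals per run, day and life stage in a single call with pd.crosstab. The result has one row per (run, day) pair that occurs in the data and one column per life stage. The CSV text is pasted inline so the snippet runs on its own. With the real data you would pass a file path to pd.read_csv instead; the file name in the comment is hypothetical.

```python
import io

import pandas as pd

# Sample rows copied from the question. With the real file you would call
# pd.read_csv("model_output.csv") instead (file name is hypothetical).
CSV = """\
"run","day","Lifestate","Lat","Long","habitat_sample"
1, 1.0,"adult",0.0,0.0,0
1, 1.0,"adult",0.0,0.0,0
1, 1.0,"larva",0.0,0.0,0
1, 2.0,"adult",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 2.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 3.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
1, 4.0,"nymph",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
2, 1.0,"adult",0.0,0.0,0
3, 1.0,"nymph",0.0,0.0,0
3, 1.0,"nymph",0.0,0.0,0
3, 2.0,"larva",0.0,0.0,0
3, 2.0,"larva",0.0,0.0,0
"""

# skipinitialspace=True tolerates the blank after the first comma in each row
df = pd.read_csv(io.StringIO(CSV), skipinitialspace=True)

# Rows: (run, day) pairs present in the data; columns: one per life stage.
# Each cell is the number of individuals in that stage on that day.
counts = pd.crosstab([df["run"], df["day"]], df["Lifestate"])
print(counts)
```

Stages absent on a given day show up as 0. However, days a run never recorded do not get a row.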
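What I need to do is plot the survival of the different life stages for each run. In other words, for each run I need to plot the number of adults, larvae and nymphs present on each day. So on day 1 there are 3 adults, 1 nymph and 2 larvae; on day 2 there are 2 adults, 2 nymphs and 6 larvae, and so on. For each run I'd like to end up with a plot along the lines of my (admittedly crappy) sketch.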

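I'm still quite new to pandas and am having a hard time making sense of all the different techniques available. I don't know how to break the "Lifestate" column down into the number of adults/nymphs/larvae per day and plot that. I've tried grouping by run and day and taking value_counts() of the Lifestate column. I've also tried grouping by run alone and extracting the number of individuals in each life stage, and so on. I can get the numbers I want, but not in a form I can plot. Plotting days against value counts doesn't make sense, because they end up with different dimensions, right? My iterative approach feels inefficient, and my gut tells me it isn't the right way to do this. Here is one example of the many things I've tried: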

import pandas as pd

# data: the DataFrame read from the CSV above
grouped = data.groupby(['run', 'day'])

for name, group in grouped:
    valcounts = group['Lifestate'].value_counts()
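This does give me the numbers I need, but I'm not sure how to plot them. Another concern is whether a loop like this will be slow once I start using my actual (large) dataset.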
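On the speed question, the loop calls value_counts() once per (run, day) group from Python. By contrast, groupby(...).size() counts every group in one vectorized pass, which generally scales much better on large data. No timings are claimed here. The sketch below only checks that both approaches give the same numbers. It rebuilds the run/day/Lifestate columns of the sample inline from a compact (run, day, stage, count) list, which is just a shorthand for the CSV rows.

```python
import pandas as pd

# (run, day, Lifestate, number of individuals), transcribed from the sample CSV
spec = [
    (1, 1.0, "adult", 2), (1, 1.0, "larva", 1),
    (1, 2.0, "adult", 1), (1, 2.0, "nymph", 3),
    (1, 3.0, "nymph", 3), (1, 4.0, "nymph", 4),
    (2, 1.0, "adult", 3),
    (3, 1.0, "nymph", 2), (3, 2.0, "larva", 2),
]
# Expand to one row per individual, as in the CSV.
df = pd.DataFrame(
    [(run, day, stage) for run, day, stage, n in spec for _ in range(n)],
    columns=["run", "day", "Lifestate"],
)

# Loop version, as in the question: one value_counts() call per (run, day) group.
looped = {}
for (run, day), group in df.groupby(["run", "day"]):
    for stage, n in group["Lifestate"].value_counts().items():
        looped[(run, day, stage)] = n

# Vectorized version: count every (run, day, Lifestate) group in one pass.
sizes = df.groupby(["run", "day", "Lifestate"]).size()

# Same numbers either way; the vectorized result is already a Series that
# unstack() turns into a day-by-stage table ready for DataFrame.plot().
assert looped == sizes.to_dict()
wide = sizes.unstack(fill_value=0)
print(wide)
```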

My current idea is to extract the data I want and build a new DataFrame for each run. For each run I think I'd want something like this:

"day","num_adults","num_nymphs", "num_larva"
1, 2, 4, 6
2, 1, 3, 5
3, 1, 3, 5
4, 1, 2, 4

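and so on. Does this sound like the right way to approach the problem? What am I missing or not thinking of? Any advice on the logic or design would be much appreciated. Thanks.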
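A table per run is a reasonable target, and it can be built for all runs at once rather than run by run. Below is a minimal sketch under a few assumptions:

- the sample data is rebuilt inline;
- the column names are those proposed in the question (num_adults, num_nymphs, num_larva);
- matplotlib's off-screen Agg backend is used so the snippet runs without a display.

A single groupby counts everything, rename gives the proposed column names, and a dictionary comprehension splits the result into one day-indexed DataFrame per run.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; remove this line in an interactive session
import pandas as pd

# (run, day, Lifestate, number of individuals), transcribed from the sample CSV
spec = [
    (1, 1.0, "adult", 2), (1, 1.0, "larva", 1),
    (1, 2.0, "adult", 1), (1, 2.0, "nymph", 3),
    (1, 3.0, "nymph", 3), (1, 4.0, "nymph", 4),
    (2, 1.0, "adult", 3),
    (3, 1.0, "nymph", 2), (3, 2.0, "larva", 2),
]
df = pd.DataFrame(
    [(run, day, stage) for run, day, stage, n in spec for _ in range(n)],
    columns=["run", "day", "Lifestate"],
)

# One row per (run, day), one column per stage, renamed to the layout proposed
# in the question.
wide = (
    df.groupby(["run", "day", "Lifestate"]).size()
    .unstack(fill_value=0)
    .rename(columns={"adult": "num_adults", "nymph": "num_nymphs", "larva": "num_larva"})
)
wide = wide[["num_adults", "num_nymphs", "num_larva"]]

# Split into one DataFrame per run, indexed by day.
per_run = {run: frame.droplevel("run") for run, frame in wide.groupby(level="run")}

# Each per-run frame plots directly: day on the x-axis, one line per stage.
ax = per_run[1].plot(marker="o", title="run 1")
ax.set_ylabel("individuals")
```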

I'm not sure what you want to do with the "run" column in your example. If you need to consider each run separately, here is my take:

import pandas as pd
# every run x day x Lifestate combination, so combinations absent from the data get a count of 0
mix = pd.MultiIndex.from_product([df['run'].unique(), df['day'].unique(), df['Lifestate'].unique()], names=['run','day','Lifestate'])
new = df.groupby(['run','day','Lifestate']).size().reindex(mix, fill_value=0).unstack().reset_index()
The resulting DataFrame, new, looks like this (first rows):

Lifestate  run  day  adult  larva  nymph
0            1  1.0      2      1      0
1            1  2.0      1      0      3
2            1  3.0      0      0      3
3            1  4.0      0      0      4
4            2  1.0      3      0      0
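The reindex step is what makes every run share the same set of days. On its own, unstack(fill_value=0) only fills in life stages that are missing on a day the run already has. It does not create rows for days a run never recorded. The sketch below compares the two, using the sample data rebuilt inline.

```python
import pandas as pd

# (run, day, Lifestate, number of individuals), transcribed from the sample CSV
spec = [
    (1, 1.0, "adult", 2), (1, 1.0, "larva", 1),
    (1, 2.0, "adult", 1), (1, 2.0, "nymph", 3),
    (1, 3.0, "nymph", 3), (1, 4.0, "nymph", 4),
    (2, 1.0, "adult", 3),
    (3, 1.0, "nymph", 2), (3, 2.0, "larva", 2),
]
df = pd.DataFrame(
    [(run, day, stage) for run, day, stage, n in spec for _ in range(n)],
    columns=["run", "day", "Lifestate"],
)

sizes = df.groupby(["run", "day", "Lifestate"]).size()

# Without reindex: only (run, day) pairs that occur in the data get a row.
plain = sizes.unstack(fill_value=0)

# With reindex, as in the answer: every run x day x stage combination gets a
# row, so each run covers the same days and absent ones show up as 0.
mix = pd.MultiIndex.from_product(
    [df["run"].unique(), df["day"].unique(), df["Lifestate"].unique()],
    names=["run", "day", "Lifestate"],
)
full = sizes.reindex(mix, fill_value=0).unstack()

print(len(plain), len(full))  # 7 12
```

Whether the extra all-zero days are wanted depends on the model. For example, they show a run as dying out rather than simply ending early.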
Plotting each run separately is then straightforward:

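import matplotlib.pyplot as plt

# create one subplot per "run"
runs = new.groupby('run')
# squeeze=False keeps axs a 2-D array even if there is only one run
fig, axs = plt.subplots(len(runs), 1, sharex=True, sharey=True,
                        constrained_layout=True, squeeze=False)
for i, (ax, (g, temp)) in enumerate(zip(axs[:, 0], runs)):
    # ax.is_first_row() is deprecated/removed in recent Matplotlib; legend on the top subplot only
    temp.plot(x='day', y=['nymph', 'larva', 'adult'], ax=ax, legend=(i == 0))
    ax.set_title("run #{:d}".format(g))
plt.show()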