Python: How to create a DataFrame directly from groupby


python pandas


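My code works fine, but I think there is a more efficient way to write it; I just can't figure it out. I thought reset_index() would do the job, but in this case it doesn't. All suggestions are welcome. Thanks in advance!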

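I have a large DataFrame (hospital data). All data is from 2017, 2018 and 2019. The column spoedelectief can have two values: one for emergency cases and one for non-emergency (elective) cases. In Dutch, an emergency is called "spoed", so emergency cases are S and non-emergency cases are E.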

From this DataFrame I want to create a new one to visualize the number of emergency and non-emergency cases per year, but I'm stuck. Some code:

test = df_new.groupby(df_new['operatiejaar'])['spoedelectief'].value_counts().sort_index()

This returns a pandas Series:

operatiejaar  spoedelectief
2017          E                5459
              S                1054
2018          E                6191
              S                1029
2019          E                6160
              S                1159

To visualize this in Seaborn, I tried to turn it into a DataFrame with reset_index(), but got an error:

ValueError: cannot insert spoedelectief, already exists
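The error can be traced to the names involved. The sketch below uses a small hypothetical stand-in for df_new (the real hospital data is not shown) and prints the index-level names of the value_counts() result: the second level is called spoedelectief, and in older pandas versions the Series itself carries that same name, so reset_index() tries to insert a column spoedelectief twice. Recent pandas releases (2.0 and later) name the counts 'count' instead, so plain reset_index() may already work there; giving the counts an explicit, unused name (the hypothetical totaal below) avoids the clash either way.

```python
import pandas as pd

# Small hypothetical stand-in for df_new (the real hospital data is not shown)
df_new = pd.DataFrame({
    'operatiejaar': [2017, 2017, 2017, 2018, 2018, 2019],
    'spoedelectief': ['E', 'E', 'S', 'E', 'S', 'E'],
})

test = df_new.groupby('operatiejaar')['spoedelectief'].value_counts()

# The index levels: the second one is named after the counted column
print(list(test.index.names))

# The Series name: 'spoedelectief' in older pandas (hence the clash),
# 'count' in pandas 2.0 and later
print(test.name)

# Naming the counts explicitly lets reset_index() create three distinct columns
spel_jaar = test.rename('totaal').reset_index()
print(spel_jaar)
```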

Converting test to a DataFrame does work:

test = pd.DataFrame(test)

The resulting DataFrame looks fine, but test.columns gives only a single column:

Index(['spoedelectief'], dtype='object')

Below is the code I used to create the desired DataFrame:

test = df_new.groupby(df_new['operatiejaar'])['spoedelectief'].value_counts().sort_index()

jaar_list = []
spel_list = []
totaal = []
for index, value in test.items():
    jaar_list.append(index[0])
    spel_list.append(index[1])
    totaal.append(value)

spel_jaar = pd.DataFrame(
    {'jaar': jaar_list,
     'spoedelectief': spel_list,
     'totaal': totaal
    })

Which gives the desired DataFrame, with the columns jaar, spoedelectief and totaal.

How can I code this more easily/directly from the original DataFrame? Thanks!
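As a sanity check, the following sketch (again on hypothetical sample data) rebuilds the loop from the question and produces the same table with a single chained expression. The final rename() only maps operatiejaar to jaar so that the column names match the loop's output; the column name totaal is taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the hospital DataFrame
df_new = pd.DataFrame({
    'operatiejaar': [2017, 2017, 2017, 2018, 2018, 2019, 2019, 2019],
    'spoedelectief': ['E', 'S', 'E', 'E', 'S', 'S', 'E', 'E'],
})

test = (df_new.groupby('operatiejaar')['spoedelectief']
        .value_counts()
        .sort_index())

# The loop from the question: unpack each (year, category) index tuple by hand
jaar_list, spel_list, totaal = [], [], []
for (jaar, spel), value in test.items():
    jaar_list.append(jaar)
    spel_list.append(spel)
    totaal.append(value)
via_loop = pd.DataFrame({'jaar': jaar_list, 'spoedelectief': spel_list, 'totaal': totaal})

# The same table in one chain: name the counts, move both index levels
# into columns, then rename the year column to match the loop's output
via_reset = (test.rename('totaal')
             .reset_index()
             .rename(columns={'operatiejaar': 'jaar'}))

print(via_reset)
```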

You need to give the counts a name that does not clash with the index levels. You can pass that name directly to reset_index():

test = (df_new.groupby(df_new['operatiejaar'])['spoedelectief']
              .value_counts()
              .sort_index()
              .reset_index(name='count'))

Or name the column while converting the Series with to_frame():

test = (
    df_new.groupby('operatiejaar')['spoedelectief']
    .value_counts().to_frame('totaal').reset_index()
)

Alternatively, you can avoid naming the Series altogether and unstack it instead, reshaping the result into one column per value found by value_counts(). This layout is also convenient for plotting:

# 'E' and 'S' counts become two columns
test2 = (
    df_new.groupby('operatiejaar')['spoedelectief']
    .value_counts().unstack()
)
test2.plot.bar()

Notes:

- Instead of df_new[column_name], you can pass just column_name as the argument to groupby().
- You don't need sort_index() (at least in recent versions of pandas): by default, both groupby() and value_counts() sort. Keep in mind that value_counts() orders the values within each year by count, not by label, so keep sort_index() if you need 'E' before 'S' in every year.

Thanks! Great and helpful answer. Cheers, Jan
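The sort-order caveat can be checked on a hypothetical example in which 'S' outnumbers 'E' in one year. The sketch also shows two alternatives that neither answer uses: grouping on both columns followed by size(), which sorts by the labels and yields the long table in one step, and pd.crosstab(), which builds a wide table comparable to value_counts().unstack().

```python
import pandas as pd

# Hypothetical data in which 'S' outnumbers 'E' in 2018
df_new = pd.DataFrame({
    'operatiejaar': [2017, 2017, 2017, 2018, 2018, 2018],
    'spoedelectief': ['E', 'E', 'S', 'S', 'S', 'E'],
})

# value_counts() orders each year's rows by count, largest first,
# so 2018 lists 'S' before 'E'
by_count = df_new.groupby('operatiejaar')['spoedelectief'].value_counts()
print(by_count)

# Grouping on both columns and counting rows with size() sorts by the
# group labels instead, so 'E' comes before 'S' in every year
by_label = (df_new.groupby(['operatiejaar', 'spoedelectief'])
            .size()
            .reset_index(name='totaal'))
print(by_label)

# Wide table with one column per category, similar to value_counts().unstack()
wide = pd.crosstab(df_new['operatiejaar'], df_new['spoedelectief'])
print(wide)
```

Which variant fits best mostly depends on the plotting call: the long table (by_label) suits a hue-based Seaborn bar plot, while the wide table suits DataFrame.plot.bar().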