Python: Determining the number of occurrences grouped by 'Year' (continued)
Tags: python, pandas, dataframe, group-by, loc


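This is a continuation of an earlier post.

Now I would like to group these occurrences by year, so that the output looks something like this: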

        Combo  Occurrence (2017)    Occurrence (2018)    Occurrence (2019)   Occurrence (2020)
0      DK,NO4           2                   x                   x                   x
1     DK1,NO1           1                   x                   x                   x
2  DK,NO1,NO2           1                   x                   x                   x
3       DK,NO           1                   x                   x                   x

The input data looks like this:

      Year  Month  Day  Weekday   NO1  ...    SE3    SE4     FI    DK1    DK2
0      2017      1    1        7  28.4  ...  24.03  24.03  24.03  20.96  20.96
1      2017      1    1        7  28.2  ...  25.05  25.05  25.05  25.05  25.05
2      2017      1    1        7  28.0  ...  25.05  25.05  25.05  25.05  25.05
3      2017      1    1        7  28.0  ...  23.19  23.19  23.19  16.03  16.03
4      2017      1    1        7  28.0  ...  24.10  24.10  24.10  16.43  16.43
...     ...    ...  ...      ...   ...  ...    ...    ...    ...    ...    ...
35063  2020     12   31        4  31.0  ...  31.00  58.04  35.32  89.35  89.35
35064  2020     12   31        4  24.8  ...  24.84  54.45  24.84  56.70  56.70
35065  2020     12   31        4  24.8  ...  24.77  51.18  28.00  52.44  52.44
35066  2020     12   31        4  24.6  ...  24.61  45.84  26.55  51.86  51.86
35067  2020     12   31        4  24.1  ...  24.07  24.07  24.07  78.66  78.66

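The idea is to move all columns that take no part in the string processing into the index, and then use value_counts on the grouped Combo column to count the values per year: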

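# Move the date columns into the index so that only the value columns are compared
df = df.set_index(['Year', 'Month', 'Day', 'Weekday'])

df = (df.eq(df['DK1 Up'], axis=0)      # True where a column equals 'DK1 Up' in that row
        .dot(df.columns + ',')         # concatenate the names of the matching columns
        .str[:-1]                      # drop the trailing comma
        .to_frame('Combo')
        .groupby('Year')['Combo']
        .value_counts()                # count each combination per year
        .unstack(0, fill_value=0)      # years become columns; absent combinations get 0
        .add_prefix('Occurrence ')
        .rename_axis(columns=None)
        .reset_index()
        )
print(df)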
                             Combo  Occurrence 2017  Occurrence 2020
0  DK1 Up,DK1 Down,DK2 Up,DK2 Down                2                0
1                    DK1 Up,DK2 Up                3                5
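
The snippet above depends on a DataFrame from the earlier post, so it cannot be run on its own. Below is a minimal self-contained sketch of the same chain. The toy values and the choice of three value columns are assumptions made only for illustration; the column names 'DK1 Up', 'DK1 Down' and 'DK2 Up' are taken from the output shown above.

```python
import pandas as pd

# Hypothetical toy data; the real frame comes from the earlier post.
df = pd.DataFrame({
    'Year':     [2017, 2017, 2017, 2020, 2020],
    'Month':    [1, 1, 2, 1, 1],
    'Day':      [1, 2, 1, 1, 2],
    'Weekday':  [7, 1, 3, 3, 4],
    'DK1 Up':   [10.0, 10.0, 12.0, 20.0, 21.0],
    'DK1 Down': [10.0, 9.0, 12.0, 19.0, 18.0],
    'DK2 Up':   [10.0, 10.0, 12.0, 20.0, 21.0],
})
df = df.set_index(['Year', 'Month', 'Day', 'Weekday'])

# Boolean mask: which columns equal 'DK1 Up' in each row.
mask = df.eq(df['DK1 Up'], axis=0)

# Multiplying True/False by "name," and summing concatenates the matching names.
combo = mask.dot(df.columns + ',').str[:-1]

out = (combo.to_frame('Combo')
            .groupby('Year')['Combo']     # 'Year' is an index level here
            .value_counts()               # count each combination per year
            .unstack(0, fill_value=0)     # one column per year
            .add_prefix('Occurrence ')
            .rename_axis(columns=None)
            .reset_index())
print(out)
```

A year in which a combination never occurs gets 0 instead of NaN because of fill_value=0 in unstack.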

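What does the input data look like?

Done. Thanks for your patience. Can you help me sort it, so that the 'Combo' with the highest total number of occurrences is listed at the top, with the totals decreasing below it?

@JeppeBay - So you need to sort each column by year? That is a problem, because the order of all values changes, e.g. print(df.sort_values('Occurrence 2017')) versus print(df.sort_values('Occurrence 2019'))

I think I should sum the rows into a separate column, let's call it 'sum', and then sort by this.

@JeppeBay - So use df['sum'] = df.sum(axis=1) and then df.sort_values('sum')

@JeppeBay - So is it not possible to use df['Combo'] = df['Combo'].replace(d, regex=True) here?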
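
The sorting idea from the comments (add a 'sum' column with the row totals, then sort on it) might look like the sketch below. The small result table is hypothetical and only mimics the shape of the answer's output. Two details are assumptions rather than part of the thread: the sum is restricted to the 'Occurrence' columns because 'Combo' holds strings, and ascending=False is used so that the largest total ends up at the top, as the asker requested.

```python
import pandas as pd

# Hypothetical result table in the shape produced by the answer.
df = pd.DataFrame({
    'Combo': ['DK1 Up,DK1 Down,DK2 Up,DK2 Down', 'DK1 Up,DK2 Up', 'DK2 Up'],
    'Occurrence 2017': [2, 3, 0],
    'Occurrence 2020': [0, 5, 1],
})

# Sum only the per-year count columns; the string column 'Combo' is left out.
df['sum'] = df.filter(like='Occurrence').sum(axis=1)

# Largest total first; ignore_index renumbers the rows 0..n-1.
df = df.sort_values('sum', ascending=False, ignore_index=True)
print(df)
```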