Python: How to merge a Series into a DataFrame as members of one column under a MultiIndex


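I have a DataFrame with a MultiIndex made up of (phase, service_group, station, year, period). Its purpose is to return the "capacity required" when all five index values are specified. For example, in the Final phase, at Milton Station in the West service group, in 2025, during Peak Hour 1, the capacity required is 1500.

There are currently 7 possible periods, two of which are "Off-Peak Hour" and "Shoulder Hour".

I need to add a new period, called "Off-Peak Shoulder", for every instance of the MultiIndex, where the new value is defined as the average of Off-Peak Hour and Shoulder Hour.

So far I have the following code: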

import pandas as pd
import os

directory = '/Users/mark/PycharmProjects/psrpcl_data'
capacity_required_file = 'Capacity_Requirements.csv'
capacity_required_path = os.path.join(directory, capacity_required_file)

df_capacity_required = pd.read_csv(capacity_required_path, sep=',',
                       usecols=['phase', 'service_group', 'station', 'year', 'period', 'capacity_required'])

df_capacity_required.set_index(['phase', 'service_group', 'station', 'year'], inplace=True)
df_capacity_required.sort_index(inplace=True)

print(df_capacity_required.head(14))

The output of the above code is:

                                                               period  capacity_required
phase service_group station                      year
Early Barrie        Allandale Waterfront Station 2025  AM Peak Period                490
                                                 2025   Off-Peak Hour                100
                                                 2025  PM Peak Period                520
                                                 2025     Peak Hour 2                250
                                                 2025     Peak Hour 5                180
                                                 2025     Peak Hour 6                180
                                                 2025   Shoulder Hour                250
                                                 2026  AM Peak Period                520
                                                 2026   Off-Peak Hour                50
                                                 2026  PM Peak Period                520
                                                 2026     Peak Hour 2                260
                                                 2026     Peak Hour 5                180
                                                 2026     Peak Hour 6                180
                                                 2026   Shoulder Hour                250

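The above is only the first 14 rows out of about 30K. It shows the periods for two years. Note that each year has 7 periods.

I am trying to create a new "period" called "Off-Peak Shoulder" and add it to every individual (phase, service_group, station, year) combination, where its value is the average of Off-Peak Hour and Shoulder Hour.

The following statement correctly calculates one Off-Peak Shoulder value per index value: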

off_peak_shoulder = df_capacity_required.loc[df_capacity_required.period == 'Off-Peak Hour', 'capacity_required'].add(
                    df_capacity_required.loc[df_capacity_required.period == 'Shoulder Hour', 'capacity_required']).div(2)

print(off_peak_shoulder)

The above code gives the following (correct) Off-Peak Shoulder Series as output:

phase    service_group          station                       year
Early    Barrie                 Allandale Waterfront Station  2025      0.0
                                                              2026      0.0
                                                              2027      0.0
                                                              2028      0.0
                                                              2029      0.0
                                                                      ...
Initial  Union Pearson Express  Pearson Station               2023    160.0
                                                              2024    160.0
                                Weston Station                2022     80.0
                                                              2023    105.0
                                                              2024    105.0
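The calculation above works because `Series.add` aligns both operands on their shared index before summing. Here is a minimal sketch of that alignment on toy data. The four rows are copied from the sample output shown earlier, and the names `toy`, `off_peak`, `shoulder` and `toy_average` are made up for illustration; the real data is read from the CSV.

```python
import pandas as pd

# Toy rows mirroring the sample output above (illustrative, not the real CSV).
rows = [
    ("Early", "Barrie", "Allandale Waterfront Station", 2025, "Off-Peak Hour", 100),
    ("Early", "Barrie", "Allandale Waterfront Station", 2025, "Shoulder Hour", 250),
    ("Early", "Barrie", "Allandale Waterfront Station", 2026, "Off-Peak Hour", 50),
    ("Early", "Barrie", "Allandale Waterfront Station", 2026, "Shoulder Hour", 250),
]
cols = ["phase", "service_group", "station", "year", "period", "capacity_required"]
toy = pd.DataFrame(rows, columns=cols).set_index(cols[:4]).sort_index()

# Each selection is a Series indexed by (phase, service_group, station, year).
off_peak = toy.loc[toy.period == "Off-Peak Hour", "capacity_required"]
shoulder = toy.loc[toy.period == "Shoulder Hour", "capacity_required"]

# add() matches rows by index label, so each year is paired with itself.
toy_average = off_peak.add(shoulder).div(2)
print(toy_average)
```

One consequence of the alignment is that the filter strings must match the labels in `period` exactly: a label that matches no rows produces an empty Series, and the sum then comes out as NaN for every index value.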

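Question: how do I merge/join the off_peak_shoulder Series into df_capacity_required so that Off-Peak Shoulder becomes an entry under "period", as shown below?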

                                                               period  capacity_required
phase service_group station                      year
Early Barrie        Allandale Waterfront Station 2025    AM Peak Period                490
                                                 2025     Off-Peak Hour                100
                                                 2025    PM Peak Period                520
                                                 2025       Peak Hour 2                250
                                                 2025       Peak Hour 5                180
                                                 2025       Peak Hour 6                180
                                                 2025     Shoulder Hour                250
                                                 2025 Off-Peak Shoulder                175
                                                 2026    AM Peak Period                520
                                                 2026     Off-Peak Hour                50
                                                 2026    PM Peak Period                520
                                                 2026       Peak Hour 2                260
                                                 2026       Peak Hour 5                180
                                                 2026       Peak Hour 6                180
                                                 2026     Shoulder Hour                250
                                                 2026 Off-Peak Shoulder                150

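I slept on this problem and woke up with a solution. I already had the list of values I needed, each with the correct MultiIndex. I thought I would need some complicated MultiIndex insertion code, but all I really needed was to put the new values into a DataFrame with the same form as the original DataFrame and concatenate the two.

Here is the code I added. Note that the first statement is the same as in the original code, except that I added a call to reset_index().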

    # Average the two periods; add() aligns the Series on the shared 4-level index.
    df_new = df_capacity_required.loc[df_capacity_required.period == 'Off-Peak Hour', 'capacity_required'].add(
        df_capacity_required.loc[df_capacity_required.period == 'Shoulder Hour', 'capacity_required']).div(2).reset_index()
    # Label the new rows and give them the same index as the original DataFrame.
    df_new['period'] = 'Off-Peak Shoulder'
    df_new.set_index(['phase', 'service_group', 'station', 'year'], inplace=True)

    # Append the new rows, then re-sort so each one sits with its year.
    df_capacity_required = pd.concat([df_capacity_required, df_new])
    df_capacity_required.sort_index(inplace=True)

    print(df_capacity_required.head(16))

That print statement gives the following desired output:

                                                               period  capacity_required
phase service_group station                      year
Early Barrie        Allandale Waterfront Station 2025    AM Peak Period                490
                                                 2025     Off-Peak Hour                100
                                                 2025    PM Peak Period                520
                                                 2025       Peak Hour 2                250
                                                 2025       Peak Hour 5                180
                                                 2025       Peak Hour 6                180
                                                 2025     Shoulder Hour                250
                                                 2025 Off-Peak Shoulder                175
                                                 2026    AM Peak Period                520
                                                 2026     Off-Peak Hour                50
                                                 2026    PM Peak Period                520
                                                 2026       Peak Hour 2                260
                                                 2026       Peak Hour 5                180
                                                 2026       Peak Hour 6                180
                                                 2026     Shoulder Hour                250
                                                 2026 Off-Peak Shoulder                150
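For comparison, the same rows could probably also be built with a groupby over the index levels instead of adding two filtered Series. This is an alternative sketch, not the code used above, and it runs on toy data taken from the sample output; the names `toy`, `index_levels`, `pair`, `averaged` and `combined` are made up for illustration.

```python
import pandas as pd

# Toy rows mirroring part of the sample output (illustrative, not the real CSV).
rows = [
    ("Early", "Barrie", "Allandale Waterfront Station", 2025, "AM Peak Period", 490),
    ("Early", "Barrie", "Allandale Waterfront Station", 2025, "Off-Peak Hour", 100),
    ("Early", "Barrie", "Allandale Waterfront Station", 2025, "Shoulder Hour", 250),
    ("Early", "Barrie", "Allandale Waterfront Station", 2026, "AM Peak Period", 520),
    ("Early", "Barrie", "Allandale Waterfront Station", 2026, "Off-Peak Hour", 50),
    ("Early", "Barrie", "Allandale Waterfront Station", 2026, "Shoulder Hour", 250),
]
cols = ["phase", "service_group", "station", "year", "period", "capacity_required"]
index_levels = cols[:4]
toy = pd.DataFrame(rows, columns=cols).set_index(index_levels).sort_index()

# Keep only the two periods being averaged, then take the mean per index combination.
pair = toy[toy["period"].isin(["Off-Peak Hour", "Shoulder Hour"])]
averaged = pair.groupby(level=index_levels)["capacity_required"].mean().to_frame()
averaged["period"] = "Off-Peak Shoulder"

# Append the new rows to the original frame and re-sort by index.
combined = pd.concat([toy, averaged]).sort_index()
print(combined)
```

The two approaches differ when a combination is missing one of the two periods: the groupby mean returns the single available value, whereas `add` yields NaN for that combination. Which behaviour is wanted depends on the data.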

But thanks to everyone who read the question. It's nice to know there are people on StackOverflow willing to help those who are stuck.