Python: transforming a DataFrame based on levels specified in a column
I want to transform this DataFrame, which has a column level that specifies the hierarchy level of each row:
import numpy as np
import pandas as pd

# Note: np.array coerces every value to a string, so 'level' holds '1', '2', '3'
# and the year columns hold strings such as '155.568345'.
df_1 = pd.DataFrame(np.array([["Steel: Furnaces, Refining and Rolling", 1, 155.568345, 152.042158, 152.587873],
                              ["Steel: Furnaces, Refining and Rolling - Thermal", 2, 99.841607, 97.472990, 97.822843],
                              ["LPG", 3, 0.300934, 0.000000, 0.000000],
                              ["Diesel oil (incl. biofuels)", 3, 0.000000, 0.000000, 0.000000],
                              ["Residual fuel oil", 3, 32.204198, 31.535245, 31.648432],
                              ["Natural gas (incl. biogas)", 3, 67.336475, 65.937745, 66.174411],
                              ["Steel: Furnaces, Refining and Rolling - Electric", 2, 55.726738, 54.569168, 54.765030]]),
                    columns=['AT: Iron and steel / useful energy demand', 'level', '2000', '2001', '2002'])
df_1
+--------------------------------------------------+-------+------------+------------+------------+
| AT: Iron and steel / useful energy demand | level | 2000 | 2001 | 2002 |
+--------------------------------------------------+-------+------------+------------+------------+
| Steel: Furnaces, Refining and Rolling | 1 | 155.568345 | 152.042158 | 152.587873 |
| Steel: Furnaces, Refining and Rolling - Thermal | 2 | 99.841607 | 97.47299 | 97.822843 |
| LPG | 3 | 0.300934 | 0.0 | 0.0 |
| Diesel oil (incl. biofuels) | 3 | 0.0 | 0.0 | 0.0 |
| Residual fuel oil | 3 | 32.204198 | 31.535245 | 31.648432 |
| Natural gas (incl. biogas) | 3 | 67.336475 | 65.937745 | 66.174411 |
| Steel: Furnaces, Refining and Rolling - Electric | 2 | 55.726738 | 54.569168 | 54.76503 |
+--------------------------------------------------+-------+------------+------------+------------+
I want something like this, where level 1 is the process, level 2 is the energy, and level 3 is the fuel:
df_2 = pd.DataFrame({'Process': ["Steel: Furnaces, Refining and Rolling", "Steel: Furnaces, Refining and Rolling", "Steel: Furnaces, Refining and Rolling", "Steel: Furnaces, Refining and Rolling", "Steel: Furnaces, Refining and Rolling", "Steel: Furnaces, Refining and Rolling", "Steel: Furnaces, Refining and Rolling"],
                     'Energy': [None, 'Steel: Furnaces, Refining and Rolling - Thermal', 'Steel: Furnaces, Refining and Rolling - Thermal', 'Steel: Furnaces, Refining and Rolling - Thermal', 'Steel: Furnaces, Refining and Rolling - Thermal', 'Steel: Furnaces, Refining and Rolling - Thermal', 'Steel: Furnaces, Refining and Rolling - Electric'],
                     'Fuel': [None, None, 'LPG', 'Diesel oil (incl. biofuels)', 'Residual fuel oil', 'Natural gas (incl. biogas)', None],
                     '2000': [155.5683448, 99.84160689, 0.300933684, 0, 32.20419829, 67.33647492, 55.72673787],
                     '2001': [152.0421582, 97.47298987, 0, 0, 31.53524476, 65.93774511, 54.56916837],
                     '2002': [152.5878732, 97.82284329, 0, 0, 31.64843215, 66.17441114, 54.76502991]})
df_2
+---------------------------------------+--------------------------------------------------+-----------------------------+------------+------------+------------+
| Process | Energy | Fuel | 2000 | 2001 | 2002 |
+---------------------------------------+--------------------------------------------------+-----------------------------+------------+------------+------------+
| Steel: Furnaces, Refining and Rolling | None | None | 155.568345 | 152.042158 | 152.587873 |
| Steel: Furnaces, Refining and Rolling | Steel: Furnaces, Refining and Rolling - Thermal | None | 99.841607 | 97.472990 | 97.822843 |
| Steel: Furnaces, Refining and Rolling | Steel: Furnaces, Refining and Rolling - Thermal | LPG | 0.300934 | 0.000000 | 0.000000 |
| Steel: Furnaces, Refining and Rolling | Steel: Furnaces, Refining and Rolling - Thermal | Diesel oil (incl. biofuels) | 0.000000 | 0.000000 | 0.000000 |
| Steel: Furnaces, Refining and Rolling | Steel: Furnaces, Refining and Rolling - Thermal | Residual fuel oil | 32.204198 | 31.535245 | 31.648432 |
| Steel: Furnaces, Refining and Rolling | Steel: Furnaces, Refining and Rolling - Thermal | Natural gas (incl. biogas) | 67.336475 | 65.937745 | 66.174411 |
| Steel: Furnaces, Refining and Rolling | Steel: Furnaces, Refining and Rolling - Electric | None | 55.726738 | 54.569168 | 54.765030 |
+---------------------------------------+--------------------------------------------------+-----------------------------+------------+------------+------------+
How can I do this?

Use:
# All levels below the top one ('level' is stored as strings)
levs = df_1.loc[df_1['level']!='1', 'level'].unique()
# Level-1 labels, carried down to every row
ind = df_1['AT: Iron and steel / useful energy demand'].where(df_1['level']=='1').ffill().values
for j, i in enumerate(levs):
    # Keep the label only on rows that belong to level i
    lev = df_1['AT: Iron and steel / useful energy demand'].where(df_1['level']==i)
    # Forward fill every level except the deepest one
    if j != (len(levs)-1):
        lev = lev.ffill()
    ind = np.vstack((ind, lev))
# One tuple (Process, Energy, Fuel) per row
ind = pd.MultiIndex.from_tuples(list(zip(*ind)))
tmp = df_1.set_index(ind).reset_index()
tmp = tmp.rename(columns={'level_0': 'Process', 'level_1':'Energy', 'level_2':'Fuel'}).drop(['level'], axis=1)
tmp
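The same idea can be written without stacking arrays. The sketch below is not the answerer's code: expand_levels and the small example frame are hypothetical names and data introduced only for illustration. It assumes the level column can be converted to integers and that rows appear in parent-before-child order. Each label is forward filled per depth and then masked so that a row only receives ancestors at depths no greater than its own level, which also avoids carrying an old Energy label onto a later level-1 row if the data contains more than one process block.

```python
import pandas as pd


def expand_levels(df, label_col, level_col, names):
    """Spread a single label column into one column per hierarchy depth.

    Hypothetical helper for illustration; `names` lists the new column
    names from the top level down.
    """
    levels = df[level_col].astype(int)
    out = pd.DataFrame(index=df.index)
    for depth, name in enumerate(names, start=1):
        # Keep the label only on rows at this depth, then carry it downwards.
        filled = df[label_col].where(levels == depth).ffill()
        # A row has no ancestor at a depth greater than its own level.
        out[name] = filled.where(levels >= depth)
    # Append the remaining (data) columns unchanged.
    return pd.concat([out, df.drop(columns=[label_col, level_col])], axis=1)


# Small made-up frame with the same shape as df_1 (levels stored as strings).
example = pd.DataFrame({
    "name": ["A", "B", "x", "y", "C"],
    "level": ["1", "2", "3", "3", "2"],
    "2000": [1.0, 2.0, 3.0, 4.0, 5.0],
})
result = expand_levels(example, "name", "level", ["Process", "Energy", "Fuel"])
```

For df_1 itself, the call would presumably be expand_levels(df_1, 'AT: Iron and steel / useful energy demand', 'level', ['Process', 'Energy', 'Fuel']).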
Do it like this:
# unstack the column "level"
df_3 = df_1.set_index(['level', '2000', '2001', '2002'], append=True).unstack('level')
df_3.columns = df_3.columns.droplevel(0)
# forward fill (propagate last valid observation forward)
df_3[['1', '2']] = df_3[['1', '2']].ffill()
df_3 = df_3.reset_index().drop(['level_0'], axis=1)
df_3 = df_3.rename(columns={'1': 'Process', '2':'Energy', '3':'Fuel'})
column_names = ['Process', 'Energy', 'Fuel'] + ['2000', '2001', '2002']
df_3 = df_3.reindex(columns=column_names)
df_3
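To make the intermediate shape easier to follow, here is a minimal sketch of the unstack step on a tiny made-up frame (the frame small and its values are assumptions for illustration, not part of the original data). Moving level into the index and unstacking it yields one column per level, with each label sitting only in the column of its own level; the forward fill then fills in the parents.

```python
import pandas as pd

# Tiny made-up frame: one label column, a string level, one data column.
small = pd.DataFrame({
    "name": ["A", "B", "x"],
    "level": ["1", "2", "3"],
    "2000": ["10", "6", "4"],
})

# Keep the original row number in the index (append=True) so every row
# stays unique, then pivot the 'level' index level into columns.
wide = small.set_index(["level", "2000"], append=True).unstack("level")
# Columns are now ('name', '1'), ('name', '2'), ('name', '3'); drop 'name'.
wide.columns = wide.columns.droplevel(0)

# Before the fill, each row has exactly one non-missing label.
labels_per_row = wide.notna().sum(axis=1).tolist()

# Carry the parent levels down; the deepest level is left as it is.
wide[["1", "2"]] = wide[["1", "2"]].ffill()
```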
df_1.set_index(['level', '2000', '2001', '2002'], append=True).unstack('level') gets the data into the required shape; then ffill the columns according to your conditions.
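One detail worth checking: because df_1 is built from np.array, every value is stored as a string, which is why the answers compare level against '1' rather than 1. The sketch below uses a small made-up array (raw, demo and converted are hypothetical names) to show the coercion and one possible way to restore numeric types afterwards. Converting level to integers before running the answers would require changing their string comparisons, so the conversion is best applied to the final result.

```python
import numpy as np
import pandas as pd

# Mixing strings and numbers in np.array yields a single string dtype.
raw = np.array([["Process A", 1, 155.568345],
                ["Fuel B", 3, 0.300934]])
demo = pd.DataFrame(raw, columns=["name", "level", "2000"])

# The levels come back as the strings '1' and '3', not as integers.
levels_as_stored = demo["level"].tolist()

# Convert on a copy so the string-based comparisons elsewhere still work.
converted = demo.copy()
converted["2000"] = pd.to_numeric(converted["2000"])
converted["level"] = converted["level"].astype(int)
```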