Python: Pandas - tidy DataFrame from a multi-header Excel sheet
I have an Excel sheet exported by a software tool, structured as shown below.

Excel structure:
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | not relevant |
+---+-------+--------------+--------------+
| | | X1 | Y1 |
+---+-------+--------------+--------------+
|fr | Time | not relevant | not relevant |
+---+-------+--------------+--------------+
| 1 | 0.000 | 12 | 32 |
+---+-------+--------------+--------------+
| 2 | 0.010 | 23 | 3 |
+---+-------+--------------+--------------+
| 3 | 0.020 | 45 | 4 |
+---+-------+--------------+--------------+
| 4 | 0.030 | 4 | 1 |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | Y2 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 5 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 89 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 5 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 3 | |
+---+-------+--------------+--------------+
| | | | |
+---+-------+--------------+--------------+
| | | not relevant | |
+---+-------+--------------+--------------+
| | | X3 | |
+---+-------+--------------+--------------+
|fr | Time | not relevant | |
+---+-------+--------------+--------------+
| 1 | 0.000 | 17 | |
+---+-------+--------------+--------------+
| 2 | 0.010 | 2 | |
+---+-------+--------------+--------------+
| 3 | 0.020 | 4 | |
+---+-------+--------------+--------------+
| 4 | 0.030 | 23 | |
+---+-------+--------------+--------------+
CSV structure:
,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
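I am looking for a quick way to turn this messy data into a tidy DataFrame:
- The timestamps have the same values and the same count in every sub-series
- The number of sub-series is variable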
Time X1 Y1 Y2 X3
0.000 12 32 5 17
0.010 23 3 89 2
0.020 45 4 5 4
0.030 4 1 3 23
I came up with the following. I am not very happy with it, but it works:
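import numpy as np
import pandas as pd

filename = 'test_data'
# Read the raw sheet without a header so every row is kept as data
df = pd.read_excel(filename + '.xlsx', header=None)
# Split into sub-series at each completely empty row
df_list = np.split(df, df[df.isnull().all(axis=1)].index)
del df_list[0]  # the chunk before the leading empty row is empty

for i, df in enumerate(df_list):
    # Copy the series names (X1, Y1, ...) over the "not relevant" header cells
    df.iloc[3, 2:] = df.iloc[2, 2:]
    new_header = df.iloc[3]
    df.columns = new_header
    df = df.iloc[4:]
    # Drop the frame counter column ('fr' in the sample above)
    df_tmp = df.drop(['fr'], axis=1)
    df = df_tmp.set_index("Time")
    df.dropna(axis=1, how='all', inplace=True)
    df.columns.name = None
    df_list[i] = df

# Align all sub-series on the shared Time index
df = pd.concat(df_list, axis=1)
df = df.reindex(sorted(df.columns), axis=1)
df.to_csv(filename + '.csv')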
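Look at the skiprows parameter of the pd.read_excel method. You will be able to get the desired output easily.

@MayankPorwal I know skiprows. It is easy to use for skipping rows at the top, but the challenge here is that multiple sub-series are concatenated in the Excel data. I will probably just use .split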
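As an alternative to splitting with np.split, the blocks can be labelled by counting the empty separator rows and then handled with groupby. The following is only a sketch under the assumptions visible in the sample: every block starts after an all-empty row, the series names sit in the second non-empty row of the block, and the data starts in the fourth. The names RAW and tidy_blocks are made up for this illustration, and the CSV text is read from memory so the example is self-contained; the same function should work on a frame from pd.read_excel(..., header=None).

```python
import io

import pandas as pd

# Sample data from the question, in its CSV form (in-memory for illustration).
RAW = """\
,,,
,,not relevant,not relevant
,,X1,Y1
fr,Time,not relevant,not relevant
1,0.000,12,32
2,0.010,23,3
3,0.020,45,4
4,0.030,4,1
,,,
,,not relevant,
,,Y2,
fr,Time,not relevant,
1,0.000,5,
2,0.010,89,
3,0.020,5,
4,0.030,3,
,,,
,,not relevant,
,,X3,
fr,Time,not relevant,
1,0.000,17,
2,0.010,2,
3,0.020,4,
4,0.030,23,
"""


def tidy_blocks(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn a header-less frame of stacked blocks into one tidy frame.

    Hypothetical helper: assumes blocks are separated by all-empty rows.
    """
    blank = raw.isna().all(axis=1)
    block_id = blank.cumsum()  # every empty row starts a new block number
    pieces = []
    for _, block in raw[~blank].groupby(block_id[~blank]):
        # Assumed layout inside a block: row 0 is ignored, row 1 holds the
        # series names, row 2 is the fr/Time header, rows 3+ are the data.
        names = block.iloc[1, 2:].dropna()
        data = block.iloc[3:]
        piece = data[names.index].copy()  # keep only the named columns
        piece.columns = names.to_list()
        piece.index = pd.Index(data.iloc[:, 1].astype(float), name="Time")
        pieces.append(piece.apply(pd.to_numeric))
    # All blocks share the same timestamps, so they align column-wise.
    return pd.concat(pieces, axis=1)


tidy = tidy_blocks(pd.read_csv(io.StringIO(RAW), header=None))
print(tidy)
```

The columns come out in the order the blocks appear in the file (X1, Y1, Y2, X3), which matches the desired output above. Because the number of blocks is derived from the separator rows, a variable number of sub-series needs no extra handling.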