Python: reading a CSV with sparsely labeled column headers using pandas
I have a .csv file that I am trying to read into a pandas DataFrame. The file has several rows of column headers, but the first header row is only sparsely labeled. When I read the CSV with
import pandas as pd
# DPath is the path to the .csv file
Cosp2 = pd.read_csv(DPath, index_col=0, header=[1, 3])
print(Cosp2)
I end up with Unnamed: #_level_0 labels on the first header level for every column that is not explicitly labeled:
RH=0.8 Unnamed: 2_level_0 Unnamed: 3_level_0 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o)
0.00061 0.0840 0.117551 0.117550
0.00122 0.0749 0.126468 0.126466
0.00183 0.0754 0.124370 0.124367
0.00244 0.0776 0.136591 0.136587
0.00305 0.0873 0.141423 0.141418
0.00366 0.0729 0.143599 0.143593
Unnamed: 4_level_0 RH=0.9 Unnamed: 6_level_0 \
nat_freq pred_sp(h2o) avrg_sp(T) avrg_sp(h2o)
0.00061 0.0864 0.128697 0.163304
0.00122 0.0770 0.090500 0.200350
0.00183 0.0776 0.085400 0.121275
0.00244 0.0799 0.054500 0.100996
0.00305 0.0898 0.075700 0.170033
0.00366 0.0750 0.100018 0.165468
Unnamed: 7_level_0 Unnamed: 8_level_0
nat_freq denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.163304 0.127553
0.00122 0.200350 0.089700
0.00183 0.121274 0.084600
0.00244 0.100994 0.054000
0.00305 0.170032 0.075000
0.00366 0.165466 0.099100
Is there a way to make pandas propagate the level-0 labels across the unlabeled columns? I would like something like this:
RH=0.8 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.0840 0.117551 0.117550 0.0864
0.00122 0.0749 0.126468 0.126466 0.0770
0.00183 0.0754 0.124370 0.124367 0.0776
0.00244 0.0776 0.136591 0.136587 0.0799
0.00305 0.0873 0.141423 0.141418 0.0898
0.00366 0.0729 0.143599 0.143593 0.0750
RH=0.9
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.128697 0.163304 0.163304 0.127553
0.00122 0.090500 0.200350 0.200350 0.089700
0.00183 0.085400 0.121275 0.121274 0.084600
0.00244 0.054500 0.100996 0.100994 0.054000
0.00305 0.075700 0.170033 0.170032 0.075000
0.00366 0.100018 0.165468 0.165466 0.099100
You can first get the level-0 labels as a Series with get_level_values and to_series:
a = Cosp2.columns.get_level_values(0).to_series()
print (a)
RH=0.8 RH=0.8
Unnamed: 2_level_0 Unnamed: 2_level_0
Unnamed: 3_level_0 Unnamed: 3_level_0
Unnamed: 4_level_0 Unnamed: 4_level_0
RH=0.9 RH=0.9
Unnamed: 6_level_0 Unnamed: 6_level_0
Unnamed: 7_level_0 Unnamed: 7_level_0
Unnamed: 8_level_0 Unnamed: 8_level_0
dtype: object
Then mask the labels that start with Unnamed so that they become NaN, and replace those NaNs by forward filling with ffill (the same as fillna with method='ffill').
Finally, create the new MultiIndex with from_arrays:
b = a.mask(a.str.startswith('Unnamed')).ffill()
print (b)
RH=0.8 RH=0.8
Unnamed: 2_level_0 RH=0.8
Unnamed: 3_level_0 RH=0.8
Unnamed: 4_level_0 RH=0.8
RH=0.9 RH=0.9
Unnamed: 6_level_0 RH=0.9
Unnamed: 7_level_0 RH=0.9
Unnamed: 8_level_0 RH=0.9
dtype: object
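To see the mask and ffill step in isolation, here is a minimal sketch on a small Series. The labels below are made up for illustration and only mimic the names pandas generates. They are not read from the original file.

```python
import pandas as pd

# Hypothetical level-0 labels, shaped like the ones pandas generates
# for empty header cells (not taken from the original CSV).
labels = pd.Series([
    "RH=0.8", "Unnamed: 2_level_0", "Unnamed: 3_level_0",
    "RH=0.9", "Unnamed: 5_level_0",
])

# mask() replaces every entry where the condition is True with NaN ...
masked = labels.mask(labels.str.startswith("Unnamed"))

# ... and ffill() copies the last real label forward into each gap.
filled = masked.ffill()

print(filled.tolist())
# ['RH=0.8', 'RH=0.8', 'RH=0.8', 'RH=0.9', 'RH=0.9']
```

Forward filling depends on the real label sitting in the first column of its group, as it does in the header shown in the question.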
Cosp2.columns = pd.MultiIndex.from_arrays([b, Cosp2.columns.get_level_values(1)])
print (Cosp2)
RH=0.8 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.0840 0.117551 0.117550 0.0864
0.00122 0.0749 0.126468 0.126466 0.0770
0.00183 0.0754 0.124370 0.124367 0.0776
0.00244 0.0776 0.136591 0.136587 0.0799
0.00305 0.0873 0.141423 0.141418 0.0898
0.00366 0.0729 0.143599 0.143593 0.0750
RH=0.9
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.128697 0.163304 0.163304 0.127553
0.00122 0.090500 0.200350 0.200350 0.089700
0.00183 0.085400 0.121275 0.121274 0.084600
0.00244 0.054500 0.100996 0.100994 0.054000
0.00305 0.075700 0.170033 0.170032 0.075000
0.00366 0.100018 0.165468 0.165466 0.099100
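The steps above can also be wrapped in a small helper and tried on an in-memory CSV. This is only a sketch under stated assumptions: the function name fill_sparse_level0 and the shortened CSV text are hypothetical, and the header rows are at positions 0 and 1 here rather than 1 and 3 as in the original file.

```python
import io

import pandas as pd


def fill_sparse_level0(columns):
    """Return a MultiIndex whose 'Unnamed: ...' level-0 labels are forward-filled.

    Hypothetical helper; assumes a two-level column MultiIndex.
    """
    level0 = columns.get_level_values(0).to_series()
    level0 = level0.mask(level0.str.startswith("Unnamed")).ffill()
    return pd.MultiIndex.from_arrays([level0, columns.get_level_values(1)])


# Made-up, shortened stand-in for the file: the first header row only
# labels the first column of each RH group.
csv_text = (
    "nat_freq,RH=0.8,,RH=0.9,\n"
    "nat_freq,avrg_sp(T),avrg_sp(h2o),avrg_sp(T),avrg_sp(h2o)\n"
    "0.00061,0.0840,0.117551,0.128697,0.163304\n"
    "0.00122,0.0749,0.126468,0.090500,0.200350\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col=0, header=[0, 1])
df.columns = fill_sparse_level0(df.columns)

# Every column now carries its group label, so a whole group can be selected.
print(df.columns.tolist())
print(df["RH=0.9"])
```

Once the labels are filled in, df["RH=0.9"] returns all columns of that group, which is not possible while the first level still contains the Unnamed placeholders.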