Python: skipping the `#` character when reading the header with pandas read_csv
I have a file that looks like this:
# Time Cm Cd Cl Cl(f) Cl(r) Cm Cd Cl Cl(f) Cl(r)
1.000000000000e+01 -5.743573465913e-01 -5.860160539688e-01 -1.339511756657e+00 -1.244113224920e+00 -9.539853173733e-02
2.000000000000e+01 6.491397073110e-02 1.320098727949e-02 6.147195262817e-01 3.722737338720e-01 2.424457924098e-01
3.000000000000e+01 3.554043329234e-02 4.296597501519e-01 7.901295853361e-01 4.306052259604e-01 3.595243593757e-01
Is there a way to tell pandas that Time is the first column name?
I am reading it like this:
dat = pd.read_csv('%sdt.dat'%s, delim_whitespace=True)
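This somehow tells pandas that the name of the first column is #.
How can I tell pandas read_csv to ignore the first two characters of the header, or otherwise get the desired column names from read_csv?
Here is a potential workaround: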
headers = pd.read_csv('%sdt.dat'%s, delim_whitespace=True, nrows=0).columns[1:]
dat = pd.read_csv('%sdt.dat'%s, delim_whitespace=True, header=None, skiprows=1, names=headers)
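As a variation on the same idea, the header line can be consumed by hand before pandas sees the file. The following is only a sketch: `read_hash_header` is a hypothetical helper name, the in-memory `raw` string stands in for the real file, and the header is shortened to the six columns that actually carry data, since `names=` requires unique column names.

```python
import io
import pandas as pd

# In-memory stand-in for the file (assumption: unique header names).
raw = (
    "# Time Cm Cd Cl Cl(f) Cl(r)\n"
    "1.000000000000e+01 -5.743573465913e-01 -5.860160539688e-01 "
    "-1.339511756657e+00 -1.244113224920e+00 -9.539853173733e-02\n"
    "2.000000000000e+01 6.491397073110e-02 1.320098727949e-02 "
    "6.147195262817e-01 3.722737338720e-01 2.424457924098e-01\n"
)

def read_hash_header(buf):
    """Read a whitespace-delimited table whose header line starts with '#'."""
    # Read the header line ourselves and drop the leading '#'.
    names = buf.readline().lstrip("#").split()
    # The buffer now points at the first data row, so no skiprows is needed.
    return pd.read_csv(buf, sep=r"\s+", header=None, names=names)

dat = read_hash_header(io.StringIO(raw))
print(dat.columns.tolist())
```

With a real file, the same helper could be called inside `with open(path) as f:`. It avoids parsing the file twice, at the cost of failing if the header repeats a name.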
Alternatively, you can fix the columns with some post-processing:
col_mapper = {old:new for old, new in zip(dat.columns, dat.columns[1:])}
dat = dat.iloc[:, :-1].rename(col_mapper, axis=1)
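To see why the shift works, here is a small sketch on hypothetical toy data (the values 1.0 to 6.0 are made up for illustration). Because `#` is parsed as its own column name, the header has one more field than each data row, so every name sits one position too far left and the last column is all NaN. The sketch assigns the shifted names directly instead of building a mapping, which is an assumption about what is acceptable here, not the answer's exact method.

```python
import io
import pandas as pd

# Hypothetical toy file: the header has 4 tokens, each data row has 3 values.
raw = "# Time Cm Cd\n1.0 2.0 3.0\n4.0 5.0 6.0\n"

dat = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(dat.columns.tolist())   # '#' appears as the first column name

# Drop the empty trailing column, then shift every name one place left.
fixed = dat.iloc[:, :-1].copy()
fixed.columns = dat.columns[1:]
print(fixed.columns.tolist())
```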
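Instead of treating any whitespace as the separator, you can require at least 2 whitespace characters, since your data seems to be separated by multiple spaces. This names the first column "# Time", which you can then rename to remove the "# " prefix: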
df = pd.read_csv('%sdt.dat'%s, sep=r'\s{2,}', engine='python')
print(df)
# Time Cm Cd Cl Cl(f) Cl(r) Cm.1 Cd.1 Cl.1 Cl(f).1 Cl(r).1
0 10.0 -0.574357 -0.586016 -1.339512 -1.244113 -0.095399 NaN NaN NaN NaN NaN
1 20.0 0.064914 0.013201 0.614720 0.372274 0.242446 NaN NaN NaN NaN NaN
2 30.0 0.035540 0.429660 0.790130 0.430605 0.359524 NaN NaN NaN NaN NaN
df.columns = ['Time'] + list(df.columns[1:])
print(df)
Time Cm Cd Cl Cl(f) Cl(r) Cm.1 Cd.1 Cl.1 Cl(f).1 Cl(r).1
0 10.0 -0.574357 -0.586016 -1.339512 -1.244113 -0.095399 NaN NaN NaN NaN NaN
1 20.0 0.064914 0.013201 0.614720 0.372274 0.242446 NaN NaN NaN NaN NaN
2 30.0 0.035540 0.429660 0.790130 0.430605 0.359524 NaN NaN NaN NaN NaN
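The rename step can also be written so that it does not depend on the position of the column. This is a minimal sketch on hypothetical toy data, assuming fields are separated by at least two spaces and that no legitimate column name begins with `#` or a space.

```python
import io
import pandas as pd

# Hypothetical toy file with fields separated by three spaces.
raw = "# Time   Cm   Cd\n1.0   2.0   3.0\n4.0   5.0   6.0\n"

df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python")

# Strip any leading '#' and spaces from every column name.
df = df.rename(columns=lambda name: name.lstrip("# "))
print(df.columns.tolist())
```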