Reading data (.dat file) with Pandas

python pandas dataframe


How can I use Pandas to read the following (two-column) data from a .dat file?

TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17

The column separator is (at least) 2 spaces.

I tried:

df = pd.read_table("test.dat", sep="\s+", usecols=['TIME', 'XGSM'])
print df

But what it prints is not what I expect.

You can use the usecols parameter with the positions of the columns:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO is gone from current pandas

temp = u"""TIME             XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""
# after testing, replace StringIO(temp) with the filename
# each data row splits into 8 whitespace-separated fields;
# keep the first (0) and the last (7) and name them explicitly
df = pd.read_csv(StringIO(temp),
                 sep=r"\s+",
                 skiprows=1,
                 usecols=[0, 7],
                 names=['TIME', 'XGSM'])

print(df)
   TIME  XGSM
0  2004     1
1  2004     5
2  2004     8
3  2004    11
4  2004    17
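
To see why positions 0 and 7 are the ones to pick, here is a small sketch (not part of the original answer) that reads the same sample with the header skipped and no column names at all. The variable name raw is only illustrative.

```python
import pandas as pd
from io import StringIO

temp = """TIME             XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""

# Skip the two-name header line and let pandas number the columns itself.
raw = pd.read_csv(StringIO(temp), sep=r"\s+", skiprows=1, header=None)

print(raw.shape)          # (5, 8): every run of whitespace starts a new field
print(list(raw.columns))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(raw[[0, 7]])        # the year field and the XGSM value
```

The header line has only two names while each data row has eight fields, which is why selecting by position is more reliable here than selecting by name.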

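Edit:

You can also use a regex separator that matches 2 or more whitespace characters, sep=r'\s{2,}', as in the read_csv example with engine='python' further below. Pass engine='python' explicitly, because otherwise pandas emits this warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.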

You can also try pd.read_fwf() (reads a table of fixed-width formatted lines into a DataFrame), as in the read_fwf example further below.

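So, if you don't pass widths, it figures them out automatically from the header?

@ayhan From the docs: by default it uses the first 100 rows of data to detect the column specs.

The first column contains 2004 006 01 00 01 37 600.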

# Regex separator: split only on runs of 2 or more whitespace characters
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is gone from current pandas

temp = u"""TIME              XGSM
2004 006 01 00 01 37 600   1
2004 006 01 00 02 32 800   5
2004 006 01 00 03 28 000   8
2004 006 01 00 04 23 200   11
2004 006 01 00 05 18 400   17"""
# after testing, replace StringIO(temp) with the filename
df = pd.read_csv(StringIO(temp), sep=r'\s{2,}', engine='python')

print(df)
                       TIME  XGSM
0  2004 006 01 00 01 37 600     1
1  2004 006 01 00 02 32 800     5
2  2004 006 01 00 03 28 000     8
3  2004 006 01 00 04 23 200    11
4  2004 006 01 00 05 18 400    17
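
As an illustration of the difference between the two separators (not from the original answers), the same patterns can be tried on one data line with Python's re module. The variable names are only illustrative.

```python
import re

line = "2004 006 01 00 01 37 600  1"

# r"\s+" splits on every run of whitespace, single spaces included.
fields_any = re.split(r"\s+", line)

# r"\s{2,}" splits only where there are 2 or more whitespace characters.
fields_wide = re.split(r"\s{2,}", line)

print(len(fields_any))  # 8
print(fields_wide)      # ['2004 006 01 00 01 37 600', '1']
```

With \s{2,} the single spaces inside the TIME value are kept, so the whole timestamp stays in one column.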

# Fixed-width reader: no widths or colspecs are passed,
# so read_fwf infers the column boundaries from the data
import pandas as pd
from io import StringIO

pd.read_fwf(StringIO("""TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""), usecols=["TIME", "XGSM"])

#   TIME    XGSM
#0  2004    1
#1  2004    5
#2  2004    8
#3  2004    11
#4  2004    17
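
If the whole timestamp should stay in the TIME column, one option is to give read_fwf the column boundaries explicitly instead of letting it infer them. This is only a sketch: the character positions are an assumption read off the sample as pasted here (a 24-character TIME field, 2 spaces, then XGSM), and a real file may be laid out differently. The name df_fwf is illustrative.

```python
import pandas as pd
from io import StringIO

temp = """TIME                      XGSM
2004 006 01 00 01 37 600  1
2004 006 01 00 02 32 800  5
2004 006 01 00 03 28 000  8
2004 006 01 00 04 23 200  11
2004 006 01 00 05 18 400  17"""

# Assumed layout: TIME occupies characters 0-23, XGSM starts at character 26.
# colspecs are half-open intervals [from, to).
df_fwf = pd.read_fwf(StringIO(temp),
                     colspecs=[(0, 24), (26, 30)],
                     skiprows=1,               # skip the header line
                     names=["TIME", "XGSM"])   # and name the columns explicitly

print(df_fwf)
```

Here TIME is read as a string such as "2004 006 01 00 01 37 600" and XGSM as an integer.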