Python: specifying a new column after x rows when reading a .dat file with read_fwf


python pandas


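I am trying to read data from a .dat file into a DataFrame, but the format is a bit different from what I usually see. (This is a cropped view; each field has more rows, and there are more fields.)

There are 5 #Field blocks in total. What I want is to read the data into a DataFrame in the following format: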

"#Field-1"  "2 m temperature(k)"    "#Field-2"  "100 m temperature(K)"
190          1 15.18 55.0 248.9..    190         2 15.18 55.0 284.1...
191          1 15.27 55.13 285.0..   191         2 15.27 55.13 284.1..

I tried the following:

import pandas as pd

colspecs = [(0, 8), (8, 1000)]
pd.read_fwf("ENERGINET_ECM_2017102600.dat", skiprows=13, colspecs=colspecs, sep=r"\s+")

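But this only returns 2 columns. Is there a way to specify that a new column should start after x rows? Or should I use a different function?

Edit:

Added values to the result set.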

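Try the code below. It reads the file as a list of lines, extracts the data from them, and then uses that to build the DataFrame. Explanations are added as comments:

import pandas as pd

# READ ALL LINES
with open("tempfile.dat", "r") as f:
    lines = f.readlines()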

# GET COLUMN NAMES: 
colnames = []
for line in lines:
    if line.startswith("#Field="):
        words = line.split()
        colnames.append(words[0])
        colnames.append(" ".join(words[1:]))

# REMOVE LINES STARTING WITH # (AND BLANK LINES, WHICH WOULD BREAK split()[0] BELOW):
newlines = []
for line in lines:
    if line.strip() and not line.startswith("#"):
        newlines.append(line)

# GET ALL FIELD NAMES, WITHOUT DUPLICATING:  
fldnames = []
for line in newlines:
    name = line.split()[0]
    if name not in fldnames:
        fldnames.append(name)

# READ ALL ROWS TO CREATE A LIST OF LISTS FOR DATAFRAME: 
allrows = []
for name in fldnames: 
    onerow = []
    for line in newlines:
        words = line.split()
        if words[0] == name:
            onerow.append(words[0])
            onerow.append(words[1:])
    allrows.append(onerow)

# CREATE DATAFRAME: 
df = pd.DataFrame(data=allrows, columns=colnames)
print(df)

Output:

  #Field-1               2 m temperature(k) #Field-2               #100 m temperature(K)  
0      190   [1, 15.18, 55.0, 284.9, 284.8]      190      [2, 15.18, 55.0, 284.1, 284.1]  
1      191  [1, 15.27, 55.13, 285.0, 284.9]      191     [2, 15.27, 55.13, 284.1, 284.1]

Note: for a clearer display, I truncated the values to the first 5. The code should work for any number of rows and fields.
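The code above stores each group of values as a list of strings inside a single cell. If separate numeric columns are preferred, the lists can be expanded afterwards. The following is only a sketch: the helper name `to_numeric_columns`, the `t2m` prefix, and the small hand-built DataFrame are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def to_numeric_columns(df, list_col, prefix):
    """Expand a column holding lists of numeric strings into float columns.

    Hypothetical helper: assumes every list in `list_col` has the same length.
    """
    expanded = pd.DataFrame(df[list_col].tolist(), index=df.index).astype(float)
    expanded.columns = [f"{prefix}_{i}" for i in range(expanded.shape[1])]
    return expanded


# Hand-built stand-in for the DataFrame produced above (truncated values).
df = pd.DataFrame({
    "#Field-1": ["190", "191"],
    "2 m temperature(k)": [
        ["1", "15.18", "55.0", "284.9", "284.8"],
        ["1", "15.27", "55.13", "285.0", "284.9"],
    ],
})

flat = to_numeric_columns(df, "2 m temperature(k)", "t2m")
print(flat)
```

The expanded frame can then be joined back onto the identifier column with `pd.concat([df[["#Field-1"]], flat], axis=1)` if a single table is needed.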

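You should post at least the first 3 rows in full (with values) so that it is clear what output you need. For testing purposes you can reduce the number of values.

Posted an edit!

So 191 is shifted one column to the right, rather than 192? That would really mess up any column-based reader like read_fwf. The most obvious approach here seems to be to read the data in as individual columns with a whitespace-separated method (using read_csv) and then merge those columns.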
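A rough sketch of the whitespace-separated approach suggested in the comment above. The file layout is an assumption inferred from the answer's output (header lines starting with `#`, then data lines of the form `id field_number lon lat values...`), and the column names `point`, `field`, `lon`, `lat`, `v1`, `v2` are made up for illustration. An in-memory string stands in for the real .dat file.

```python
import io

import pandas as pd

# Assumed layout: "#" header lines, then one block of rows per field.
sample = """\
#Field-1 2 m temperature(k)
#Field-2 100 m temperature(K)
190 1 15.18 55.0 284.9 284.8
191 1 15.27 55.13 285.0 284.9
190 2 15.18 55.0 284.1 284.1
191 2 15.27 55.13 284.1 284.1
"""

# Read every data line as whitespace-separated columns, skipping "#" lines.
raw = pd.read_csv(io.StringIO(sample), sep=r"\s+", comment="#", header=None)
raw.columns = ["point", "field", "lon", "lat", "v1", "v2"]  # hypothetical names

# Move each field's block of rows side by side: one row per point,
# one group of columns per field number.
wide = raw.set_index(["point", "field"]).unstack("field")
print(wide)
```

With this layout each field ends up as its own set of columns, which is the "new column after x rows" effect asked about, without relying on fixed character widths.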
