Python 2.7: Filtering out extra headers in the middle of a table

python-2.7 pandas numpy

I am trying to import a very large data file. It is a text file with the following structure:

***** Information about Data ***********
Information about data
Information about Data
Information about Data

Information about Data

    Col1     Col2
     1.0      1.0
     1.0      1.0
     1.0      1.0
     1.0      1.0
     ...(10k+ lines)
     1.0      1.0
     1.0      1.0
***** Information about Data ***********
Information about data
Information about Data
Information about Data

Information about Data

    Col1     Col2
     1.0      1.0
     1.0      1.0
     1.0      1.0
     1.0      1.0
     ...(10k+ lines)
     1.0      1.0
     1.0      1.0

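This pattern repeats an arbitrary number of times. The number of rows between headers varies, and the file has more than 1 million lines in total.

Is there a way to strip these headers without going through the file line by line? I have already written a line-by-line search, but it is too slow to be practical.

The header is slightly different each time it appears.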

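Assuming your file is named `test.txt`:

Read the entire file in as a string.

`split` it on `'\n*'`, that is, a newline followed by an asterisk. This is the point where one table ends and the next header begins:

     new line
             \
  1.0      1.0
***** Information about Data ***********
 \
  followed by asterisks

`rsplit` each result on `'\n\n'` and take the last piece, which is the table that follows the final blank line of the header:

       first new line
                     \
Information about Data

 \
  second new line
    Col1     Col2
     1.0      1.0
     1.0      1.0
     1.0      1.0

Parse each piece with `read_csv`, then combine the resulting frames with `pd.concat`.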
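The steps above can be sketched as follows. This is a minimal illustration, not the answerer's original code: the function name `strip_headers` is hypothetical, and it assumes that blank lines are truly empty (no stray spaces), that no data line starts with an asterisk, that columns are separated by whitespace, and that the whole file fits in memory as one string.

```python
import io

import pandas as pd


def strip_headers(text):
    """Build one DataFrame from every table in `text` (hypothetical helper)."""
    frames = []
    # Every header block starts with a line of asterisks, so '\n*' marks
    # the boundary between the end of one table and the next header.
    for chunk in text.split('\n*'):
        chunk = chunk.rstrip()
        if not chunk:
            continue
        # Assumption: the table is whatever follows the last blank line.
        table = chunk.rsplit('\n\n', 1)[-1]
        # sep=r'\s+' treats any run of whitespace as the column separator.
        frames.append(pd.read_csv(io.StringIO(table), sep=r'\s+'))
    return pd.concat(frames, ignore_index=True)


# Possible usage (the encoding is an assumption). io.open returns unicode
# text on Python 2.7 as well, which io.StringIO requires there:
#
#     with io.open('test.txt', encoding='utf-8') as f:
#         df = strip_headers(f.read())
```

The only Python-level loop runs once per header block rather than once per line, so the per-line work is left to the string methods and to the pandas parser.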

Is "header information" literally "header information"?

No, I will edit the question accordingly.

`np.genfromtxt` accepts any input that feeds it lines one at a time. Since it already uses `readline` to read the file, inserting a line-by-line search into the pipeline should not slow it down. With pandas' compiled reader, things may be different.
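The comment about `np.genfromtxt` could be illustrated with a generator that filters lines before they reach the parser. This is only a sketch under an assumption that is not stated in the question: a line counts as data when every whitespace-separated token parses as a float. The names `data_lines` and `load_table` are hypothetical.

```python
import numpy as np


def _is_number(token):
    """Return True if `token` can be parsed as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def data_lines(lines):
    """Yield only the lines whose tokens are all numeric.

    Assumption: header lines, column-name lines, and blank lines always
    contain at least one non-numeric token or no tokens at all.
    """
    for line in lines:
        tokens = line.split()
        if tokens and all(_is_number(t) for t in tokens):
            yield line


def load_table(lines):
    """Parse the filtered lines into a 2-D float array."""
    # np.genfromtxt accepts a generator of lines in place of a file name.
    return np.genfromtxt(data_lines(lines))
```

An open file object can be passed as `lines`, since iterating over it yields one line at a time. Whether this is fast enough for a file with over a million lines would need to be measured.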