Python: how to extract lines containing given words only once from a file in which the same lines repeat


I have a data file containing one month of data. The file format is as follows:

VAAU Observations at 00Z 02 Aug 2017

-------------------------------------------------------------------------------------------
   PRES   HGHT   TEMP   DWPT   FRPT   RELH   RELI   MIXR   DRCT   SKNT   THTA   THTE   THTV
    hPa     m      C      C      C      %      %    g/kg    deg   knot     K      K      K
-------------------------------------------------------------------------------------------
 1000.0     66
  942.0    579   22.6   20.3   20.3     87     87  16.20    270      4  300.8  348.6  303.8
  925.0    747   21.6   19.9   19.9     90     90  16.09    265     10  301.4  348.9  304.3
  850.0   1481   18.8   17.1   17.1     90     90  14.65    275     19  305.8  350.0  308.5
  812.0   1873   17.3   14.1   14.1     82     82  12.60    275     22  308.2  346.6  310.6
...................
Station information and sounding indices
                         Station identifier: VAAU
                             Station number: 43014
                           Observation time: 170801/0000
                           Station latitude: 19.85
                          Station longitude: 75.40
                          Station elevation: 579.0
                            Showalter index: 0.92
                               Lifted index: 0.99
    LIFT computed using virtual temperature: 0.46
                                SWEAT index: 255.81
                                    K index: 34.70
                         Cross totals index: 19.70
                      Vertical totals index: 20.10
                        Totals totals index: 39.80
      Convective Available Potential Energy: 5.98
             CAPE using virtual temperature: 9.37
                      Convective Inhibition: -81.35
             CINS using virtual temperature: -69.07
                           Equilibrum Level: 617.53
 Equilibrum Level using virtual temperature: 523.66
                   Level of Free Convection: 662.87
             LFCT using virtual temperature: 669.25
                     Bulk Richardson Number: 4.12
          Bulk Richardson Number using CAPV: 6.44
  Temp [K] of the Lifted Condensation Level: 292.45
Pres [hPa] of the Lifted Condensation Level: 894.64
     Mean mixed layer potential temperature: 301.92
              Mean mixed layer mixing ratio: 16.03
              1000 hPa to 500 hPa thickness: 5818.00
Precipitable water [mm] for entire sounding: 51.19
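The same block repeats for every day of the month. From this file I want to extract the station identifier, station number, station latitude and station longitude only once.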

I tried a Python script but did not get the desired output. I also tried grep:

grep -E "Station number|Station latitude|Station longitude|Station identifier" wrkk_2017.out


for line in open('vaau_2017.out'):
    rec = line.strip()
    words = ["Station identifier:", "Station number:", "Station latitude:", "Station longitude"]
    for rec in words:
        if rec in line:
            print (line)
            break

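I need Station identifier: …, Station number: …, Station latitude: …, Station longitude: … only once, but I get them as many times as they occur in the file.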

You can add a boolean list that records whether each word has already been found:

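words = ["Station identifier:", "Station number:", "Station latitude:", "Station longitude:"]
still_left = [True] * len(words)  # True while the word has not been seen yet

with open('vaau_2017.out') as f:
    for line in f:
        for i, w in enumerate(words):
            if w in line and still_left[i]:
                print(line.rstrip())
                still_left[i] = False
        if sum(still_left) == 0:  # all words found, stop reading
            break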

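If you want to stop reading the file as soon as all the words have been found, you can add

    if sum(still_left)==0:
        break

at the level of the for line… loop, after the inner for i,w… loop.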
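A variant of the same idea, as a minimal sketch: a dict keyed by label takes the place of the boolean list, and the loop stops once every label has been seen. It assumes each wanted line has the form "label: value". The function name first_occurrences and the inline sample text are hypothetical, chosen for illustration only.

```python
import io

WANTED = ("Station identifier", "Station number",
          "Station latitude", "Station longitude")

def first_occurrences(lines, wanted=WANTED):
    """Return {label: value} for the first line carrying each wanted label."""
    found = {}
    for line in lines:
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in wanted and label not in found:
            found[label] = value.strip()
            if len(found) == len(wanted):
                break  # every label seen once; skip the rest of the input
    return found

# Two identical daily blocks stand in for the month-long file.
block = (
    "Station information and sounding indices\n"
    "                         Station identifier: VAAU\n"
    "                             Station number: 43014\n"
    "                           Observation time: 170801/0000\n"
    "                           Station latitude: 19.85\n"
    "                          Station longitude: 75.40\n"
)
result = first_occurrences(io.StringIO(block * 2))
for label, value in result.items():
    print(f"{label}: {value}")
```

With a real file, the open file object can be passed directly, for example first_occurrences(open('vaau_2017.out')), since the function only iterates over lines.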

You can use regex:

a = 'Station information and sounding indices Station identifier: VAAU Station number: 43014 Observation time: 170801/0000 Station latitude: 19.85 Station longitude: 75.40 Station elevation: 579.0 Showalter index: 0.92 Lifted index: 0.99 LIFT computed using virtual temperature: 0.46 SWEAT index: 255.81 K index: 34.70 Cross totals index: 19.70 Vertical totals index: 20.10'

Edit:

A solution to your problem:

import re

filename = "vaau_2017.out"
with open(filename) as f:
    for line in f:

        if 'Station identifier' in line:
            station_identifier = re.search(r'Station identifier: ([A-Z]+)', line).group(1)
            print(station_identifier)  # VAAU

        if 'Station number' in line:
            station_number = re.search(r'Station number: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_number)  # 43014

        if 'Station latitude' in line:
            station_latitude = re.search(r'Station latitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_latitude)  # 19.85

        if 'Station longitude' in line:
            station_longitude = re.search(r'Station longitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', line).group(1)
            print(station_longitude)  # 75.40
            break  # longitude is the last of the four fields in a block, so stop after the first block

# The same patterns applied to the sample string a defined above (requires import re):
station_identifier = re.search(r'Station identifier: ([A-Z]+)', a).group(1)
print(station_identifier)  # VAAU
station_number = re.search(r'Station number: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_number)  # 43014
station_latitude = re.search(r'Station latitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_latitude)  # 19.85
station_longitude = re.search(r'Station longitude: ([+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?)', a).group(1)
print(station_longitude)  # 75.40
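The four separate searches could also be folded into one compiled pattern. The following is only a sketch under the assumption that each value is a single whitespace-free token after the colon; the names FIELD_RE and station_fields are hypothetical, and the sample text is made up from the block shown in the question.

```python
import re

# One pattern for all four labels; group 1 is the field name, group 2 the value.
FIELD_RE = re.compile(
    r"^\s*Station (identifier|number|latitude|longitude):\s*(\S+)",
    re.MULTILINE,
)

def station_fields(text):
    """Return the first value found for each of the four station fields."""
    fields = {}
    for name, value in FIELD_RE.findall(text):
        fields.setdefault(name, value)  # later repeats do not overwrite the first hit
    return fields

# Two daily blocks; "Station elevation" is present but not matched by the pattern.
block = (
    "Station information and sounding indices\n"
    "                         Station identifier: VAAU\n"
    "                             Station number: 43014\n"
    "                           Observation time: 170801/0000\n"
    "                           Station latitude: 19.85\n"
    "                          Station longitude: 75.40\n"
    "                          Station elevation: 579.0\n"
)
fields = station_fields(block * 2)
print(fields)
```

This version reads the whole text into memory before matching, which should be fine for a one-month file; the line-by-line loop remains preferable when the file is large or when reading should stop at the first block.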