Multi-line output from a single-line match in Python




I'm still new to Python, but I'm trying to write code that parses the weather information provided by NOAA and displays it in broadcast order.

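I've managed to build a list of current conditions using Python regular expressions: the HTML file is split into a list of lines, and the lines are then output again in the appropriate order, but each entry is only a single line of data. The code looks like this: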

#other function downloads  
#http://www.arh.noaa.gov/wmofcst_pf.php?wmo=ASAK48PAFC&type=public
#and renames it currents.html
from bs4 import BeautifulSoup as bs
import re
soup = bs(open('currents.html'), 'html.parser')
weatherRaw = soup.pre.string
towns = ['PAOM', 'PAUN', 'PAGM', 'PASA']
townOut = []
weatherLines = weatherRaw.splitlines()
for i in range(len(towns)):
    p = re.compile(towns[i] + '.*')
    for line in weatherLines:
        matched = p.match(line)
        if matched:
            townOut.append(matched.group())

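A minimal sketch of the same current-conditions extraction, assuming (as in the question) that each station's line begins with its four-letter code. Because the codes are literal prefixes, str.startswith does the same job as compiling a '.*' regex per station, and looping over towns on the outside keeps the broadcast order rather than the file order. The function name current_conditions and the sample lines are made up for illustration; they are not NOAA's actual format.

```python
def current_conditions(lines, towns):
    """Return the lines that start with each station code, in the order of `towns`."""
    out = []
    for town in towns:  # outer loop = broadcast order, not file order
        for line in lines:
            if line.startswith(town):  # literal prefix check, same as re.match(town, line)
                out.append(line)
    return out


# Hypothetical lines standing in for weatherRaw.splitlines()
sample = [
    "PASA  CLOUDY  30  25",
    "PAOM  SNOW    12  10",
    "PAUN  CLEAR   20  15",
]
print(current_conditions(sample, ['PAOM', 'PAUN', 'PAGM', 'PASA']))
```

A station with no matching line (PAGM here) is simply skipped.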

Now I'm working on the forecast section, and I've run into a problem: each forecast runs over several lines, and I've already chopped the file into a list of lines.

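So what I'm looking for is an expression that lets me use a similar loop, this time starting to append at the matched line and stopping at the line that contains only &&. Something like this: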

#sample data from http://www.arh.noaa.gov/wmofcst.php?wmo=FPAK52PAFG&type=public
#BeautifulSouped into list fcst (forecast.pre.get_text().splitlines())
zones = ['AKZ214', 'AKZ215', 'AKZ213'] #note the out-of-numerical-order zones
weatherFull = []
for i in range(len(zones)):
    start = re.compile(zones[i] + '.*')
    end = re.compile('&&')
    for line in fcst:
        matched = start.match(line)
        if matched:
            weatherFull.append(matched.group())
            #and the other lines of various contents and length
            #until reaching the end match object
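One way to do this with the existing list of lines is a flag that switches on when a line starts with the zone code and stops collecting at the first line that is exactly '&&'. A minimal sketch, assuming each zone's forecast begins on a line starting with its zone code and ends with a line containing only '&&', as described above. extract_forecasts and the sample lines are hypothetical, and fcst here is a made-up stand-in for the question's list of lines.

```python
def extract_forecasts(lines, zones):
    """For each zone (in the given order), collect the lines from the one
    starting with the zone code up to, but not including, the next '&&' line."""
    weather_full = []
    for zone in zones:
        block = []
        capturing = False
        for line in lines:
            if not capturing and line.startswith(zone):
                capturing = True  # found the start line for this zone
            if capturing:
                if line.strip() == '&&':
                    break  # end marker: stop collecting this zone
                block.append(line)
        weather_full.append(block)
    return weather_full


# Made-up lines in file order (AKZ213 appears first in the file)
fcst = [
    "AKZ213-", "TODAY...SNOW.", "&&",
    "AKZ214-", "TONIGHT...CLEAR.", "&&",
    "AKZ215-", "LOWS 5 BELOW.", "&&",
]
for block in extract_forecasts(fcst, ['AKZ214', 'AKZ215', 'AKZ213']):
    print('\n'.join(block))
```

Each zone gets its own list of lines, which keeps the zones separate; using extend instead of append would give one flat list like weatherFull.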

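What should I do to improve this code? I know it's verbose, but since I'm just starting out, I like being able to follow what I'm doing. Thanks in advance.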

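Apologies if this isn't what you're after (happy to adjust if so). It's great that you're using BeautifulSoup, but you can actually take it a step further. Looking at the HTML, each block appears to start with an <a> tag whose name attribute is the zone code and to end at the next <a> tag. In that case, you can do something like this to extract the HTML for each zone: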

from bs4 import BeautifulSoup

# I saved the HTML to a file, but HTML fetched from the URL works the same way
with open('weather.html', 'r') as f:
    fcst = f.read()

# Turn the html into a navigable soup object (html.parser ships with Python)
soup = BeautifulSoup(fcst, 'html.parser')

# Define your zones
zones = ['AKZ214', 'AKZ215', 'AKZ213']

weatherFull = []

# This is a more Pythonic loop structure: instead of looping over
# a range of len(zones), simply iterate over each element itself
for zone in zones:
    # Here we use BS's built-in 'find' function to find the 'a' element
    # with a name = the zone in question (as this is the pattern).
    zone_node = soup.find('a', {'name': zone})

    # This loop will continue to cycle through the elements after the 'a'
    # tag until it hits another 'a' (this is highly structure dependent :) )
    # or runs out of siblings (zone_node becomes None)
    while zone_node is not None:
        weatherFull.append(zone_node)
        # Set the tag node = to the next node
        zone_node = zone_node.next_sibling
        # If the next node's tag name = 'a', break out and go to the next zone
        if getattr(zone_node, 'name', None) == 'a':
            break

# Process weatherFull however you like
print(weatherFull)
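To go from the collected nodes to plain forecast text, one option is to keep each zone's nodes separate and join their text. A self-contained sketch, assuming the page marks each zone with an <a name="..."> anchor followed by plain text, as the answer describes; the inline HTML sample and the zone_text function are hypothetical, and 'html.parser' is used so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def zone_text(html, zones):
    """Return {zone: text} for each zone, walking siblings from the zone's
    <a name=...> anchor until the next <a> tag or the end of the parent."""
    soup = BeautifulSoup(html, 'html.parser')
    result = {}
    for zone in zones:
        node = soup.find('a', attrs={'name': zone})  # None if the zone is absent
        pieces = []
        while node is not None:
            # Tags contribute their inner text; bare strings are used as-is
            pieces.append(node.get_text() if isinstance(node, Tag) else str(node))
            node = node.next_sibling
            if getattr(node, 'name', None) == 'a':
                break
        result[zone] = ''.join(pieces)
    return result


# Hypothetical page fragment: anchors inside a <pre>, zones out of numeric order
sample = ('<pre><a name="AKZ213"></a>AKZ213-\nSNOW LIKELY.\n'
          '<a name="AKZ214"></a>AKZ214-\nCLEAR.\n</pre>')
for zone, text in zone_text(sample, ['AKZ214', 'AKZ213']).items():
    print(zone, repr(text))
```

The dict preserves the order of zones, so iterating over it gives the forecasts in broadcast order.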

Hope this helps (or is at least in the general neighborhood of what you want!).

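This is exactly what I was looking for. I was frustrated when I couldn't use BeautifulSoup this way on the first set (there are no tags in that HTML). I can't believe I forgot to check whether it was tagged this time! Thanks for the help. :)

@Raveler1 No worries at all! There's a very entertaining post about parsing HTML with regular expressions that you may have already seen (linking it just in case); it definitely pushed me away from regex and toward things like BS :) Also, for someone so new to Python, your code looks great!

Thanks for the compliment; I've been at this for a few days, and I'm a fan of clean, readable code. A lot of my searching has turned up code that is anything but readable... I'm glad I'm a broadcaster and not a coder. Still, it's nice to have a toolset for solving problems! I haven't used a programming language since BASIC, C++, and a little Java in college. Each language has its own quirks, which is cool, but they generally follow similar processes. That article on parsing HTML was great. I don't know whether that's good or bad, but I found it hilarious... ;-)

@Raveler1 Then you'll definitely love Python :) In my opinion, requiring specific indentation makes it a much more readable language. That post makes me laugh every time, especially once you start working with HTML and the thought of using anything other than a real parser makes you uneasy :)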