Parsing a JSON response in Python with regular expressions

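Scenario: I have a weather GUI built with tkinter. It fetches data from an API and displays it on tkinter labels. One of the functions, format_alerts, parses the JSON data from the API. Because of the way the data is formatted, I am having a hard time parsing it to suit my needs.

Problem: I came up with a really weird way to parse the data. The JSON uses "..." and asterisks to separate the values inside a string (in a dictionary). I use .replace('\n', ' ') to get rid of the newlines. I use .replace('*', '@') and .replace('...', '@') to mark the split points, then .split('@'), and then I reference list index numbers. But sometimes the JSON uses "..." in unexpected places, so I end up with the wrong indexes. I know regex is a better way to do this, but for the life of me I can't get a three-part regex search to work.

My current code looks like this: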

import textwrap

def format_alerts(weather_json):
    alert_report = ""
    alerts = weather_json['alerts']
    try:
        # Loop because there are sometimes several different alerts at the list level.
        for item in alerts:
            event = item['event']
            details = item['description']
            parsed = details.replace('\n', ' ').replace('*', '@').replace('...', '@').split('@')
            # textwrap is used to make sure it fits in my tkinter label
            what = textwrap.fill(parsed[4], 51)
            where = textwrap.fill(parsed[6], 51)
            when = textwrap.fill(parsed[8], 51)
            # Plug the text-wrapped pieces into a single string.
            single_alert = '''{}: {}\nWhere: {}\nWhen: {}\n'''.format(event, what, where, when)
            alert_report += single_alert
    except Exception:
        alert_report = "Alerts Error"
        print('ERROR: (format_alerts) retrieving, formatting >>> alert_report ')
    return alert_report

weather_json looks like this:

{'alerts': [{
    'event': 'Small Craft Advisory', 
    'description': 
'...SMALL CRAFT ADVISORY NOW IN EFFECT UNTIL 3 AM PST SATURDAY...\n* WHAT...Rough bar conditions
 expected.\n- GENERAL SEAS...Seas 18 to 20 ft today then easing to 14 ft\nlate tonight and 
Sat.\n- FIRST EBB...Around 415 AM Fri. Seas near 20 feet with\nbreakers.\n- SECOND EBB...Strong
 ebb around 415 PM. Seas near 20 ft\nwith breakers.\n* WHERE...In the Main Channel of the 
Columbia River Bar.\n* WHEN...Until 3 AM PST Saturday.\n* IMPACTS...Conditions will be hazardous
 to small craft\nespecially when navigating in or near harbor entrances.'
# I added line breaks so it wasn't massively long. The raw data only has the newlines denoted by \n
},]}

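I would like the returned alert_report string to look like this:

'''Small Craft Advisory: Rough bar conditions expected. GENERAL SEAS Seas 18 to 20 ft today 
then easing to 14 ft late tonight and Sat. FIRST EBB Around 415 AM Fri. Seas near 20 feet 
with SECOND EBB Strong ebb around 415 PM. Seas near 20 ft breakers.
Where: In the Main Channel of the Columbia River Bar
When: Until 3 AM PST Saturday.'''

Note: my current code has handled about 30 alerts. This is the first alert that broke it. I could live without the "Seas 18 to 20 ft..." text in the first line, but I don't want to cut it off at the first "." because some alerts are several sentences long.
I have studied regular expressions, but I am not very good at them.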
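
To make the failure concrete, here is a minimal sketch of why the index-based approach breaks. The two description strings are shortened, hypothetical stand-ins for real alerts, and `split_markers` is just a name chosen here for the replace/split chain from the question.

```python
def split_markers(description):
    """Reproduce the replace/split chain from the question."""
    return (description.replace('\n', ' ')
            .replace('*', '@')
            .replace('...', '@')
            .split('@'))

# Hypothetical alert with only top-level '*' bullets.
simple = ("...HEADLINE...\n* WHAT...Rough bar.\n"
          "* WHERE...Main Channel.\n* WHEN...Until 3 AM.")

# Same alert with one '-' sub-bullet, which adds an extra '...' marker.
nested = ("...HEADLINE...\n* WHAT...Rough bar.\n- GENERAL SEAS...Seas 18 ft.\n"
          "* WHERE...Main Channel.\n* WHEN...Until 3 AM.")

print(repr(split_markers(simple)[6]))  # the WHERE text
print(repr(split_markers(nested)[6]))  # the ' WHERE' key: every later index shifted by one
```

Each extra "..." adds one more element to the list, so any fixed index after that point refers to the wrong piece. Matching on the keys themselves avoids that dependency on position.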

Maybe something like this:

import re
import textwrap

alert = (
    "...SMALL CRAFT ADVISORY NOW IN EFFECT UNTIL 3 AM PST SATURDAY...\n* WHAT"
    "...Rough bar conditions expected.\n- GENERAL SEAS...Seas 18 to 20 ft "
    "today then easing to 14 ft\nlate tonight and Sat.\n- FIRST EBB...Around "
    "415 AM Fri. Seas near 20 feet with\nbreakers.\n- SECOND EBB...Strong ebb "
    "around 415 PM. Seas near 20 ft\nwith breakers.\n* WHERE...In the Main "
    "Channel of the Columbia River Bar.\n* WHEN...Until 3 AM PST Saturday.\n* "
    "IMPACTS...Conditions will be hazardous to small craft\nespecially when "
    "navigating in or near harbor entrances."
)

# we use this to strip out newlines and '...' markers.
re_garbage = re.compile(r'(\.\.\.|\n)')

# this recognizes the major sections of the alert such
# as '* WHEN' and '* WHERE'.
re_keys = re.compile(r'\* ([A-Z ]+) ([^*]+)')

# This recognizes list items like `- FIRST EBB', etc.
re_item = re.compile(r'- ([A-Z ]+) ([^*-]+)')

# replace newlines and '...' with a space, and strip any
# leading/trailing whitespace.
alert = re_garbage.sub(' ', alert).strip()

# Get rid of the '- ' on list items
alert = re_item.sub(r'\1 \2', alert)

# Extract the major parts into a dictionary
parts = {}
while match := re_keys.search(alert):
    parts[match.group(1)] = match.group(2)
    alert = alert[:match.start()] + alert[match.end():]

# Avengers assemble!
final = '\n'.join([
    textwrap.fill(alert + parts['WHAT']),
    f'When: {parts["WHEN"]}',
    f'Where: {parts["WHERE"]}',
])

print(final)
Which produces:

SMALL CRAFT ADVISORY NOW IN EFFECT UNTIL 3 AM PST SATURDAY  Rough bar
conditions expected. GENERAL SEAS Seas 18 to 20 ft today then easing
to 14 ft late tonight and Sat. FIRST EBB Around 415 AM Fri. Seas near
20 feet with breakers. SECOND EBB Strong ebb around 415 PM. Seas near
20 ft with breakers.
When: Until 3 AM PST Saturday. 
Where: In the Main Channel of the Columbia River Bar. 
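
The `while match := ...` loop above relies on the assignment expression syntax, which needs Python 3.8 or newer. As a sketch of an alternative, the same extraction can be written with `re.sub` and a callback that records each section and removes it in a single pass. The function name `split_sections` and the short sample string are assumptions made for illustration; for text shaped like the sample alert this should give the same dictionary, but it has not been checked against other alerts.

```python
import re

# Same pattern as re_keys above: '* KEY text-up-to-the-next-asterisk'.
re_keys = re.compile(r'\* ([A-Z ]+) ([^*]+)')

def split_sections(alert):
    """Return (headline, parts) for text already cleaned of '...' and newlines."""
    parts = {}

    def collect(match):
        # Record the section, then replace it with nothing.
        parts[match.group(1)] = match.group(2).strip()
        return ''

    headline = re_keys.sub(collect, alert).strip()
    return headline, parts

# Hypothetical, already-cleaned alert text.
cleaned = "STORM WATCH  * WHAT Heavy rain. * WHERE Coastal areas. * WHEN Until 3 AM."
headline, parts = split_sections(cleaned)
print(headline)
print(parts)
```

Returning the headline and the dictionary separately leaves the caller free to decide which sections go into the label.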

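If you print out a bunch of the alert descriptions, there may be an optional summary or discussion, followed by a bulleted list. Bullets start with '*' and sub-bullets start with '-'. Each bullet has an uppercase key, then '...', then text. Since you are only interested in certain bullets, a regex like this should work (after replacing '...' with a space):

pattern = re.compile(r"[*] (WHAT|WHERE|WHEN) ([^*]+)")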

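Calling pattern.findall() produces a list of 2-tuples. The first element is 'WHAT', 'WHERE', or 'WHEN'. The second element is the description, up to the next '*' or the end of the string. Calling dict() on that list creates a dictionary with 'WHAT', 'WHERE', and 'WHEN' as keys and the captured text as values.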
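
As a small illustration of those two steps, the sketch below runs the pattern on a shortened, hypothetical description that has already had its newlines and '...' markers replaced with spaces. The sample text is made up; only the pattern comes from the answer.

```python
import re

pattern = re.compile(r"[*] (WHAT|WHERE|WHEN) ([^*]+)")

# Hypothetical cleaned description. IMPACTS is not in the alternation, so it is skipped.
details = ("HEADLINE  * WHAT Rough bar. * WHERE Main Channel. "
           "* WHEN Until 3 AM. * IMPACTS Hazardous.")

pairs = pattern.findall(details)
print(pairs)
# [('WHAT', 'Rough bar. '), ('WHERE', 'Main Channel. '), ('WHEN', 'Until 3 AM. ')]

info = dict(pairs)
print(info['WHERE'])
```

Because the pattern has two capture groups, `findall()` returns one tuple per match, and `dict()` accepts any iterable of key/value pairs. The captured values keep their trailing space, which `textwrap.fill()` drops later.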

Putting that into a function that generates the report for a single alert:

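import re
import textwrap

def format_alert(event, details):
    try:
        # Flatten newlines, drop the sub-bullet dashes, and turn '...' into a space.
        details = details.replace('\n', ' ').replace('- ', '').replace('...', ' ')

        # Map each wanted key to its text, e.g. info['WHAT'].
        info = dict(re.findall(r"[*] (WHAT|WHERE|WHEN) ([^*]+)", details))

        # Wrap each part to 51 characters so it fits the tkinter label.
        what = textwrap.fill(f"{event.title()}: {info['WHAT']}", 51)
        where = textwrap.fill(f"Where: {info['WHERE']}", 51)
        when = textwrap.fill(f"When: {info['WHEN']}", 51)
        report = f"{what}\n{where}\n{when}\n"

    except (KeyError, ValueError):
        # A missing WHAT, WHERE, or WHEN bullet raises KeyError.
        report = "Alert Error"
        print('ERROR: (format_alert) retrieving, formatting >>> alert_report ')

    return report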
For the sample input, it returns:

Small Craft Advisory: Rough bar conditions
expected. GENERAL SEAS Seas 18 to 20 ft today then
easing to 14 ft late tonight and Sat. FIRST EBB
Around 415 AM Fri. Seas near 20 feet with breakers.
SECOND EBB Strong ebb around 415 PM. Seas near 20
ft with breakers.
Where: In the Main Channel of the Columbia River
Bar.
When: Until 3 AM PST Saturday.
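
The question's original function handled a whole list of alerts, so here is one possible way to loop over `weather_json['alerts']` and join the per-alert reports. This is a sketch under assumptions: `format_alerts` is rebuilt here as a thin wrapper, the two-alert `weather_json` is made up, and `format_alert` is repeated in compact form (without the error print) only so the block runs on its own.

```python
import re
import textwrap

def format_alert(event, details):
    # Compact copy of the per-alert function, so this sketch is self-contained.
    try:
        details = details.replace('\n', ' ').replace('- ', '').replace('...', ' ')
        info = dict(re.findall(r"[*] (WHAT|WHERE|WHEN) ([^*]+)", details))
        what = textwrap.fill(f"{event.title()}: {info['WHAT']}", 51)
        where = textwrap.fill(f"Where: {info['WHERE']}", 51)
        when = textwrap.fill(f"When: {info['WHEN']}", 51)
        return f"{what}\n{where}\n{when}\n"
    except KeyError:
        return "Alert Error"

def format_alerts(weather_json):
    # Assumption: a response without an 'alerts' key means there are no alerts.
    alerts = weather_json.get('alerts', [])
    return "\n".join(format_alert(a['event'], a['description']) for a in alerts)

# Hypothetical response: one well-formed alert and one without bullets.
weather_json = {'alerts': [
    {'event': 'Small Craft Advisory',
     'description': ('...HEADLINE...\n* WHAT...Rough bar.\n'
                     '* WHERE...Main Channel.\n* WHEN...Until 3 AM.')},
    {'event': 'Gale Watch', 'description': 'No bullets in this one.'},
]}

print(format_alerts(weather_json))
```

Handling the error per alert means one malformed description yields a single "Alert Error" entry instead of discarding the whole report.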

Thanks for the help, larsks.

Thank you very much. I ended up using your solution. I still don't quite understand this line:
info = dict(re.findall(r"[*] (WHAT|WHERE|WHEN) ([^*]+)", details))
Does it build a dictionary keyed by the search keywords?

@nimic, I added more explanation to the answer.

Awesome, I understand it now. Thanks.