How do I parse the emotes in a Twitch IRC response into a list of dictionaries in Python?


I'd like to parse an IRC message from Twitch into a list of dictionaries that accounts for emotes.

Here is a sample of what I can get from Twitch:

"Testing. :) Confirmed!"

{"emotes": [(1, (9, 10))]}

This describes a single emote with ID 1, spanning characters 9 through 10 (the string is zero-indexed and both ends are inclusive).
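To make the indexing concrete: because the end index is inclusive, the matching Python slice has to stop one past it. A minimal illustration using the sample above (the variable names are only for this example):

```python
message = "Testing. :) Confirmed!"
emote_id, (start, end) = (1, (9, 10))

# Twitch's end index is inclusive; a Python slice end is exclusive.
emote_text = message[start:end + 1]
print(emote_text)  # :)
```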

I'd like my data in the following format:

[
    {
        "type": "text",
        "text": "Testing. "
    },
    {
        "type": "emote",
        "text": ":)",
        "id": 1
    },
    {
        "type": "text",
        "text": " Confirmed!"
    }
]

Is there a relatively clean way to accomplish this?

I found a solution that is ugly but works:

import re

packet_expression = re.compile(r'(@.+)? :([a-zA-Z0-9][\w]{2,23})!\2@\2.tmi.twitch.tv PRIVMSG #([a-zA-Z0-9][\w]{2,23}) :(.+)')

def parse_twitch(packet):

    match = re.match(packet_expression, packet)

    items = match.group(1)[1:].split(';')
    tags = dict(item.split('=') for item in items)

    emote_expression = re.compile(r'(\d+):((\d+-\d+,)*\d+-\d+)')
    tags["emotes"] = [
        (int(emotes[0]), (int(start), int(end)))
        for emotes in re.findall(emote_expression, tags.get("emotes", ''))
        for location in emotes[1].split(',')
        for start, end in [location.split('-')]
    ]

    message = match.group(4)
    characters = list(message)

    offset = 0
    for emote in tags["emotes"]:
        characters[emote[1][0]-offset : emote[1][1]-offset+1] = [{
            "type": "emote",
            "text": ''.join(characters[emote[1][0]-offset : emote[1][1]-offset+1]),
            "id": emote[0]
        }]
        offset += emote[1][1] - emote[1][0]

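    # Merge adjacent characters into text runs, then wrap each run in a dict.
    index = 0
    while any(isinstance(item, str) for item in characters):
        # The bounds check avoids an IndexError when the message ends in text.
        if (index + 1 < len(characters)
                and isinstance(characters[index], str)
                and isinstance(characters[index+1], str)):
            characters[index:index+2] = [characters[index] + characters[index+1]]
        else:
            if isinstance(characters[index], str):
                characters[index] = {"type": "text", "text": characters[index]}
            index += 1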

    return characters

I'm not sure whether the message you receive looks like this:

message = '''\
"Testing. :) Confirmed!"

{"emotes": [(1, (9, 10))]}'''

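or whether the text and the metadata arrive as separate values, as in the code below.

I'm assuming the latter, since it is easy to convert from the former to the latter. It's also possible that these are Python representations. You weren't very clear about that.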

There is a better way to solve this that uses no regular expressions, only string parsing:

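import json
import pprint

text = 'Testing. :) Confirmed! :P'
print(len(text))  # 25, so the last valid index is 24
# Emote ranges are (id, (start, end)) with an inclusive end index.
meta = '{"emotes": [(1, (9, 10)), (2, (23, 24))]}'
# JSON has no tuples, so turn the parentheses into list brackets first.
meta = json.loads(meta.replace('(', '[').replace(')', ']'))

results = []
cur_index = 0
for emote in meta['emotes']:
    # Text between the previous emote (or the start) and this emote.
    results.append({'type': 'text', 'text': text[cur_index:emote[1][0]]})
    results.append({'type': 'emote', 'text': text[emote[1][0]:emote[1][1]+1],
                    'id': emote[0]})
    cur_index = emote[1][1]+1

# Any text left after the last emote.
if text[cur_index:]:
    results.append({'type': 'text', 'text': text[cur_index:]})

pprint.pprint(results)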
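If the metadata really is a Python-style literal (tuples written with parentheses), the standard library's ast.literal_eval can read it directly, which avoids rewriting the brackets for json. The sketch below wraps the same loop in a function. The name split_message is made up for this example, and it assumes the emote ranges do not overlap.

```python
import ast


def split_message(text, meta_literal):
    """Split text into text/emote dicts using (id, (start, end)) tuples."""
    meta = ast.literal_eval(meta_literal)
    results = []
    cur_index = 0
    # Sort by start position so the slices are taken left to right.
    for emote_id, (start, end) in sorted(meta['emotes'], key=lambda e: e[1][0]):
        if start > cur_index:  # skip empty text segments
            results.append({'type': 'text', 'text': text[cur_index:start]})
        results.append({'type': 'emote', 'text': text[start:end + 1],
                        'id': emote_id})
        cur_index = end + 1
    if text[cur_index:]:
        results.append({'type': 'text', 'text': text[cur_index:]})
    return results


parts = split_message('Testing. :) Confirmed!', '{"emotes": [(1, (9, 10))]}')
```

Unlike the loop above, this version does not emit an empty text entry when a message starts with an emote or when two emotes are directly adjacent.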

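Based on your comment, the data arrives in a custom format. I copied and pasted a few characters from the comment, but I'm not sure whether those characters actually appear in the incoming data; I hope that part is correct. The message contains only one type of emote, so I'm not entirely sure how multiple different emote types are represented. I'm hoping there is some delimiter rather than multiple emote= sections, otherwise this approach will need some significant modification, but this should do the parsing without any regular expressions: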

from collections import namedtuple


Emote = namedtuple('Emote', ('id', 'start', 'end'))


def parse_emotes(raw):
    emotes = []
    for raw_emote in raw.split('/'):
        id, locations = raw_emote.split(':')
        id = int(id)
        locations = [location.split('-')
                     for location in locations.split(',')]
        for location in locations:
            emote = Emote(id=id, start=int(location[0]), end=int(location[1]))
            emotes.append(emote)
    return emotes

data = r'@badges=moderator/1;color=#0000FF;display-name=2Cubed;emotes=25:6-10,12-16;id=05aada01-f8c1-4b2e-a5be-2534096057b9;mod=1;room-id=82607708;subscriber=0;turbo=0;user-id=54561464;user-type=mod:2cubed!2cubed@2cubed.tmi.twitch.tv PRIVMSG #innectic :Hiya! Kappa Kappa'

meta, msgtype, channel, message = data.split(' ', maxsplit=3)
meta = dict(tag.split('=') for tag in meta.split(';'))
meta['emotes'] = parse_emotes(meta['emotes'])
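The snippet above stops once the emotes are parsed. To get from there to the list of dictionaries asked for in the question, something like the following could work. It is only a sketch: build_segments is a name invented here, Emote is redefined so the block runs on its own, and it assumes the ':' that prefixes the message body has been stripped and that emote ranges do not overlap.

```python
from collections import namedtuple

Emote = namedtuple('Emote', ('id', 'start', 'end'))


def build_segments(message, emotes):
    """Interleave plain-text and emote dicts in message order."""
    segments = []
    position = 0
    for emote in sorted(emotes, key=lambda e: e.start):
        if emote.start > position:
            segments.append({'type': 'text',
                             'text': message[position:emote.start]})
        segments.append({
            'type': 'emote',
            'text': message[emote.start:emote.end + 1],  # end is inclusive
            'id': emote.id,
        })
        position = emote.end + 1
    if position < len(message):
        segments.append({'type': 'text', 'text': message[position:]})
    return segments


message = ':Hiya! Kappa Kappa'[1:]  # drop the ':' that prefixes the body
emotes = [Emote(25, 6, 10), Emote(25, 12, 16)]
segments = build_segments(message, emotes)
```

Sorting by start matters because the emotes tag groups ranges by emote ID, not by position in the message.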

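Yes, there is, but nobody is going to code it for you. Try something, and ask a question when you run into an error.

@iScrE4m I don't know how to proceed, which is why I asked this question.

r"@badges=moderator/1;color=#0000FF;display-name=2Cubed;emotes=25:6-10,12-16;id=05aada01-f8c1-4b2e-a5be-2534096057b9;mod=1;room-id=82607708;subscriber=0;turbo=0;user-id=54561464;user-type=mod:2cubed!2cubed@2cubed.tmi.twitch.tv PRIVMSG #innectic :Hiya! Kappa Kappa"

Unfortunately, some regex is needed to parse the message body and the emote locations. Once the parsing is done, though, I really like your solution. :)

@2Cubed I've updated my answer to show that you probably don't need regex to parse these messages either ;) Even if there are several types of packets for which the split part would fail, you can check 'PRIVMSG' in data first.

This looks great. However, it seems to fail when several different emotes are used, for example in this packet:

r"@badges=moderator/1;color=#0000FF;display-name=2Cubed;emotes=25:8-12/70433:14-22;id=20574cd9-a008-4b03-bc1b-80fabcd0723f;mod=1;room-id=82607708;subscriber=0;turbo=0;user-id=54561464;user-type=:2cubed!2cubed@2cubed.tmi.twitch.tv PRIVMSG #innectic :Test... Kappa KappaRoss"

I've submitted an edit to fix this.

Ah, cool. This seems to work perfectly, and it is infinitely cleaner than what I had before. Thank you very much.

You're welcome! Keep in mind that Python has very powerful string-handling tools. Regular expressions are rarely needed; in fact, I have only used them in a handful of cases over the past 8 years. They are certainly useful, but they are often overkill.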
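The 'PRIVMSG' in data check mentioned in the comments could serve as a guard before splitting, so that other packet types are skipped instead of failing on the unpack. A rough sketch, assuming packets shaped like the sample in the answer: parse_privmsg is a hypothetical helper, the chat packet is a shortened copy of that sample, and the PING line is an assumed example of a non-chat packet.

```python
def parse_privmsg(data):
    """Return (tags, channel, message) for chat packets, otherwise None."""
    if ' PRIVMSG ' not in data:
        return None
    meta, msgtype, channel, message = data.split(' ', maxsplit=3)
    # Split each tag on the first '=' only, in case a value contains one.
    tags = dict(tag.split('=', 1) for tag in meta.lstrip('@').split(';'))
    return tags, channel, message[1:]  # drop the ':' before the body


chat = ('@color=#0000FF;emotes=25:6-10,12-16;'
        'user-type=mod:2cubed!2cubed@2cubed.tmi.twitch.tv '
        'PRIVMSG #innectic :Hiya! Kappa Kappa')
parsed = parse_privmsg(chat)
ignored = parse_privmsg('PING :tmi.twitch.tv')
```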
