Python randomizer: get random text between curly braces with two levels of nesting

Hey, I need to create a simple Python randomizer. Example input:

{{hey|hello|hi}|{privet|zdravstvuy|kak dela}|{bonjour|salut}}, can {you|u} give me advice?
The output should be:

hello, can you give me advice
I have a script that does this, but only for one level of nesting:

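import random
import re

# Works for a single nesting level only: split the text on braces,
# then pick one '|'-separated alternative from each piece.
with open('text.txt', 'r') as text:
    matches = re.findall('([^{}]+)', text.read())
words = []
for match in matches:
    parts = match.split('|')
    if parts[0]:
        words.append(random.choice(parts))
message = ''.join(words)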

That is not enough for me, though.

Python's regular expressions do not support nested structures, so you will have to find another way to parse the string.

Here is my quick attempt:

import random

def randomize(text):
    start= text.find('{')
    if start==-1: #if there are no curly braces, there's nothing to randomize
        return text

    # parse the choices we have
    end= start
    word_start= start+1
    nesting_level= 0
    choices= [] # list of |-separated values
    while True:
        end+= 1
        try:
            char= text[end]
        except IndexError:
            break # if there's no matching closing brace, we'll pretend there is.
        if char=='{':
            nesting_level+= 1
        elif char=='}':
            if nesting_level==0: # matching closing brace found - stop parsing.
                break
            nesting_level-= 1
        elif char=='|' and nesting_level==0:
            # put all text up to this pipe into the list
            choices.append(text[word_start:end])
            word_start= end+1
    # there's no pipe character after the last choice, so we have to add it to the list now
    choices.append(text[word_start:end])
    # recursively call this function on each choice
    choices= [randomize(t) for t in choices]
    # return the text up to the opening brace, a randomly chosen string, and
    # don't forget to randomize the text after the closing brace 
    return text[:start] + random.choice(choices) + randomize(text[end+1:])
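
The function above quietly pretends a missing closing brace is there. If you would rather reject such input up front, a small check along these lines could be run first. This is only a sketch; check_braces is a hypothetical helper name, not part of the answer's code.

```python
def check_braces(text):
    """Return True if every '{' has a matching '}' and vice versa."""
    depth = 0
    for char in text:
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth < 0:  # a closing brace with no opener before it
                return False
    return depth == 0  # anything left open means a '}' is missing
```

Calling it before randomize lets you raise an error or log a warning instead of silently guessing where the group ends.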

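As noted above, nesting is basically useless here, but if you want to keep the current syntax, one way to handle it is to replace the innermost braces in a loop until there are none left: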

import re, random

msg = '{{hey|hello|hi}|{privet|zdravstvuy|kak dela}|{bonjour|salut}}, can {you|u} give me advice?'


while re.search(r'{.*}', msg):
    msg = re.sub(
        r'{([^{}]*)}', 
        lambda m: random.choice(m.group(1).split('|')), 
        msg)

print(msg)
# e.g. zdravstvuy, can u give me advice?
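
The same loop can be wrapped in a function so it is easier to reuse and test. The sketch below is one way to do that, assuming you want to pass in your own random source; the names spin, INNERMOST and rng are made up for illustration. It uses re.subn, which also reports how many replacements were made, so the loop stops when a pass changes nothing.

```python
import random
import re

# Matches a brace group that contains no other braces, i.e. an innermost group.
INNERMOST = re.compile(r'\{([^{}]*)\}')

def spin(msg, rng=random):
    """Resolve innermost {a|b|c} groups repeatedly until none are left."""
    while True:
        msg, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split('|')), msg)
        if count == 0:  # nothing was replaced, so no complete group remains
            return msg

rng = random.Random(0)  # seeded only so that repeated runs give the same text
print(spin('{{hey|hello|hi}|{bonjour|salut}}, can {you|u} give me advice?', rng))
```

Each pass removes at least one pair of braces, so the loop always terminates, even if the input contains stray unmatched braces (those are simply left in the result).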

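In my opinion, the grammar your input follows is a bit too complex for simple regular expressions. I would say build a proper lexer, called by a parser, to generate the output. If you are not familiar with the concept, I suggest reading up on the theory first :)

You are looking for recursive regex matching. See:

@KarelKubat Oh no, I don't need that. I just want to get random text from curly braces that contain other curly braces.

What would you do if there were more nesting, as in {{a{c{d}}{e}}{f}?

Since there is only one operator, you don't need the extra braces.
{a|{b|c}}
is basically the same as
{a|b|c}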
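
For what it's worth, the parser idea from the comments can be sketched in a few dozen lines. The code below is only an illustration under my own assumptions: the tree layout (plain strings for literals, a ('choice', alternatives) tuple for groups) and the names parse_sequence, parse_group, pick and all_outputs are all made up here. It treats an unclosed group as if it were closed at the end of the text, and a stray '}' or '|' outside any group as literal text.

```python
import random

def parse_sequence(text, pos, depth):
    """Parse literals and groups until '|' or '}' (inside a group) or end of text."""
    items, literal = [], []
    while pos < len(text):
        char = text[pos]
        if char == '{':
            if literal:
                items.append(''.join(literal))
                literal = []
            group, pos = parse_group(text, pos + 1, depth + 1)
            items.append(group)
            continue
        if depth > 0 and char in '|}':
            break  # the enclosing group handles this character
        literal.append(char)
        pos += 1
    if literal:
        items.append(''.join(literal))
    return items, pos

def parse_group(text, pos, depth):
    """Parse the alternatives of a group whose '{' was just consumed."""
    choices = []
    while True:
        seq, pos = parse_sequence(text, pos, depth)
        choices.append(seq)
        if pos < len(text) and text[pos] == '|':
            pos += 1
            continue
        # Either a '}' or the end of the text (unclosed group treated as closed).
        return ('choice', choices), pos + 1

def pick(seq, rng=random):
    """Build one random output by choosing uniformly inside every group."""
    return ''.join(node if isinstance(node, str) else pick(rng.choice(node[1]), rng)
                   for node in seq)

def all_outputs(seq):
    """Return the set of every string the template can produce."""
    results = {''}
    for node in seq:
        if isinstance(node, str):
            options = {node}
        else:
            options = set().union(*(all_outputs(alt) for alt in node[1]))
        results = {head + tail for head in results for tail in options}
    return results

nested, _ = parse_sequence('{a|{b|c}}', 0, 0)
flat, _ = parse_sequence('{a|b|c}', 0, 0)
print(all_outputs(nested) == all_outputs(flat))
```

Enumerating the outputs shows that the nested and the flat form can produce the same set of strings. One caveat: when a uniform choice is made at every level, as pick does, the nested form returns a half of the time, while the flat form returns each option a third of the time, so the two are not equivalent in their probabilities.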