Python: How do I create paragraphs from Markov chain output?

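I would like to modify the script below so that it builds paragraphs out of the sentences it generates. In other words, it should join a random number of sentences (say 1-5) before adding a newline.

The script runs fine as it is, but the output is a series of short sentences separated by newlines. I would like to group some of those sentences into paragraphs.

Any thoughts on best practice? Thanks.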

"""
    from:  http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""

import random
import sys

stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
sentencesep  = "\n" # String used to separate sentences


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault( (w1, w2), [] ).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault( (w1, w2), [] ).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences  = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []

while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])
    if newword == stopword: sys.exit()
    if newword in stopsentence:
        print ("%s%s%s" % (" ".join(sentence), newword, sentencesep))
        sentence = []
        sentencecount += 1
    else:
        sentence.append(newword)
    w1, w2 = w2, newword

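Edit 03:

This is the final iteration of the script. Thanks to 格里夫 (Griff) for helping me work this out. I hope others get some fun out of it; I know I will.

FYI: there is one small artifact. If you use this script, you may need to clean up the extra space at the end of each paragraph. Other than that, it is a perfect implementation of Markov chain text generation.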

###
#    usage: python markov_sentences.py < input.txt > output.txt
#    from:  http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
###

import random
import sys

stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
sentencesep  = "\n" # String used to separate sentences


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault( (w1, w2), [] ).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault( (w1, w2), [] ).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences  = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep = "\n"
count = random.randrange(1,5)

while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)]) # random word from word table
    if newword == stopword: sys.exit()
    if newword in stopsentence:
        print ("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = []
        sentencecount += 1 # increment the sentence counter
        count -= 1
        if count == 0:
            count = random.randrange(1,5)
            print (paragraphsep) # newline space
    else:
        sentence.append(newword)
    w1, w2 = w2, newword


# EOF
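The trailing space at the end of each paragraph comes from printing every sentence with end=" ". One way to avoid it, sketched here under the assumption that the sentences are first collected into a list instead of being printed one by one, is to join them per paragraph. The function name `group_into_paragraphs` and its parameters are hypothetical and not part of the original recipe.

```python
import random

def group_into_paragraphs(sentences, min_len=1, max_len=5, rng=random):
    """Join sentences into paragraphs of min_len..max_len sentences each.

    Hypothetical helper: `sentences` is a list of finished sentence
    strings, `rng` is anything that provides randint().
    """
    paragraphs = []
    i = 0
    while i < len(sentences):
        size = rng.randint(min_len, max_len)  # inclusive on both ends
        # " ".join puts spaces only between sentences, never after the last
        paragraphs.append(" ".join(sentences[i:i + size]))
        i += size
    # A blank line between paragraphs, none after the final one
    return "\n\n".join(paragraphs)

if __name__ == "__main__":
    demo = ["One.", "Two!", "Three?", "Four.", "Five.", "Six."]
    print(group_into_paragraphs(demo))
```

To use it with the script above, the sentence loop would append each finished sentence to a list rather than printing it, then print the result of this function once at the end.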
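Do you understand this code? I bet you can find the bit that prints a sentence and change it to print several sentences together without a newline. You could add another while loop around the sentence part to get multiple paragraphs.

Syntax hint: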

print 'hello'
print 'there'
hello
there

print 'hello',
print 'there'
hello there

print 'hello',
print 
print 'there'
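The point is that a comma at the end of a print statement suppresses the newline at the end of the line, while a bare print statement prints a newline.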
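The hint above uses the Python 2 print statement. The scripts in this thread call print as a function, so a rough Python 3 equivalent may be useful. The function name `demo` is only there for illustration.

```python
def demo():
    # Python 2: print 'hello',   (trailing comma suppresses the newline)
    print("hello", end=" ")
    # Continues on the same line, then ends the line
    print("there")
    # Python 2: a bare print; here print() emits just a newline
    print()
    print("again")

if __name__ == "__main__":
    demo()
```

In Python 3 the end keyword replaces the trailing comma, and print() with no arguments replaces the bare print statement.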

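You need to put

sentence = [] 

back inside the

elif newword in stopsentence:

clause.

So the generation step becomes an outer loop over paragraphs (while paragraphs < maxparagraphs) wrapped around the sentence loop; that code is reproduced after the comments below.

Edit:

Here is a solution that does not use an outer loop: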

"""
    from:  http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""

import random
import sys

stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
sentencesep  = "\n" # String used to separate sentences


# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            table.setdefault( (w1, w2), [] ).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault( (w1, w2), [] ).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences  = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep = "\n\n" # assignment, not comparison; "==" here raises NameError
count = random.randrange(1,5)

while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])
    if newword == stopword: sys.exit()
    if newword in stopsentence:
        print ("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = []
        sentencecount += 1
        count -= 1
        if count == 0:
            count = random.randrange(1,5)
            print (paragraphsep)
    else:
        sentence.append(newword)
    w1, w2 = w2, newword
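Yes, I understand it. The problem is that nothing I have tried with the print statement helps group the sentences into paragraphs (unless you count stripping all the newlines to form one giant paragraph). A while loop is what I had in mind, but I was not sure how to wrap the sentence part. Everything I tried led to errors of one kind or another, so I figured I should ask the experts. What is the best way to tell it "generate x sentences (say 1-5), then insert a newline, and repeat until maxsentences is reached"?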
The nested-loop version from the answer. It replaces the GENERATE SENTENCE OUTPUT section of the script; the three initial assignments are added here so the fragment runs, and their values are assumptions:

paragraphs = 0 # paragraph counter
maxparagraphs = 5 # assumed value; the answer does not define it
paragraphsep = "\n" # assumed, matching the final script in the question

while paragraphs < maxparagraphs: # start outer loop, until maxparagraphs is reached
    w1 = stopword
    w2 = stopword
    stopsentence = (".", "!", "?",)
    sentence = []
    sentencecount = 0 # reset the inner 'while' loop counter to zero
    maxsentences = random.randrange(1,5) # random sentences per paragraph

    while sentencecount < maxsentences: # start inner loop, until maxsentences is reached
        newword = random.choice(table[(w1, w2)]) # random word from word table
        if newword == stopword: sys.exit()
        elif newword in stopsentence:
            print ("%s%s" % (" ".join(sentence), newword), end=" ")
            sentence = [] # must be reset here so the next sentence starts as an empty list
            sentencecount += 1 # increment the sentence counter
        else:
            sentence.append(newword)
        w1, w2 = w2, newword
    print (paragraphsep) # newline space
    paragraphs = paragraphs + 1 # increment the paragraph counter
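One detail worth checking: the question asks for 1-5 sentences per paragraph, but random.randrange(1, 5) excludes its upper bound, so it only yields 1 to 4. The short check below illustrates the difference from random.randint(1, 5), which includes both ends. The seed and the variable names are arbitrary choices for this sketch.

```python
import random

rng = random.Random(42)  # seeded only so repeated runs behave the same

# randrange(1, 5) draws from 1, 2, 3, 4 (the stop value is excluded)
randrange_values = {rng.randrange(1, 5) for _ in range(1000)}

# randint(1, 5) draws from 1, 2, 3, 4, 5 (both ends are included)
randint_values = {rng.randint(1, 5) for _ in range(1000)}

print(sorted(randrange_values))
print(sorted(randint_values))
```

If paragraphs of up to five sentences are wanted, either random.randint(1, 5) or random.randrange(1, 6) would do.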