Python Shmython - is "Y" that hard?

I have written a program that implements shm-reduplication.

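The rule is basically this: if a word starts with a consonant (or a cluster of consonants), you strip it and add "shm"; if it starts with a vowel, you just add "shm". The result is then appended to the original word.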

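The problem is the letter Y, because it is sometimes a consonant and sometimes a vowel. I want "you" to become "you-shmou", but I want "Python" to become "Python-Shmython". What should I do?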

Here is my code so far:

import re

def word_shmord(word):
    orig = word
    if word.isupper():
        prefix = "SHM"
    elif word.istitle():
        word = word.lower()
        prefix = "Shm"
    else:
        prefix = "shm"
    match = re.search("[aeiou]", word, re.IGNORECASE)
    if match is None:
        # No a/e/i/o/u in the word (e.g. "by"): leave it unchanged
        return orig
    new = prefix + word[match.start():]
    return "{}-{}".format(orig, new)


text = """
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
"""
text_shmext = re.sub(r"\w+", lambda m: word_shmord(m.group(0)), text)
print(text_shmext)

I found this problem interesting, so I wrote up some linguistic rules for it (or should I say, for this shmoblem).

Input: The quick brown fox jumps over the lazy dog

Output: The shmuick shmown shmox shmumps shmover the shmazy shmog

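Not quite. I want Skye Shmye. Maybe only the consonants at the start of a word should count. But do we care about the status of the y in "backyard"? It simply becomes "shmackyard".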
import re
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.tokenize.sonority_sequencing import SyllableTokenizer

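# Assumes the NLTK "stopwords" corpus and the tokenizer models used by
# word_tokenize have already been fetched with nltk.download(...).
stop = stopwords.words('english')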
tk = SyllableTokenizer()


def word_shmord(word):
    if (len(word) < 4 and word.lower() in stop) or not word.isalnum() or word.lower().startswith('shm'):
        return word
    orig = word
    # Pick the prefix from the untouched word: the '#' placeholder inserted
    # below would make istitle() return False for words such as "Python".
    if word.isupper():
        prefix = "SHM"
    elif word.istitle():
        word = word.lower()
        prefix = "Shm"
    else:
        prefix = "shm"
    if 'y' in word:
        y = word.find('y')
        # Y is considered a vowel if the word has no other vowel
        if len(re.findall("[aeiou]", word, re.IGNORECASE)) == 0 and word.count('y') == 1:
            word = word[:y] + '#' + word[y + 1:]
        # or if the letter is at the end of a word
        if word[-1] == 'y':
            word = word[:-1] + '#'
        # or in the middle / at the end of a syllable
        if word.find('y') != -1:
            syll = tk.tokenize(word)
            for i, s in enumerate(syll):
                snew = s[:-1] + '#' if s[-1] == 'y' else s
                y = snew.find('y')
                if len(snew) // 2 == y:
                    snew = snew[:y] + '#' + snew[y + 1:]
                syll[i] = snew
            word = ''.join(syll)

    vowels = re.search("[aeiou#]", word, re.IGNORECASE)
    if not vowels:
        # No vowel-like letter found: hand back the word as it came in
        return orig
    position = vowels.start()
    new = prefix + word[position:].replace('#', 'y')
    return new


text = "The quick brown fox jumps over the lazy dog"
text_shmext = [word_shmord(x) for x in word_tokenize(text)]
# join strings
text_shmext = "".join([" " + i if i not in string.punctuation else i for i in text_shmext]).strip()
print(text_shmext)
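
The idea raised in the comment (only the consonants at the very start of the word matter) can also be sketched without a syllable tokenizer. The snippet below is a rough illustration rather than anything posted in the thread: the names ONSET and shm_onset are made up here, and it assumes that Y counts as a consonant only when it is the first letter and is directly followed by a, e, i, o or u. Everywhere else Y is treated as a vowel.

```python
import re

# Hypothetical pattern, not from the thread: the leading consonant cluster.
# An initial "y" belongs to the cluster only when a/e/i/o/u follows it;
# any other "y" ends the cluster, i.e. it is treated as a vowel.
ONSET = re.compile(r"(?:y(?=[aeiou]))?[^aeiouy]*", re.IGNORECASE)


def shm_onset(word):
    """Return 'word-shmord' using the start-of-word-only rule for Y."""
    # The pattern can match the empty string, so match() never returns None
    rest = word[ONSET.match(word).end():]
    if not rest:
        # The whole word was consumed (e.g. "hmm"): leave it unchanged
        return word
    if word.isupper():
        prefix = "SHM"
    elif word.istitle():
        prefix, rest = "Shm", rest.lower()
    else:
        prefix = "shm"
    return "{}-{}{}".format(word, prefix, rest)


for w in ["you", "Python", "Skye", "backyard"]:
    print(shm_onset(w))
```

With this rule "you" gives "you-shmou", "Python" gives "Python-Shmython", "Skye" gives "Skye-Shmye" and "backyard" gives "backyard-shmackyard". The trade-off is that it knows nothing about syllables, so words where an initial Y is followed by a consonant are handled purely by the letter pattern.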