How to replace words that occur only once in a sentence in Python


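I want to replace every word that occurs only once in a sentence with '<unk>'. For example, given the sentence hello hello world my world, I want the output to be hello hello world <unk> world. How can I do that?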

This is what I have so far:

# trainfiles is assumed to hold the input text as a single string
wordlist1 = trainfiles.split()
wordlist2 = []
for word1 in wordlist1:
    # strip trailing punctuation so "world!" and "world" count as the same word
    lastchar = word1[-1:]
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    wordlist2.append(word2)

# count how often each cleaned word occurs
freq = {}
for word2 in wordlist2:
    freq[word2] = freq.get(word2, 0) + 1

# print one line per distinct word, in sorted order
for key2 in sorted(freq):
    if freq[key2] == 1:
        print("%-10s %d" % ('<unk>', freq[key2]))
    else:
        print("%-10s %d" % (key2, freq[key2]))
This gives me output like:

hello   2
<unk>   1
world   2
However, I need output like this:

hello hello world <unk> world

How can I do this?

Use collections.Counter to count the frequency of each word in the sentence:

from collections import Counter
s = 'hello hello world my world'
counts = Counter(s.split())
Then use a generator expression to replace any word whose count is 1, and join the results with a space character:

replaced = ' '.join(i if counts[i] > 1 else '<unk>' for i in s.split())
Result:

'hello hello world <unk> world'
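
The two steps above can be combined into one small helper. This is only a sketch: the function name replace_singletons and the placeholder parameter are made up for illustration and are not part of the original answer. Because it relies on split(), words are separated on whitespace only, and runs of whitespace collapse into a single space in the result.

```python
from collections import Counter


def replace_singletons(sentence, placeholder='<unk>'):
    """Replace every whitespace-separated word that occurs exactly once."""
    words = sentence.split()
    counts = Counter(words)  # word -> number of occurrences
    # Keep words seen more than once, swap the rest for the placeholder.
    return ' '.join(w if counts[w] > 1 else placeholder for w in words)


print(replace_singletons('hello hello world my world'))
# hello hello world <unk> world
```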

As @Cyber pointed out, the key is to use collections.Counter. This version preserves the punctuation and whitespace of the original line:

import re
from collections import Counter
trainfiles = 'hello hello, world my world!'

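# collect every word, ignoring punctuation, and count the occurrences
wordlist = re.findall(r'\b\w+\b', trainfiles)
wordlist = Counter(wordlist)
for word, count in wordlist.items():
    if count == 1:
        # replace the single occurrence in place; punctuation and spacing stay untouched
        trainfiles = re.sub(r'\b{}\b'.format(re.escape(word)), '<unk>', trainfiles)

print(trainfiles)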
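
The loop above runs re.sub once for every rare word. A possible variant, sketched here under the assumption that a word is any run of \w characters, does the replacement in a single pass by handing re.sub a callback. The name mask_rare_words and its placeholder parameter are hypothetical. A side effect of the single pass is that the placeholder is never rescanned, so a text that happens to contain the word unk exactly once does not end up with a nested placeholder.

```python
import re
from collections import Counter

WORD = re.compile(r'\w+')


def mask_rare_words(text, placeholder='<unk>'):
    """Replace words that occur once, keeping punctuation and spacing."""
    counts = Counter(WORD.findall(text))

    def swap(match):
        word = match.group(0)
        # Keep words seen more than once, mask everything else.
        return word if counts[word] > 1 else placeholder

    # re.sub calls swap for each match while scanning the original text once.
    return WORD.sub(swap, text)


print(mask_rare_words('hello hello, world my world!'))
# hello hello, world <unk> world!
```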

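Join the strings and print the result once?
Why is collections invalid syntax in the import statement?
You should probably strip the punctuation to get accurate counts: "hello hello.".split().count("hello") == 1
It looks like the OP has already stripped the punctuation by the time the counting logic runs, but I could be mistaken.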
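
The punctuation point raised in the comments can be checked with a short experiment. This is a minimal sketch that assumes stripping string.punctuation from both ends of each token is an acceptable cleanup; the variable names are illustrative only.

```python
import string
from collections import Counter

text = 'hello hello.'

# Counting the raw tokens treats "hello" and "hello." as different words.
raw_counts = Counter(text.split())

# Stripping leading and trailing punctuation first merges them.
clean_counts = Counter(w.strip(string.punctuation) for w in text.split())

print(raw_counts['hello'])    # 1
print(clean_counts['hello'])  # 2
```

With the raw counts, both tokens would be treated as occurring once and replaced, which is why the cleanup step matters before counting.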