How to replace words that occur only once in a sentence in Python


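I want to replace every word that occurs only once in a sentence with '<unk>'. For example, given the sentence

hello hello world my world

I would like the output to be

hello hello world <unk> world

How can I do that?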

This is what I am doing now:

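# trainfiles is assumed to hold the input text as a single string
wordlist1 = trainfiles.split()
wordlist2 = []
for word1 in wordlist1:
    # Strip trailing punctuation so that "world!" is counted as "world"
    lastchar = word1[-1:]
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    wordlist2.append(word2)

# Count how often each cleaned word occurs
freq = {}
for word2 in wordlist2:
    freq[word2] = freq.get(word2, 0) + 1
keylist = sorted(freq)

# Print each distinct word with its count, masking words seen only once
for key2 in keylist:
    if freq[key2] == 1:
        print("%-10s %d" % ('<unk>', freq[key2]))
    else:
        print("%-10s %d" % (key2, freq[key2]))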


This gives me output like:

hello   2
<unk>   1
world   2


However, I need output like this:

hello hello world <unk> world


How can I do this?

Use collections.Counter to count the frequency of each word in the sentence:

from collections import Counter
s = 'hello hello world my world'
counts = Counter(s.split())

Then use a generator expression to replace any word whose count is 1, and join the result with a space character:

replaced = ' '.join(i if counts[i] > 1 else '<unk>' for i in s.split())


Result:

'hello hello world <unk> world'


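As @Cyber pointed out, the key is to use collections.Counter. This version preserves the punctuation and whitespace of the original line:

import re
from collections import Counter

trainfiles = 'hello hello, world my world!'

# Count whole words only, ignoring punctuation
counts = Counter(re.findall(r'\b\w+\b', trainfiles))
for word, count in counts.items():
    if count == 1:
        # Replace the word that occurs once, leaving the surrounding text untouched
        trainfiles = re.sub(r'\b{}\b'.format(re.escape(word)), '<unk>', trainfiles)

print(trainfiles)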
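
The loop above calls re.sub once for each rare word. A possible variation, shown here only as a sketch, does the replacement in a single pass by handing re.sub a function instead of a string. The names mask_singletons and placeholder are made up for this illustration and are not part of the answer.

```python
import re
from collections import Counter

WORD = re.compile(r'\w+')

def mask_singletons(text, placeholder='<unk>'):
    """Replace every word that occurs exactly once in text with placeholder."""
    counts = Counter(WORD.findall(text))

    def swap(match):
        word = match.group(0)
        # Keep words seen more than once, mask the rest
        return word if counts[word] > 1 else placeholder

    # re.sub calls swap() for each word; punctuation and spaces are left as they are
    return WORD.sub(swap, text)

print(mask_singletons('hello hello, world my world!'))
# hello hello, world <unk> world!
```

Because the substitution only ever scans the original string, text that has already been replaced is never matched again.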

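Join the strings and print the result once?
Why is collections invalid syntax in the import statement?
You should probably strip punctuation to get accurate counts: "hello hello.".split().count("hello") == 1
It looks like the OP has already stripped punctuation by the time the counting logic runs, but I could be mistaken.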
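
The point about punctuation in the comments can be checked with a short sketch. It assumes that trimming leading and trailing punctuation from each whitespace-separated token is good enough; count_words is a hypothetical helper, not something from the thread.

```python
import string
from collections import Counter

def count_words(sentence):
    """Count words after trimming punctuation from both ends of each token."""
    return Counter(token.strip(string.punctuation) for token in sentence.split())

s = 'hello hello. world my world'

# A plain split() treats 'hello.' and 'hello' as different tokens
print(s.split().count('hello'))

# After trimming punctuation, both occurrences are counted together
print(count_words(s)['hello'])
```

The first print shows 1 and the second shows 2, which is why the counts should be taken on cleaned tokens.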