How to replace words that occur only once in a sentence in Python
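I want to replace every word that occurs only once in a sentence with '<unk>'. For example, given the sentence: hello hello world my world, I want the output to be hello hello world <unk> world. How can I do this?
This is what I am doing now: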
wordlist1 = trainfiles.split(None)
wordlist2 = []
for word1 in wordlist1:
    # Strip the trailing punctuation character (and any repeats of it)
    lastchar = word1[-1:]
    if lastchar in [",", ".", "!", "?", ";"]:
        word2 = word1.rstrip(lastchar)
    else:
        word2 = word1
    wordlist2.append(word2)
# Count how often each cleaned word occurs
freq = {}
for word2 in wordlist2:
    freq[word2] = freq.get(word2, 0) + 1
keylist = sorted(freq.keys())
for key2 in keylist:
    if freq[key2] == 1:
        print("%-10s %d" % ('<unk>', freq[key2]))
    else:
        print("%-10s %d" % (key2, freq[key2]))
This gives me output like:
hello 2
<unk> 1
world 2
However, I need output like this:
hello hello world <unk> world
How can I achieve this?

Use collections.Counter to count the frequency of each word in the sentence:
from collections import Counter
s = 'hello hello world my world'
counts = Counter(s.split())
Then use a generator expression to replace any word whose count is 1, and join the results with a space character:
replaced = ' '.join(i if counts[i] > 1 else '<unk>' for i in s.split())
Result:
'hello hello world <unk> world'
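For reuse, the counting step and the replacement step could be wrapped in one small helper. This is only a sketch: the function name replace_singletons and its placeholder parameter are hypothetical and not part of the original answer, and it assumes words are separated by whitespace with no punctuation attached to them.

```python
from collections import Counter

def replace_singletons(sentence, placeholder='<unk>'):
    """Replace every whitespace-separated word that occurs exactly once."""
    words = sentence.split()
    counts = Counter(words)
    # Keep words seen more than once; swap the rest for the placeholder
    return ' '.join(w if counts[w] > 1 else placeholder for w in words)

print(replace_singletons('hello hello world my world'))
# prints: hello hello world <unk> world
```

Because the words are split and re-joined, runs of whitespace in the input collapse to single spaces in the result.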
As @Cyber pointed out, the key is to use collections.Counter. This version preserves the punctuation and whitespace of the original line:
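import re
from collections import Counter

trainfiles = 'hello hello, world my world!'
# Count whole words only, so attached punctuation does not affect the counts
wordlist = re.findall(r'\b\w+\b', trainfiles)
wordlist = Counter(wordlist)
for word, count in wordlist.items():
    if count == 1:
        # Replace the single occurrence in place; surrounding text is untouched
        trainfiles = re.sub(r'\b{}\b'.format(word), '<unk>', trainfiles)
print(trainfiles)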
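A possible variation on the approach above does the replacement in a single pass by giving re.sub a function as the replacement argument. This is a sketch, not the answerer's code: the name replace_singletons_regex is hypothetical, and "word" is assumed to mean a run of \w characters.

```python
import re
from collections import Counter

def replace_singletons_regex(text, placeholder='<unk>'):
    """Replace words occurring once, keeping punctuation and spacing."""
    counts = Counter(re.findall(r'\w+', text))

    def swap(match):
        word = match.group(0)
        # Keep repeated words; replace words seen only once
        return word if counts[word] > 1 else placeholder

    return re.sub(r'\w+', swap, text)

print(replace_singletons_regex('hello hello, world my world!'))
# prints: hello hello, world <unk> world!
```

Scanning the text once also sidesteps an edge case of substituting word by word: if the input itself contains the single word unk, a later substitution can match inside a placeholder that was already inserted.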
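Join the strings and print them once?
Why is collections invalid syntax in the import statement?
You should probably strip punctuation to get accurate counts: "hello hello.".split().count("hello") == 1
It looks like the OP has already stripped punctuation by the time they reach the counting logic, but I could be mistaken.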
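The point about punctuation can be checked with a short sketch that compares counts before and after stripping. Note that it swaps in str.strip(string.punctuation), which removes all leading and trailing punctuation, rather than the asker's check of the last character only; the variable names are illustrative.

```python
import string
from collections import Counter

sentence = 'hello hello.'

# Raw split: 'hello' and 'hello.' are counted as two different words
raw_counts = Counter(sentence.split())

# Strip leading/trailing punctuation from each token before counting
clean_counts = Counter(w.strip(string.punctuation) for w in sentence.split())

print(raw_counts['hello'])    # prints: 1
print(clean_counts['hello'])  # prints: 2
```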