Python code for replacing emoticons with "SAD" or "HAPPY" is not working properly
I want to replace every happy emoticon in a text file with "HAPPY" and, likewise, every sad emoticon with "SAD". But the code is not working properly. It does detect smileys (as of now only :-)), but in the example below it does not replace the emoticon with the text; it simply appends the text, and for a reason I don't understand it appends it twice.
dict_sad={":-(":"SAD", ":(":"SAD", ":-|":"SAD", ";-(":"SAD", ";-<":"SAD", "|-{":"SAD"}
dict_happy={":-)":"HAPPY",":)":"HAPPY", ":o)":"HAPPY",":-}":"HAPPY",";-}":"HAPPY",":->":"HAPPY",";-)":"HAPPY"}
#THE INPUT TEXT#
a="guys beautifully done :-)"
for i in a.split():
    for j in dict_happy.keys():
        if set(j).issubset(set(i)):
            print "HAPPY"
            continue
    for k in dict_sad.keys():
        if set(k).issubset(set(i)):
            print "SAD"
            continue
    if str(i)==i.decode('utf-8','replace'):
        print i
Output ("HAPPY" appears twice, and the emoticon does not disappear)
Expected output
guys
beautifully
done
HAPPY
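You are turning each word and each emoticon into a set; that means you are looking for overlap between individual characters. You probably wanted exact matches instead: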
for i in a.split():
    for j in dict_happy:
        if j == i:
            print "HAPPY"
            continue
    for k in dict_sad:
        if k == i:
            print "SAD"
            continue
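To see why the character-set test misfires, here is a small sketch comparing it with an exact comparison. It is written for Python 3 (so print is a function), and the helper names `subset_matches` and `exact_matches` are made up for this illustration; they are not part of the question's code.

```python
# Illustration only: compare the character-set test from the question
# with an exact string comparison. Helper names are hypothetical.
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY", ":-}": "HAPPY",
              ";-}": "HAPPY", ":->": "HAPPY", ";-)": "HAPPY"}


def subset_matches(word, emoticons):
    """Return the keys whose characters all occur somewhere in word."""
    return sorted(key for key in emoticons if set(key).issubset(set(word)))


def exact_matches(word, emoticons):
    """Return the keys that are equal to word."""
    return sorted(key for key in emoticons if key == word)


if __name__ == "__main__":
    print(subset_matches(":-)", dict_happy))  # two keys match: ':)' and ':-)'
    print(exact_matches(":-)", dict_happy))   # only ':-)' matches
    print(subset_matches("):", dict_happy))   # order is ignored, so ':)' matches
```

The two subset matches for ":-)" are the reason "HAPPY" is printed twice in the question.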
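You can iterate over a dictionary directly; there is no need to call .keys() in your loops. In fact, you don't seem to use the dictionary values at all; you could just do: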
for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    if word in dict_sad:
        print "SAD"
and then perhaps use sets instead of dictionaries. This can then be reduced to:
words = set(a.split())
if dict_happy.viewkeys() & words:
    print "HAPPY"
if dict_sad.viewkeys() & words:
    print "SAD"
using the dictionary view on the keys as a set. Still, it would be better to use sets to begin with:
sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}
words = set(a.split())
if sad_emoticons & words:
    print "SAD"
if happy_emoticons & words:
    print "HAPPY"
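A minimal sketch of the set-intersection idea as a reusable function. It assumes Python 3, where print is a function and dict.viewkeys() no longer exists (the plain keys() view supports & there). The function name `sentence_moods` is hypothetical.

```python
sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}


def sentence_moods(sentence):
    """Return the mood labels whose emoticons occur as whole words."""
    words = set(sentence.split())
    moods = []
    if happy_emoticons & words:   # a non-empty intersection is truthy
        moods.append("HAPPY")
    if sad_emoticons & words:
        moods.append("SAD")
    return moods


if __name__ == "__main__":
    print(sentence_moods("guys beautifully done :-)"))   # ['HAPPY']
    print(sentence_moods("late again :( but fixed :)"))  # ['HAPPY', 'SAD']
    print(sentence_moods("no emoticons here"))           # []
```

This only reports which moods occur somewhere in the sentence, not where, and it only sees emoticons that are separated from other words by whitespace.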
Or better still, combine the two dictionaries and use dict.get():
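emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY"
}
for word in a.split():
    print emoticons.get(word, word)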
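Here I pass in the current word as both the lookup key and the default value; if the current word is not an emoticon, the word itself is printed, otherwise SAD or HAPPY is printed.

Instead of dictionaries I used lists, which makes the code simpler: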
list_sad = [":(", ":-("]
list_happy = [":)", ":-)"]
a = "guys beautifully done :-)"
for i in a.split():
    if i in list_sad:
        print("SAD")
    elif i in list_happy:
        print("HAPPY")
    else:
        print(i)
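Both answers print one token per line. If the goal is to get the sentence back with the emoticons swapped out, a per-word dictionary lookup can be combined with str.join. This is only a sketch under a few assumptions: Python 3, emoticons separated from other words by whitespace, and a hypothetical helper name `replace_emoticons`.

```python
# Mapping built from the two dictionaries in the question.
emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY",
}


def replace_emoticons(sentence):
    """Replace whole-word emoticons with their label; keep other words."""
    return " ".join(emoticons.get(word, word) for word in sentence.split())


if __name__ == "__main__":
    print(replace_emoticons("guys beautifully done :-)"))
    # guys beautifully done HAPPY
```

Because split() without arguments splits on any whitespace, runs of spaces and line breaks collapse into single spaces; applying the function line by line would keep a file's line structure.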
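Why are you using sets here? You are creating sets of characters, and set(':)') is a subset of set(':-)'), so both keys match, just as set('):') would.
Why is there a - after guys in the expected output???
@Hackaholic Thanks for pointing that out, edited.
Thanks Martijn. But if I append "if str(i)==i.decode('utf-8','replace'): print i" to the code, it does not skip the emoticon. "HAPPY" is now printed only once, but the emoticon does not disappear. Output: "guys - beautifully done HAPPY :-)"
@rzach: The emoticons are just ASCII text. Encoding them to UTF-8 will not make them disappear, no.
But how do I print the sentence? The goal is to print the sentence with the emoticons replaced by HAPPY/SAD.
Also: AttributeError: 'dict_keys' object has no attribute 'intersection'.
@Hackaholic: If all you want is to print HAPPY or SAD based on a sentence, then a simple intersection using sets is more efficient.
@Hackaholic: What the OP wants is muddled and unclear, so I covered a number of possible bases. Membership tests on a list use a linear scan; sets use hashing (like dictionaries) to provide constant lookup time.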
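The point in the last comment about lookup cost can be checked with a rough timing sketch. It assumes Python 3 and uses the standard timeit module; the list of 1000 made-up filler tokens is purely hypothetical and only there to make the linear scan noticeable. Absolute timings vary by machine, so none are shown.

```python
import timeit

# Hypothetical data: 1000 filler tokens followed by one real emoticon,
# so the list scan has to walk almost the whole list before it matches.
tokens_list = ["tok%d" % n for n in range(1000)] + [":-)"]
tokens_set = set(tokens_list)


def in_list(word):
    return word in tokens_list   # linear scan over the list


def in_set(word):
    return word in tokens_set    # hash lookup


if __name__ == "__main__":
    for func in (in_list, in_set):
        seconds = timeit.timeit(lambda: func(":-)"), number=10000)
        print(func.__name__, seconds)
```

With only a handful of emoticons, as in this question, the difference is unlikely to matter; it becomes relevant as the collection grows.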
To print every word and replace only the emoticons, filter the words:
for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    elif word in dict_sad:
        print "SAD"
    else:
        print word