Python code for replacing emoticons with "HAPPY" or "SAD" is not working properly

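I want to replace every happy emoticon in a text file with "HAPPY" and, likewise, every sad emoticon with "SAD". But the code is not working properly. It does detect the smiley (only :-) so far), but in the example below it does not replace the emoticon with the text; it simply appends the text, and for a reason I don't understand, it appends it twice.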

dict_sad={":-(":"SAD", ":(":"SAD", ":-|":"SAD",  ";-(":"SAD", ";-<":"SAD", "|-{":"SAD"}
dict_happy={":-)":"HAPPY",":)":"HAPPY", ":o)":"HAPPY",":-}":"HAPPY",";-}":"HAPPY",":->":"HAPPY",";-)":"HAPPY"}

#THE INPUT TEXT#
a="guys beautifully done :-)" 

for i in a.split():
    for j in dict_happy.keys():
        if set(j).issubset(set(i)):
            print "HAPPY"
            continue
    for k in dict_sad.keys():
        if set(k).issubset(set(i)):
            print "SAD"
            continue
    if str(i)==i.decode('utf-8','replace'):
       print i
Output ("HAPPY" appears twice, and the emoticon does not go away)

Expected output

guys
beautifully
done
HAPPY
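You turn every word and every emoticon into a set, which means you are looking for overlap between individual characters. You probably wanted exact matches instead: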

for i in a.split():
    for j in dict_happy:
        if j == i:
            print "HAPPY"
            continue
    for k in dict_sad:
        if k == i:
            print "SAD"
            continue
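The double HAPPY in the question can be reproduced in isolation. The following is a small sketch in Python 3 syntax (the question itself uses Python 2); the helper names subset_matches and exact_matches are made up for illustration. It applies the question's set-based test to the token :-) and shows that more than one key passes it, because only individual characters are compared.

```python
happy = [":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"]


def subset_matches(token, emoticons):
    """Emoticons whose characters all occur in token (the question's test)."""
    return [e for e in emoticons if set(e).issubset(set(token))]


def exact_matches(token, emoticons):
    """Emoticons that are equal to token as whole strings."""
    return [e for e in emoticons if e == token]


print(subset_matches(":-)", happy))  # [':-)', ':)']  two hits, so two prints
print(exact_matches(":-)", happy))   # [':-)']
print(subset_matches("):", happy))   # [':)']  character order is ignored
```

Because a set discards order and repetition, any token that happens to contain the characters of an emoticon counts as a match under the subset test.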
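You can iterate over a dictionary directly; there is no need to call .keys() in those loops. In fact, you don't appear to use the dictionary values at all, so you could just do: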

for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    if word in dict_sad:
        print "SAD"
and then perhaps use sets instead of dictionaries. This can then be reduced to:

words = set(a.split())
if dict_happy.viewkeys() & words:
    print "HAPPY"
if dict_sad.viewkeys() & words:
    print "SAD"
This uses the dictionary views on the keys as sets. Even so, it would be better to use actual sets:

sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}

words = set(a.split())
if sad_emoticons & words:
    print "SAD"
if happy_emoticons & words:
    print "HAPPY"
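For readers on Python 3, where print is a function, the set-based check might look like the sketch below. The function name sentiments_in is hypothetical, and returning a list of labels instead of printing is a choice made here for illustration only.

```python
sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}


def sentiments_in(text):
    """Return the labels of the emoticon groups that occur in text."""
    words = set(text.split())
    labels = []
    if happy_emoticons & words:  # a non-empty intersection is truthy
        labels.append("HAPPY")
    if sad_emoticons & words:
        labels.append("SAD")
    return labels


print(sentiments_in("guys beautifully done :-)"))  # ['HAPPY']
```

This only reports whether each kind of emoticon occurs somewhere in the text; it does not say where, and it does not replace anything.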
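If you want to print the sentence with the emoticons replaced, it is better still to combine the two dictionaries and use dict.get():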

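emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY",":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY"
}

for word in a.split():
    print emoticons.get(word, word)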

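Here I pass in the current word as both the look-up key and the default value; if the current word is not an emoticon, the word itself is printed, otherwise SAD or HAPPY is printed.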
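If the goal is the whole sentence on one line with the emoticons swapped out, the same look-up can feed str.join. This is a sketch in Python 3 syntax; replace_emoticons is a name invented here, and it assumes every emoticon is separated from the surrounding words by whitespace.

```python
emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY",
}


def replace_emoticons(text):
    """Replace each whitespace-separated emoticon with its label."""
    # dict.get(word, word) returns the label for an emoticon
    # and the word itself for anything else.
    return " ".join(emoticons.get(word, word) for word in text.split())


print(replace_emoticons("guys beautifully done :-)"))
# guys beautifully done HAPPY
```

Note that text.split() followed by " ".join(...) collapses runs of whitespace into single spaces, and an emoticon glued to a word (for example done:-)) is left untouched.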

Instead of dictionaries, I used lists, which makes the code simpler:

list_sad = [":(", ":-("]
list_happy = [":)", ":-)"]

a = "guys beautifully done :-)" 

for i in a.split():
    if i in list_sad:
        print ("SAD")
    elif i in list_happy:
        print ("HAPPY")
    else:
        print (i)

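Why are you using sets here? You are creating sets of characters, and set(':)') and set(':-)') are both subsets of set(':-)'), so they both match, just as set('):') would.

Why is there a - after guys in the expected output?

@Hackaholic Thanks for pointing that out, edited.

Thanks Martijn. But if I append "if str(i)==i.decode('utf-8','replace'): print i" to the code, it does not escape the emoticon. "HAPPY" is now printed only once, but the emoticon does not go away. Output: "guys beautifully done HAPPY :-)"

@rzach: The emoticon is just ASCII text. Encoding it as UTF-8 won't make it go away, no.

But how do I print the sentence? The aim is to print the sentence with the emoticons replaced by HAPPY/SAD. Also: AttributeError: 'dict_keys' object has no attribute 'intersection'.

@Hackaholic: If you only want to print HAPPY or SAD based on a sentence, then a simple intersection using sets is more efficient.

@Hackaholic: What the OP wants is muddled and unclear, so I covered a number of possible bases. Membership tests on a list take a linear scan, whereas sets use hashing (like dictionaries) and so give constant look-up time.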
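The AttributeError quoted in the comments is what Python 3 raises when .intersection() is called on a dictionary key view. The sketch below (Python 3; the variable names are illustrative) shows that the key view still supports the & operator, and that converting the keys to a set first makes the named method available.

```python
dict_happy = {":-)": "HAPPY", ":)": "HAPPY"}
words = set("guys beautifully done :-)".split())

# Key views are set-like: the & operator works on them directly.
common = dict_happy.keys() & words
print(common)  # {':-)'}

# They do not provide the named set methods such as .intersection().
print(hasattr(dict_happy.keys(), "intersection"))  # False

# Building a real set from the dictionary restores the method.
common_via_set = set(dict_happy).intersection(words)
print(common_via_set == common)  # True
```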
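To print the sentence with each emoticon replaced, test every word and fall back to printing the word itself:

for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    elif word in dict_sad:
        print "SAD"
    else:
        print word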