Python: split a string into a list, keeping accented characters and emoticons but removing punctuation


If I have the string:

"O João foi almoçar :) ."
how can I best split it into a list of words in Python, like this:

['O','João', 'foi', 'almoçar', ':)']
?

Thanks :)


Sofia

If the punctuation falls into its own whitespace-delimited token, as in your example, then it's easy:

>>> import string
>>> list(filter(lambda s: s not in string.punctuation, "O João foi almoçar :) .".split()))
['O', 'João', 'foi', 'almoçar', ':)']
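To see where the simple token filter stops being enough, here is a minimal sketch. The helper name `drop_punct_tokens` is made up for illustration and is not from the answer. It only drops tokens that consist entirely of punctuation, so a full stop glued to a word survives.

```python
import string

def drop_punct_tokens(text):
    # Keep every whitespace-separated token that is not itself
    # a substring of string.punctuation (e.g. "." or ",").
    return [tok for tok in text.split() if tok not in string.punctuation]

# Punctuation in its own token is removed:
print(drop_punct_tokens("O João foi almoçar :) ."))
# ['O', 'João', 'foi', 'almoçar', ':)']

# Punctuation attached to a word is not:
print(drop_punct_tokens("O João foi almoçar. :)"))
# ['O', 'João', 'foi', 'almoçar.', ':)']
```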
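If the punctuation is not separated like that, you can define a dictionary of smileys like this (you'll need to add more):

d = {':)': '<HAPPY_SMILEY>', ':(': '<SAD_SMILEY>'}
Then replace each smiley in the string s with its placeholder, which contains none of the punctuation we are about to strip:

for smiley, placeholder in d.items():
    s = s.replace(smiley, placeholder)
This gets us to "O João foi almoçar <HAPPY_SMILEY> ."

We then strip the punctuation:

s = ''.join(filter(lambda c: c not in '.,!', list(s)))
This gives us "O João foi almoçar <HAPPY_SMILEY> " (the trailing space is harmless).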

We then restore the smileys:

for smiley, placeholder in d.items():
    s = s.replace(placeholder, smiley)
Then we split it:

s = s.split()
Giving us our final result:
['O', 'João', 'foi', 'almoçar', ':)']

Combining it all into a function:

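def split_special(s):
    # Map each smiley to a placeholder that survives punctuation stripping
    d = {':)': '<HAPPY_SMILEY>', ':(': '<SAD_SMILEY>'}
    for smiley, placeholder in d.items():
        s = s.replace(smiley, placeholder)
    # Strip the punctuation characters we do not want
    s = ''.join(filter(lambda c: c not in '.,!', list(s)))
    # Put the smileys back, then split on whitespace
    for smiley, placeholder in d.items():
        s = s.replace(placeholder, smiley)
    return s.split()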
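The function above strips only '.', ',' and '!'. If you want to strip all of string.punctuation, the '<HAPPY_SMILEY>' style placeholders would be damaged, because '<', '>' and '_' are themselves punctuation. Below is a rough Python 3 sketch of one way around that, not the answerer's code: it uses single private-use Unicode characters as placeholders. The names `SMILEYS` and `split_keep_smileys` are hypothetical, and the sketch assumes the input never contains characters from U+E000 upward.

```python
import string

# Hypothetical smiley list; extend as needed.
SMILEYS = [':)', ':(', ';)', ':D']

def split_keep_smileys(text, smileys=SMILEYS):
    # One private-use code point per smiley; none of them is punctuation.
    placeholders = {s: chr(0xE000 + i) for i, s in enumerate(smileys)}
    # Longest smileys first, padded with spaces so they become own tokens.
    for smiley in sorted(placeholders, key=len, reverse=True):
        text = text.replace(smiley, ' ' + placeholders[smiley] + ' ')
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Split, then map placeholders back to their smileys.
    restore = {ph: s for s, ph in placeholders.items()}
    return [restore.get(tok, tok) for tok in text.split()]

print(split_keep_smileys("O João foi almoçar :) ."))
# ['O', 'João', 'foi', 'almoçar', ':)']
```

Padding each placeholder with spaces also separates a smiley that is glued to a word, so "almoçar:)" yields 'almoçar' and ':)'.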

How do you distinguish punctuation from emoticons?
['O','Jo\xc6o','foi','almo\x87ar',':)']
>>> import string
>>> s = "O João foi almoçar :) ."
>>> [i for i in s.split(' ') if i not in string.punctuation]
['O', 'João', 'foi', 'almoçar', ':)']
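As a different technique from either answer above, a regular expression can pick out the tokens directly instead of removing punctuation. This is only a sketch under assumptions: the emoticon pattern `EMOTICON` and the function name `tokenize` are made up here and cover just a few common smileys. In Python 3, `\w` matches Unicode letters (as well as digits and the underscore), so accented characters are kept.

```python
import re
import unicodedata

# Hypothetical emoticon pattern: eyes, optional nose, mouth. Extend as needed.
EMOTICON = r"[:;=]-?[()DPp]"
# Try an emoticon first, otherwise take a run of word characters.
TOKEN_RE = re.compile(EMOTICON + r"|\w+")

def tokenize(text):
    # Normalize to NFC so an accented letter is a single code point
    # that \w matches (a bare combining mark would split the word).
    text = unicodedata.normalize("NFC", text)
    return TOKEN_RE.findall(text)

print(tokenize("O João foi almoçar :) ."))
# ['O', 'João', 'foi', 'almoçar', ':)']
```

Because anything that is neither an emoticon nor a word character is simply skipped, punctuation attached to a word (as in "almoçar.") is dropped as well.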