Python: split a string and keep the delimiters as words

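I'm trying to split a string using multiple delimiters, and I need to keep the delimiters as words. The delimiters I use are all punctuation characters and whitespace.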

For example, the string:

Je suis, FOU et toi ?!
should produce:

'Je'
'suis'
','
'FOU'
'et'
'toi'
'?'
'!'
I wrote:

import re

class Parser :
    def __init__(self) :
        """Empty constructor"""

    def read(self, file_name) :
        from string import punctuation
        with open(file_name, 'r') as file :
            for line in file :
                for word in line.split() :
                    r = re.compile(r'[\s{}]+'.format(re.escape(punctuation)))
                    print(r.split(word))
But the result I get is:

['Je']
['suis', '']
['FOU']
['et']
['toi']
['', '']

The split seems correct, but the resulting lists do not contain the delimiters :(

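You need to put your expression in a capturing group for re.split() to keep the delimiters. I wouldn't split on whitespace first; you can always remove whitespace-only strings afterwards. If you want every punctuation character to be a separate item, apply the + quantifier only to the \s whitespace alternative: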

# do this just once, not in a loop
pattern = re.compile(r'(\s+|[{}])'.format(re.escape(punctuation)))

# for each line
parts = [part for part in pattern.split(line) if part.strip()]
The list comprehension drops anything that contains only whitespace:

>>> import re
>>> from string import punctuation
>>> line = 'Je suis, FOU et toi ?!'
>>> pattern = re.compile(r'(\s+|[{}])'.format(re.escape(punctuation)))
>>> pattern.split(line)
['Je', ' ', 'suis', ',', '', ' ', 'FOU', ' ', 'et', ' ', 'toi', ' ', '', '?', '', '!', '']
>>> [part for part in pattern.split(line) if part.strip()]
['Je', 'suis', ',', 'FOU', 'et', 'toi', '?', '!']
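To make the role of the capturing group concrete, here is a small sketch that runs the same split with and without the parentheses. The variable names (punct_class, without_group, with_group) are only illustrative and do not come from the question.

```python
import re
from string import punctuation

line = 'Je suis, FOU et toi ?!'

# Character class matching exactly one punctuation character.
punct_class = '[{}]'.format(re.escape(punctuation))

# No capturing group: re.split() throws away whatever the pattern matched.
without_group = re.split(r'\s+|' + punct_class, line)

# Capturing group: the matched delimiters are returned as list items too.
with_group = re.split(r'(\s+|' + punct_class + ')', line)

print(without_group)
print(with_group)

# Dropping empty and whitespace-only items leaves words and punctuation.
print([part for part in with_group if part.strip()])
```

Only the version with the group returns ',' , '?' and '!' as items. The empty strings in both results appear wherever two delimiters are adjacent, or where a delimiter ends the string.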
You could also use re.findall() to find all the words and punctuation characters directly, instead of splitting:

pattern = re.compile(r'\w+|[{}]'.format(re.escape(punctuation)))

parts = pattern.findall(line)
This has the advantage that you don't need to filter out the whitespace:

>>> import re
>>> from string import punctuation
>>> line = 'Je suis, FOU et toi ?!'
>>> pattern = re.compile(r'\w+|[{}]'.format(re.escape(punctuation)))
>>> pattern.findall(line)
['Je', 'suis', ',', 'FOU', 'et', 'toi', '?', '!']
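As a minimal sketch of how the re.findall() approach could replace the per-word loop from the question, the pattern can be compiled once and applied to each line. The names TOKEN_PATTERN, tokenize and tokenize_lines are hypothetical helpers introduced here for illustration, not part of the original answer.

```python
import re
from string import punctuation

# Compile once at import time rather than once per word or line.
# Note: \w also matches "_" and, for str patterns in Python 3, accented letters.
TOKEN_PATTERN = re.compile(r'\w+|[{}]'.format(re.escape(punctuation)))


def tokenize(line):
    """Return the words and individual punctuation marks found in line."""
    return TOKEN_PATTERN.findall(line)


def tokenize_lines(lines):
    """Yield one token list per line; lines may be an open file or any iterable of str."""
    for line in lines:
        yield tokenize(line)


if __name__ == '__main__':
    for tokens in tokenize_lines(['Je suis, FOU et toi ?!']):
        print(tokens)
```

Because an open text file iterates line by line, tokenize_lines(file) can be called inside a with open(...) block. One caveat with this pattern: string.punctuation covers only ASCII punctuation, so characters such as « and » match neither alternative and are silently skipped.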