How do I split a text string into words using the regex module in Python?


Here is what I'm working with:

import re

string1 = "Dog,cat,mouse,bird. Human."

def string_count(text):
    # Split on runs of non-word characters (commas, periods, spaces).
    text = re.split(r'\W+', text)
    count = 0
    for x in text:
        count += 1
        print(count)
        print(x)
    return text

print(string_count(string1))

...and this is the output:

1
Dog
2
cat
3
mouse
4
bird
5
Human
6

['Dog', 'cat', 'mouse', 'bird', 'Human', '']

Why do I get a count of 6 when there are only 5 words? I can't seem to get rid of the '' (empty string)! It's driving me crazy.

Because when it splits on the final period, it also returns the empty part that follows it.

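You split the input string on \W+, which means it is split on one or more non-word characters. The regex therefore also matches the final period and splits the input there. Since nothing follows that last period, the split returns an empty string as its final element.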
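The behavior described above can be reproduced in a few lines. This is a minimal sketch using the standard re module; the second and third sample strings are made up for illustration and are not from the question.

```python
import re

# The question's string ends with a period, which \W+ matches,
# so split() reports an empty field after that final match.
trailing = re.split(r'\W+', "Dog,cat,mouse,bird. Human.")
print(trailing)  # ['Dog', 'cat', 'mouse', 'bird', 'Human', '']

# Without the trailing period there is nothing to split on at the end.
clean = re.split(r'\W+', "Dog,cat,mouse,bird. Human")
print(clean)  # ['Dog', 'cat', 'mouse', 'bird', 'Human']

# A leading non-word character produces an empty field at the start instead.
leading = re.split(r'\W+', "...Dog,cat")
print(leading)  # ['', 'Dog', 'cat']
```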

Avinash Raj correctly explained why it does that. Here is how to fix it:

import re

string1 = "Dog,cat,mouse,bird. Human."
the_list = [word for word in re.split(r'\W+', string1) if word]
# include the word in the list if it's not the empty string

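Or (and this is better...)

import re

string1 = "Dog,cat,mouse,bird. Human."
the_list = re.findall(r'\w+', string1)
# find all words in string1

In the question editor, drag to select the code, then press the {} button above the text editor.

This is the correct answer. The string ends with a character that matches \W+, namely the period (.). That means there is an extra empty field at the end of the string. If the string started with a non-word character, there would be an empty field at the beginning as well.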
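To tie the findall approach back to the original goal of counting words, here is a small sketch. The helper name split_words and the extra sample strings are hypothetical, chosen only for illustration.

```python
import re

def split_words(text):
    # Collect every run of word characters (letters, digits, underscore).
    # Matching the words directly means no empty strings can appear,
    # whatever punctuation the text starts or ends with.
    return re.findall(r'\w+', text)

words = split_words("Dog,cat,mouse,bird. Human.")
print(len(words))  # 5
print(words)       # ['Dog', 'cat', 'mouse', 'bird', 'Human']

# Leading punctuation does not produce an empty field either.
leading = split_words("...Dog,cat")
print(leading)     # ['Dog', 'cat']

# Caveat: an apostrophe is a non-word character, so contractions split.
contraction = split_words("don't")
print(contraction)  # ['don', 't']
```

If contractions or hyphenated words should count as one word, the pattern would need to be widened. Which characters to allow depends on the text being processed.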