Python TypeError: cannot use a bytes pattern on a string-like object


python regex python-3.x


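I am trying to tokenize a sentence into words. In the code below, I try to split the sentence into words using a predefined set of split characters.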

import re
_WORD_SPLIT = re.compile(b"([.,!?\"':;)(])")

def basic_tokenizer(sentence):
    words = []
    for space_separated_fragment in sentence.strip().split():
        words.extend(_WORD_SPLIT.split(space_separated_fragment))
    return [w for w in words if w]

basic_tokenizer("I live, in Mumbai.")

It gives me this error:

TypeError: cannot use a bytes pattern on a string-like object

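Earlier this code ran fine for me, but after I reinstalled and then installed tensorflow, it started raising this error. I also tried the .decode() function, but that did not solve my problem.

I am using python3.6 on Ubuntu.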

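When you compiled the regex with re.compile, you gave it a bytes object, but when you call it you pass a str object, space_separated_fragment. The pattern and the text being searched must be the same type. Convert the fragment to bytes when passing it to _WORD_SPLIT: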

import re
_WORD_SPLIT = re.compile(b"([.,!?\"':;)(])")

def basic_tokenizer(sentence):
    words = []
    for space_separated_fragment in sentence.strip().split():
        words.extend(_WORD_SPLIT.split(space_separated_fragment.encode()))
    return [w for w in words if w]

basic_tokenizer("I live, in Mumbai.")
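
To make the type rule concrete, here is a small sketch that compiles the same character class once as bytes and once as str, then splits a fragment with each. The helper try_split and the names BYTES_PATTERN and STR_PATTERN are made up for this illustration; they are not part of the question's code.

```python
import re

# Same character class as in the question, compiled once as bytes and once as str.
BYTES_PATTERN = re.compile(b"([.,!?\"':;)(])")
STR_PATTERN = re.compile("([.,!?\"':;)(])")

def try_split(pattern, text):
    """Return the split result, or the error message if the types do not match."""
    try:
        return pattern.split(text)
    except TypeError as exc:
        return "TypeError: " + str(exc)

mismatch = try_split(BYTES_PATTERN, "live,")   # bytes pattern, str text: fails
bytes_ok = try_split(BYTES_PATTERN, b"live,")  # bytes pattern, bytes text: works
str_ok = try_split(STR_PATTERN, "live,")       # str pattern, str text: works

print(mismatch)
print(bytes_ok)
print(str_ok)
```

Only the mixed case fails. Whichever type you pick, the pattern and the input have to agree.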


re.compile also accepts an ordinary string (str) pattern.
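
As a sketch of that alternative, the question's tokenizer can be left as it is and only the pattern changed from a bytes literal to a raw str literal. This assumes the input sentences are always str, as in the question's example call.

```python
import re

# Raw str pattern instead of a bytes pattern; the character class is unchanged.
_WORD_SPLIT = re.compile(r"([.,!?\"':;)(])")

def basic_tokenizer(sentence):
    """Split a str sentence into words and punctuation tokens."""
    words = []
    for space_separated_fragment in sentence.strip().split():
        words.extend(_WORD_SPLIT.split(space_separated_fragment))
    # Drop the empty strings that re.split leaves around delimiters.
    return [w for w in words if w]

tokens = basic_tokenizer("I live, in Mumbai.")
print(tokens)  # ['I', 'live', ',', 'in', 'Mumbai', '.']
```

With this version the tokens come back as str, so no encoding or decoding step is needed.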

Change the pattern from a bytes literal to a raw string:

b"([.,!?\"':;)(])" -> r"([.,!?\"':;)(])"

Check my answer below and see if it helps you!

Note that the .encode() version will output a list of byte strings:

[b'I', b'live', b',', b'in', b'Mumbai', b'.']
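
If you keep the bytes pattern and need str tokens afterwards, one option is to decode each token. This is a minimal sketch that assumes UTF-8, which is the default encoding used by str.encode(); the variable names are illustrative.

```python
# Output of the .encode()-based tokenizer for "I live, in Mumbai."
byte_tokens = [b"I", b"live", b",", b"in", b"Mumbai", b"."]

# Decode each token back to str (UTF-8 assumed, matching str.encode()'s default).
str_tokens = [token.decode("utf-8") for token in byte_tokens]

print(str_tokens)  # ['I', 'live', ',', 'in', 'Mumbai', '.']
```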