How to treat multiple characters as a single character in Python?

python string

I have the following sentence:

sentence_1 = "online auto body"

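I added the marker <s> at the beginning and end of the sentence to mark where it starts and ends, so it becomes:

sentence = "<s> online auto body <s>"

I would like to get the following letter trigrams, with <s> kept as a single unit: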

('<s>','o','n')
('o', 'n', 'l')
('n', 'l', 'i')
('l', 'i', 'n')
('i', 'n', 'e')
('a', 'u', 't')
('u', 't', 'o')
('b', 'o', 'd')
('o', 'd', 'y')
('d', 'y', '<s>')

This is the code I tried:

from nltk import ngrams
n = 3
word_3grams = ngrams(sentence.split(), n)


for w_grams in word_3grams:
    w_gram = list(w_grams)
    print(w_grams[0])
    for i in range(0,n):
        letter_3grams = ngrams(w_grams[i],3)
        for l_gram in letter_3grams:
            print(l_gram)

But what I get is:

('<', 's', '>')
('o', 'n', 'l')
('n', 'l', 'i')
('l', 'i', 'n')
('i', 'n', 'e')
('a', 'u', 't')
('u', 't', 'o')

and so on.

The question is: how can I prevent <s> from being split into separate characters and keep it as a single unit?

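The desired output shows that the spaces in the input string have been removed, so remember to replace the spaces with an empty string before splitting the string into characters: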

sentence_1 = "online auto body"

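# '<s>' is one list element, so it is never split into '<', 's', '>'
lst = ['<s>'] + list(sentence_1.replace(' ', '')) + ['<s>']
# slide a window of width 3 over the token list
tri = [tuple(lst[i:i+3]) for i in range(len(lst) - 2)]
for t in tri:
    print(t)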

Output:

('<s>', 'o', 'n')
('o', 'n', 'l')
('n', 'l', 'i')
('l', 'i', 'n')
('i', 'n', 'e')
('n', 'e', 'a')
('e', 'a', 'u')
('a', 'u', 't')
('u', 't', 'o')
('t', 'o', 'b')
('o', 'b', 'o')
('b', 'o', 'd')
('o', 'd', 'y')
('d', 'y', '<s>')
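The output above includes trigrams that span word boundaries, such as ('n', 'e', 'a'), while the list in the question shows none. If those are not wanted, one possible variant builds the trigrams word by word. This is only a sketch: the function name word_trigrams and the rule of attaching the marker to the first and last word only are assumptions inferred from the question's example, not something the asker stated.

```python
def word_trigrams(sentence, marker="<s>"):
    """Letter trigrams computed per word; the marker stays a single token."""
    words = sentence.split()
    result = []
    for i, word in enumerate(words):
        tokens = list(word)
        if i == 0:
            tokens.insert(0, marker)   # marker before the first word only
        if i == len(words) - 1:
            tokens.append(marker)      # marker after the last word only
        # slide a window of width 3 over this word's tokens
        result.extend(tuple(tokens[k:k + 3]) for k in range(len(tokens) - 2))
    return result


for gram in word_trigrams("online auto body"):
    print(gram)
```

For "online auto body" this yields the ten tuples listed in the question, from ('<s>', 'o', 'n') to ('d', 'y', '<s>').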

Using the third-party tool more_itertools (installed via pip install more_itertools):

Code

import more_itertools as mit


sentence_1 = "online auto body" 
s = "".join(sentence_1.split())  # split() drops the spaces before joining

list(mit.stagger(s, fillvalue="<s>", longest=True))[:-1]

Output

[('<s>', 'o', 'n'),
 ('o', 'n', 'l'),
 ('n', 'l', 'i'),
 ('l', 'i', 'n'),
 ('i', 'n', 'e'),
 ('n', 'e', 'a'),
 ('e', 'a', 'u'),
 ('a', 'u', 't'),
 ('u', 't', 'o'),
 ('t', 'o', 'b'),
 ('o', 'b', 'o'),
 ('b', 'o', 'd'),
 ('o', 'd', 'y'),
 ('d', 'y', '<s>')]

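This tool yields tuples whose items are offset from the input iterable. Positions that fall outside the input are replaced by the fillvalue argument.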
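To make the offset idea concrete, here is a hand-written approximation in plain Python. It is not the library's implementation; the name stagger_sketch is hypothetical, and it produces one window per input item, which corresponds to the rows the answer keeps after slicing off the last one.

```python
def stagger_sketch(items, offsets=(-1, 0, 1), fillvalue=None):
    """Approximate staggered windows: one tuple per input position."""
    items = list(items)
    rows = []
    for i in range(len(items)):
        row = []
        for off in offsets:
            j = i + off
            # positions before the start or past the end get the fill value
            row.append(items[j] if 0 <= j < len(items) else fillvalue)
        rows.append(tuple(row))
    return rows


print(stagger_sketch("abcd", fillvalue="<s>"))
```

Because the fill value is inserted as a whole item, "<s>" is never broken into its three characters.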

Convert the string to a list. Add <s> as the first and last element of the list. Then apply your existing trigram algorithm.

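<s> is not a character but a 3-character string. Why not remove <s> from the string passed to ngrams and then add the boundary tuples to the list manually? Alternatively, use a character that you know will not appear in the text (e.g. \xFF). When designing an XML-like protocol, it is better to do it properly: use separate markers for the start and the end, so that when you meet a marker you know whether the string is starting or ending (e.g. "\xFE-online-auto-body\xFF"). Also, are the spaces meant to be ignored?

@CristiFati, the instructions I am following told me to do it this way, but your way certainly makes more sense. And yes, the spaces are ignored.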
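The single-character sentinel idea from the comment above could be sketched as follows. The names START, END, LABELS and sentinel_trigrams are hypothetical, the "</s>" end label is an illustrative choice, and the code assumes that "\xfe" and "\xff" never occur in the text.

```python
START, END = "\xfe", "\xff"            # assumed never to occur in the text
LABELS = {START: "<s>", END: "</s>"}   # hypothetical readable labels


def sentinel_trigrams(text):
    # drop spaces, then wrap the text in one-character sentinels
    padded = START + text.replace(" ", "") + END
    # zip three shifted views of the string to get character trigrams
    raw = zip(padded, padded[1:], padded[2:])
    # swap each sentinel for its readable label
    return [tuple(LABELS.get(ch, ch) for ch in gram) for gram in raw]


grams = sentinel_trigrams("online auto body")
print(grams[0], grams[-1])
```

Since each sentinel really is one character, any character-level n-gram routine can process the padded string without special handling, and distinct start and end sentinels keep the two boundaries distinguishable.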

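"".join("foo bar") is "foo bar", so the spaces are not removed. You would have to use "".join("foo bar".split()).

@pylang: No, it doesn't work (see my comment on your answer).

You're right. You can split the string into multiple iterables and rejoin them.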
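A quick check of the point made in these comments, using the same "foo bar" example (the variable names are only for illustration):

```python
text = "foo bar"

# join() over a string just glues its characters back together, space included
unchanged = "".join(text)

# split() with no arguments breaks on whitespace, so joining the pieces drops it
no_spaces = "".join(text.split())

print(repr(unchanged), repr(no_spaces))
```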
