Combining elements in a list in Python

python list

I am working on padding for an n-gram model. My code looks like this:

n = 5
text = "hello how are"
tokens = text[-n:]
prefix = tokens[:-1]
toPad = (n) - len(prefix)-1
prefix = "<s>"*toPad+tokens
print(list(prefix))

Please help me solve this problem.

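Because prefix is a string, list() splits it into a list of single characters, and because "<s>" is itself a string, it gets split into ['<', 's', '>'] as well. You can build the list in a loop instead, for example: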

n = 5
text = "he"
tokens = text[-n:]
prefix = tokens[:-1]
toPad = (n) - len(prefix)-1
prefix = "<s>"*toPad+tokens
prefList = []
i = 0
while i < len(prefix):
    if prefix[i] == "<":
        prefList.append("<s>")
        i += 3
    else:
        prefList.append(prefix[i])
        i += 1

print(prefList)

Use findall from the re (regular expression) module to build the list instead of list().

Code

import re

def parse(text):
  n = 5
  tokens = text[-n:]
  prefix = tokens[:-1]
  toPad = (n) - len(prefix)-1
  prefix = "<s>"*toPad+tokens

  # Use regex findall to create list
  return re.findall(r'<s>|.', prefix)  # Creates list of either <s> or any character

Test
print(parse("hello how are"))  # ['w', ' ', 'a', 'r', 'e']
print(parse("he"))             # ['<s>', '<s>', '<s>', 'h', 'e']
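
Another option, sketched here under a few assumptions, is to avoid building a padded string at all and work with lists from the start, so that "<s>" is never split into characters. The function name pad_prefix and the pad_token parameter are made up for this illustration and are not part of the original code. For non-empty text the pad count n - len(tokens) equals the original n - len(prefix) - 1.

```python
def pad_prefix(text, n=5, pad_token="<s>"):
    """Return the last n characters of text as a list, left-padded with pad_token.

    Hypothetical helper for illustration; assumes text is non-empty and n > 0.
    """
    tokens = list(text[-n:])      # each character becomes its own list item
    to_pad = n - len(tokens)      # number of padding tokens still needed
    # Multiplying a one-element list repeats the whole token, not its characters
    return [pad_token] * to_pad + tokens


print(pad_prefix("hello how are"))  # ['w', ' ', 'a', 'r', 'e']
print(pad_prefix("he"))             # ['<s>', '<s>', '<s>', 'h', 'e']
```

Unlike the loop and the regex versions, this also behaves sensibly when the text itself contains a "<" character, since the padding tokens are never parsed back out of a string.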
