Splitting text into a list of n-grams in Python

python python-2.7 python-3.x

I have to split a text file into lists that each contain a specific number of words, all collected in one outer list. It is probably best shown with an example.

Say the text file looks like this:

"i am having a good day today"

I have to write a function that is called like this:

ngrams.makeNGrams("ngrams.txt", 2)
# since the given argument is 2, the output should look like this:

[['i', 'am'], ['am', 'having'], ['having', 'a'], ['a', 'good'], ['good', 'day'], ['day', 'today']]

If the function is called like this:

ngrams.makeNGrams("ngrams.txt", 3)

# it should give:

[['i', 'am', 'having'], ['having', 'a', 'good'], ['good', 'day', 'today']]

Does anybody know how I could go about this? Thanks in advance.

Define:

def ngrams(text, n):
    words = text.split()
    return [ words[i:i+n] for i in range(len(words)-n+1) ]

and use it like this:

s = "i am having a good day today"
ngrams(s, 2)
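
The question asks for a function that takes a file name rather than a string. A minimal sketch of how the function above could be wrapped to do that is shown here; the name `make_ngrams` and the temporary demo file are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile


def ngrams(text, n):
    # Sliding window of n words over the whitespace-separated text.
    words = text.split()
    return [words[i:i + n] for i in range(len(words) - n + 1)]


def make_ngrams(path, n):
    # Hypothetical wrapper: read the whole file, then reuse ngrams().
    with open(path) as handle:
        return ngrams(handle.read(), n)


# Demo with a temporary file standing in for "ngrams.txt".
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("i am having a good day today\n")
    demo_path = tmp.name

pairs = make_ngrams(demo_path, 2)
os.remove(demo_path)
print(pairs)
```

Note that this produces every sliding window, so for n = 3 it returns five lists rather than the three shown in the question.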

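I'm sure there is a more Pythonic way. This is not a function (though it should be easy to adapt) but a program. I think it matches your specification: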

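import sys

num = int(sys.argv[1])  # group size, e.g. 2 or 3

cad = "i am having a good day today"

listCad = cad.split()

listOfLists = []
i = 0
# Consecutive groups share one word, so advance by num - 1
# (but by at least 1, otherwise num == 1 would loop forever).
step = max(num - 1, 1)
while i <= len(listCad) - num:
    listOfLists.append(listCad[i:i + num])
    i += step

print(listOfLists)  # parenthesised form works in Python 2.7 and 3.x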
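
Since the answer says the script should be easy to adapt into a function, here is one possible sketch. The name `overlapping_groups` is hypothetical, and the guard for a group size of 1 is an added assumption; the stride of `num - 1` is taken from the script above.

```python
def overlapping_groups(words, num):
    # Each group starts on the last word of the previous one,
    # so the start index advances by num - 1 (at least 1).
    step = max(num - 1, 1)
    return [words[i:i + num] for i in range(0, len(words) - num + 1, step)]


words = "i am having a good day today".split()
groups_of_two = overlapping_groups(words, 2)
groups_of_three = overlapping_groups(words, 3)
print(groups_of_two)
print(groups_of_three)
```

With this stride, the result for 3 matches the three-list output requested in the question, while the result for 2 is the usual list of adjacent pairs.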

I would do it like this:

def ngrams(words, n):
    return zip(*(words[i:] for i in range(n)))

Usage:

>>> words = "i am having a good day today".split()
>>> list(ngrams(words, 2))
[('i', 'am'), ('am', 'having'), ('having', 'a'), ('a', 'good'), ('good', 'day'), ('day', 'today')]
>>> list(ngrams(words, 3))
[('i', 'am', 'having'), ('am', 'having', 'a'), ('having', 'a', 'good'), ('a', 'good', 'day'), ('good', 'day', 'today')]

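The idea is to generate n lists from the original list, with the i-th list shifted by i positions. Then simply zip these shifted lists together and return the result.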

Visualization for n=3:
['i',      'am',     'having', 'a',    'good', 'day', 'today']  # not shifted
['am',     'having', 'a',      'good', 'day',  'today']         # shifted by 1
['having', 'a',      'good',   'day',  'today']                 # shifted by 2

The zip function stitches together the elements at the same index until the shortest list is exhausted, producing the desired output.
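
To make the truncation step concrete, the following sketch builds the shifted lists explicitly and checks their lengths. It also converts each tuple to a list, which is an added step (not part of the answer) for anyone who needs the list-of-lists shape from the question.

```python
def ngrams(words, n):
    # zip stops at the shortest shifted list, i.e. words[n - 1:].
    return zip(*(words[i:] for i in range(n)))


words = "i am having a good day today".split()

# The three shifted copies used for n = 3.
shifted = [words[i:] for i in range(3)]
lengths = [len(s) for s in shifted]

# Convert the tuples produced by zip into lists.
trigrams = [list(t) for t in ngrams(words, 3)]
print(lengths)
print(trigrams)
```

The shortest shifted list has five elements, which is why exactly five trigrams come out.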
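
Comments:

Stack Overflow is not a code-writing service, and this looks like homework. Show us what you have tried.

I get TypeError: type object argument after * must be a sequence, not generator

Are you using Python 3? Edit: I tried it with Python 2.7 and Python 3.5 and it works fine. I edited the answer to convert the result to a list so that it behaves the same in both cases.

Yes, Python 3.4. I skipped using the file data and used a string as the object, like you did, but I got ...

Yes, in Python 3 the zip object is an iterator that lazily provides results on demand, i.e. only when they are actually needed. You can use list(your_zip_object) to convert it to a list (I also updated my answer).

Thanks a lot, Plamut, have a nice day!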
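
The laziness mentioned in the comments has one practical consequence worth a small illustration: assuming Python 3, a zip object can be consumed only once, so it is usually safer to convert it to a list straight away if the result is needed more than once.

```python
def ngrams(words, n):
    return zip(*(words[i:] for i in range(n)))


words = "i am having a good day today".split()

lazy = ngrams(words, 2)      # nothing has been computed yet (Python 3)
first_pass = list(lazy)      # consumes the iterator
second_pass = list(lazy)     # already exhausted, so this is empty

print(first_pass)
print(second_pass)
```

Under Python 2.7, zip returns a list directly, so both passes would give the same result there.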