Python nested list iteration

Before trying out a small word2vec model, I'm doing some preprocessing on a nested list and ran into the following problem:

corpus = ['he is a brave king', 'she is a kind queen', 'he is a young boy', 'she is a gentle girl']

corpus = [_.split(' ') for _ in corpus]

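[['he', 'is', 'a', 'brave', 'king'], ['she', 'is', 'a', 'kind', 'queen'], ['he', 'is', 'a', 'young', 'boy'], ['she', 'is', 'a', 'gentle', 'girl']]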

So the output above is a nested list, and I want to remove stop words such as 'is' and 'a':

for _ in range(0, len(corpus)):
    for x in corpus[_]:
        if x == 'is' or x == 'a':
            corpus[_].remove(x)

[['he', 'a', 'brave', 'king'], ['she', 'a', 'kind', 'queen'], ['he', 'a', 'young', 'boy'], ['she', 'a', 'gentle', 'girl']]

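The output seems to show that after 'is' is removed from each sub-list, the loop skips ahead to the next sub-list instead of iterating over the whole sub-list.

What is the reason behind this? Is it the index? If so, and assuming I want to keep the nested structure, how can I fix it?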

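All of the code is correct except for one small change: use

[:]

to iterate over a copy of the list, so that the loop walks over an unchanged copy while the items are removed from the original list. Specifically, you can create a copy of a list as

lst_copy = lst[:]

This is one of several ways to copy a list. When you iterate over the original list and modify it by removing items, the loop's internal counter produces the problem you observed: the for loop advances by index, and after an item is removed every later item shifts one position to the left, so the item right after the removed one is never visited. That is why 'a', which directly follows 'is', survives in every sub-list.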

for _ in range(0, len(corpus)):
    for x in corpus[_][:]:  # <--- create a copy of the list using [:]
        if x == 'is' or x == 'a':
            corpus[_].remove(x)
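To see the skipping concretely, the following sketch reruns the question's loop on a single sentence and records which words the loop actually looks at, then repeats it with the [:] copy from this answer. The names buggy, visited and fixed are just illustrative.

```python
# Rerun the question's loop on one sentence and record every word the
# loop actually visits, to show which word gets skipped.
buggy = ['he', 'is', 'a', 'brave', 'king']
visited = []
for word in buggy:            # iterating over the list that is being mutated
    visited.append(word)
    if word in ('is', 'a'):
        buggy.remove(word)    # every later item shifts one position left

print(visited)  # ['he', 'is', 'brave', 'king']  ('a' is never visited)
print(buggy)    # ['he', 'a', 'brave', 'king']

# Same loop, but iterating over a shallow copy made with [:].
fixed = ['he', 'is', 'a', 'brave', 'king']
for word in fixed[:]:
    if word in ('is', 'a'):
        fixed.remove(word)    # removes from the original; the copy is unaffected

print(fixed)    # ['he', 'brave', 'king']
```

After 'is' is removed at position 1, 'a' moves into position 1, and the loop's next step reads position 2, which is 'brave'.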

nested = [input()]  # read one sentence from standard input

nested = [i.split() for i in nested]  # split it into a list of words
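Building on that comprehension style, here is a sketch that avoids mutating a list while iterating over it altogether, by building each filtered sub-list fresh. It assumes the corpus strings from the question; stopwords is an illustrative set (a set makes the membership test a hash lookup). If other code holds references to the existing inner lists, the second variant replaces their contents in place with slice assignment instead of creating new lists.

```python
corpus = ['he is a brave king', 'she is a kind queen',
          'he is a young boy', 'she is a gentle girl']
stopwords = {'is', 'a'}  # illustrative stop-word set; extend as needed

# Variant 1: build a new nested list, keeping only non-stop-words.
filtered = [[word for word in sentence.split() if word not in stopwords]
            for sentence in corpus]
print(filtered)
# [['he', 'brave', 'king'], ['she', 'kind', 'queen'], ['he', 'young', 'boy'], ['she', 'gentle', 'girl']]

# Variant 2: tokenize first, then overwrite each inner list's contents in place.
# The right-hand comprehension is fully built before the slice assignment runs,
# so the list is never modified while it is being iterated.
tokenized = [sentence.split() for sentence in corpus]
for tokens in tokenized:
    tokens[:] = [word for word in tokens if word not in stopwords]
print(tokenized == filtered)  # True
```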

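Perhaps you could define a custom function that rejects the elements matching a given condition, similar to the tools in itertools.

Once that function is in place, you can use it as follows: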

def reject_if(predicate, iterable):
    # Yield only the elements for which predicate(element) is false.
    for element in iterable:
        if not predicate(element):
            yield element

stopwords = ['is', 'and', 'a']
# Filter each sub-list separately so the nested structure is preserved.
[list(reject_if(lambda x: x in stopwords, ary)) for ary in corpus]
#=> [['he', 'brave', 'king'], ['she', 'kind', 'queen'], ['he', 'young', 'boy'], ['she', 'gentle', 'girl']]
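For reference, the standard library already provides this behaviour: itertools.filterfalse(predicate, iterable) yields the elements for which the predicate is false, just like reject_if. A sketch using it on the already-split corpus, written out here so the snippet runs on its own:

```python
from itertools import filterfalse

corpus = [['he', 'is', 'a', 'brave', 'king'], ['she', 'is', 'a', 'kind', 'queen'],
          ['he', 'is', 'a', 'young', 'boy'], ['she', 'is', 'a', 'gentle', 'girl']]
stopwords = ['is', 'and', 'a']

# filterfalse drops every element for which the lambda returns True,
# and the outer comprehension keeps one filtered list per sentence.
result = [list(filterfalse(lambda x: x in stopwords, ary)) for ary in corpus]
print(result)
# [['he', 'brave', 'king'], ['she', 'kind', 'queen'], ['he', 'young', 'boy'], ['she', 'gentle', 'girl']]
```

Using filterfalse saves writing and maintaining a custom generator; the result is identical.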