Remove elements from a list of lists using a list comprehension (Python)


I have the following data:

[['The',
  'Fulton',
  'County',
  'Grand',
  'Jury',
  'said',
  'Friday',
  'an',
  'investigation',
  'of',
  "Atlanta's",
  'recent',
  'primary',
  'election',
  'produced',
  '``',
  'no',
  'evidence',
  "''",
  'that',
  'any',
  'irregularities',
  'took',
  'place',
  '.'],
 ['The',
  'jury',
  'further',
  'said',
  'in',
  'term-end',
  'presentments',
  'that',
  'the',
  'City',
  'Executive',
  'Committee',
  ',',
  'which',
  'had',
  'over-all',
  'charge',
  'of',
  'the',
  'election',
  ',',
  '``',
  'deserves',
  'the',
  'praise',
  'and',
  'thanks',
  'of',
  'the',
  'City',
  'of',
  'Atlanta',
  "''",
  'for',
  'the',
  'manner',
  'in',
  'which',
  'the',
  'election',
  'was',
  'conducted',
  '.']]

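So I have a list made up of two other lists. In my real case, the outer list contains 50,000 lists. I want to remove all punctuation and stop words such as "the", "a", and so on.

Here is the code I wrote: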

import string
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

punct = list(string.punctuation)
punct.append("``")
punct.append("''")
stops = set(stopwords.words("english"))

res = [[word.lower() for word in sentence if word not in punct or word.lower() not in stops] for sentence in dataset]

But it returns the same list I started with. What is wrong with my code?

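You should use "or" and negate the whole test, that is, keep a word only when not (word in punct or word.lower() in stops). By De Morgan's law this is the same as joining the two "not in" tests with "and" instead of "or".

Otherwise you get every element back, because each one is missing from at least one of stops or punct.

Since punct and stops do not overlap, every word is absent from one or the other, or possibly from both; you want to test for the words that are in neither.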
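To make the difference concrete, here is a minimal sketch that runs both conditions on a shortened version of the sample sentence. The small stops set is a hand-written stand-in for stopwords.words("english") so the snippet runs without downloading the NLTK corpus; it is an assumption for illustration, not the real list.

```python
import string

# Hand-written stand-in for NLTK's English stop words (assumption, not the real list).
stops = {"the", "an", "of", "that", "any", "no"}
punct = set(string.punctuation) | {"``", "''"}

dataset = [["The", "Grand", "Jury", "said", "``", "no", "evidence", "''", "."]]

# Original condition: true whenever the word is missing from at least one collection.
with_or = [[w.lower() for w in s if w not in punct or w.lower() not in stops]
           for s in dataset]

# Corrected condition: true only when the word is missing from both collections.
with_and = [[w.lower() for w in s if w not in punct and w.lower() not in stops]
            for s in dataset]

# Equivalent form by De Morgan's law: negate the whole "or" test.
negated_or = [[w.lower() for w in s if not (w in punct or w.lower() in stops)]
              for s in dataset]

print(with_or)    # every token survives, only lower-cased
print(with_and)   # [['grand', 'jury', 'said', 'evidence']]
print(with_and == negated_or)  # True
```

With "or", no token is dropped because no token is both a punctuation mark and a stop word. With "and" (or the negated "or"), only tokens that are in neither collection are kept.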

Assuming it is acceptable to update stops, here is an alternative approach that avoids the two-level comprehension:

import string
import nltk
from nltk.corpus import stopwords


dataset = [
  ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an',
   'investigation', 'of', "Atlanta's", 'recent', 'primary', 'election',
   'produced', '``', 'no', 'evidence', "''", 'that', 'any', 'irregularities',
   'took', 'place', '.'],
  ['The', 'jury', 'further', 'said', 'in', 'term-end', 'presentments',
   'that', 'the', 'City', 'Executive', 'Committee', ',', 'which', 'had',
   'over-all', 'charge', 'of', 'the', 'election', ',', '``', 'deserves',
   'the', 'praise', 'and', 'thanks', 'of', 'the', 'City', 'of', 'Atlanta',
   "''", 'for', 'the', 'manner',
   'in', 'which', 'the', 'election', 'was', 'conducted', '.']
  ]

nltk.download('stopwords')

punct = list(string.punctuation)
punct.append("``")
punct.append("''")

stops = set(stopwords.words("english"))

# Merge the punctuation tokens into the stop-word set (this mutates stops)
stops.update(punct)
res1 = [[word for word in sentence if word.lower() not in stops]
        for sentence in dataset]

# Alternative solution that avoids an explicit 2-level list comprehension
def filter_the(sentence, stops):
    return [word for word in sentence if word.lower() not in stops]


res2 = [filter_the(sentence, stops) for sentence in dataset]

# Both approaches give the same result
print(res1 == res2)
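
If the lower-cased output of the original attempt is wanted as well, the merged-set idea can be wrapped in a small helper. The following is only a sketch: clean_sentences and SAMPLE_STOPS are hypothetical names, and SAMPLE_STOPS is a hand-written stand-in for stopwords.words("english") so the example runs without the NLTK download.

```python
import string

# Hypothetical stand-in for set(stopwords.words("english")); not the real NLTK list.
SAMPLE_STOPS = {"the", "in", "of", "which", "was", "and", "for", "that", "had"}


def clean_sentences(sentences, stop_words):
    """Return lower-cased sentences without stop words or punctuation tokens."""
    # Build one set once so each membership test is a single hash lookup.
    blocked = set(stop_words) | set(string.punctuation) | {"``", "''"}
    cleaned = []
    for sentence in sentences:
        lowered = (word.lower() for word in sentence)
        cleaned.append([word for word in lowered if word not in blocked])
    return cleaned


sample = [["The", "jury", "further", "said", "in", "term-end", "presentments", "."],
          ["``", "deserves", "the", "praise", "and", "thanks", "''"]]
print(clean_sentences(sample, SAMPLE_STOPS))
# [['jury', 'further', 'said', 'term-end', 'presentments'], ['deserves', 'praise', 'thanks']]
```

Unlike stops.update(punct), this version builds a new set inside the function and leaves the caller's stop-word set unchanged, which may matter if the same set is reused elsewhere.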
