Removing elements from a list of lists using a list comprehension (Python)
I have the following data:
[['The',
'Fulton',
'County',
'Grand',
'Jury',
'said',
'Friday',
'an',
'investigation',
'of',
"Atlanta's",
'recent',
'primary',
'election',
'produced',
'``',
'no',
'evidence',
"''",
'that',
'any',
'irregularities',
'took',
'place',
'.'],
['The',
'jury',
'further',
'said',
'in',
'term-end',
'presentments',
'that',
'the',
'City',
'Executive',
'Committee',
',',
'which',
'had',
'over-all',
'charge',
'of',
'the',
'election',
',',
'``',
'deserves',
'the',
'praise',
'and',
'thanks',
'of',
'the',
'City',
'of',
'Atlanta',
"''",
'for',
'the',
'manner',
'in',
'which',
'the',
'election',
'was',
'conducted',
'.']]
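So I have a list made up of two other lists (in my real case, one big list containing 50000 lists).
I want to remove all punctuation and stop words such as "the", "a", etc.
Here is the code I wrote: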
import string
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
punct = list(string.punctuation)
punct.append("``")
punct.append("''")
stops = set(stopwords.words("english"))
res = [[word.lower() for word in sentence if word not in punct or word.lower() not in stops] for sentence in dataset]
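But it returns the same list I started with.
What is wrong with my code?
You should use `and` rather than `or`: keep a word only if `word not in punct and word.lower() not in stops`. Equivalently, negate the whole test: `not (word in punct or word.lower() in stops)`.
Otherwise you get every element back, because each word is missing from at least one of `stops` and `punct`.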
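The difference between the two conditions can be sketched as follows. This is a minimal illustration, not the asker's exact setup: the small `stops` set is a hand-written stand-in for NLTK's English stop-word list (an assumption, so the snippet runs without a download), and `dataset` is a shortened version of the first sentence.

```python
import string

# Hypothetical stand-in for set(stopwords.words("english")); no download needed.
stops = {"the", "an", "of", "no", "that", "any"}
punct = set(string.punctuation) | {"``", "''"}

dataset = [["The", "Fulton", "County", "Grand", "Jury", "said", "Friday", "an",
            "investigation", "of", "``", "no", "evidence", "''", "."]]

# Buggy: `or` keeps a word if it is missing from EITHER collection,
# which is true for every word here, so nothing is removed.
buggy = [[w.lower() for w in s if w not in punct or w.lower() not in stops]
         for s in dataset]

# Fixed: `and` keeps a word only if it is missing from BOTH collections.
fixed = [[w.lower() for w in s if w not in punct and w.lower() not in stops]
         for s in dataset]

print(len(buggy[0]) == len(dataset[0]))  # nothing was filtered out
print(fixed)
```

With `or`, a punctuation token passes because it is not a stop word, and a stop word passes because it is not punctuation; only `and` rejects both.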
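Since `punct` and `stops` do not overlap, every word is absent from one or the other, or possibly from both; you want to test for the words that are in neither.
Assuming it is acceptable to update `stops`, here is an alternative that avoids the two-level comprehension: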
import string
import nltk
from nltk.corpus import stopwords
dataset = [
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an',
'investigation', 'of', "Atlanta's", 'recent', 'primary', 'election',
'produced', '``', 'no', 'evidence', "''", 'that', 'any', 'irregularities',
'took', 'place', '.'],
['The', 'jury', 'further', 'said', 'in', 'term-end', 'presentments',
'that', 'the', 'City', 'Executive', 'Committee', ',', 'which', 'had',
'over-all', 'charge', 'of', 'the', 'election', ',', '``', 'deserves',
'the', 'praise', 'and', 'thanks', 'of', 'the', 'City', 'of', 'Atlanta',
"''", 'for', 'the', 'manner',
'in', 'which', 'the', 'election', 'was', 'conducted', '.']
]
nltk.download('stopwords')
punct = list(string.punctuation)
punct.append("``")
punct.append("''")
stops = set(stopwords.words("english"))
# Union of punct and stops
stops.update(punct)
res1 = [[word for word in sentence if word.lower() not in stops]
        for sentence in dataset]
# Alternative solution that avoids an explicit 2-level list comprehension
def filter_the(sentence, stops):
    return [word for word in sentence if word.lower() not in stops]
res2 = [filter_the(sentence, stops) for sentence in dataset]
print(res1 == res2)