Fastest way to remove items from a list that match a substring - Python

What is the fastest way to remove the items in a list that contain any substring from a set?

For example,

the_list = [
 'Donald John Trump (born June 14, 1946) is an American businessman, television personality',
 'and since June 2015, a candidate for the Republican nomination for President of the United States in the 2016 election.',
 'He is the chairman and president of The Trump Organization and the founder of Trump Entertainment Resorts.',
 'Trumps career',
 'branding efforts',
 'personal life',
 'and outspoken manner have made him a celebrity.',
 'Trump is a native of New York City and a son of Fred Trump, who inspired him to enter real estate development.',
 'While still attending college he worked for his fathers firm',
 'Elizabeth Trump & Son. Upon graduating in 1968 he joined the company',
 'and in 1971 was given control, renaming the company The Trump Organization.',
 'Since then he has built hotels',
 'casinos',
 'golf courses',
 'and other properties',
 'many of which bear his name. He is a major figure in the American business scene and has received prominent media exposure']
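The real list is much longer than this (millions of string elements), and I want to remove every element that contains any of the strings in a set, for example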

set_of_words = {"Donald Trump", "Trump Organization", "Donald J. Trump", "D.J. Trump", "dump", "dd"}

What is the fastest way to do this? Is a loop the fastest?

If your strings are already in memory, use a list comprehension:

new = [line for line in the_list if not any(item in line for item in set_of_words)]
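To make the one-liner easier to try, here is a minimal self-contained sketch that runs it on a few lines taken from the question. The helper name `remove_matching` is only illustrative and is not part of the original answer. Note that `in` performs a case-sensitive substring test.

```python
def remove_matching(lines, substrings):
    """Return the lines that contain none of the given substrings."""
    # Copy the substrings into a tuple once so the inner loop iterates a fixed sequence.
    needles = tuple(substrings)
    return [line for line in lines if not any(s in line for s in needles)]


the_list = [
    'He is the chairman and president of The Trump Organization and the founder of Trump Entertainment Resorts.',
    'Trumps career',
    'casinos',
    'golf courses',
]
set_of_words = {"Donald Trump", "Trump Organization", "Donald J. Trump", "D.J. Trump", "dump", "dd"}

new = remove_matching(the_list, set_of_words)
print(new)  # ['Trumps career', 'casinos', 'golf courses']
```

Only the first line is dropped, because it contains "Trump Organization"; "Trumps career" survives since none of the substrings occurs in it.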
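If they are not already in memory, a more memory-efficient approach is to use a generator expression, which yields the matching lines one at a time instead of building a new list: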

new = (line for line in the_list if not any(item in line for item in set_of_words))
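As a rough illustration of the lazy variant, the sketch below filters lines as they are read from a file-like object, so the full input never has to be held in memory. The `io.StringIO` object stands in for a real file, and the function name `iter_non_matching` is an assumption made for this example.

```python
import io


def iter_non_matching(lines, substrings):
    """Lazily yield each line that contains none of the given substrings."""
    needles = tuple(substrings)
    for line in lines:
        if not any(s in line for s in needles):
            yield line


# Stand-in for an open file; a real file object is iterated line by line the same way.
source = io.StringIO("casinos\nThe Trump Organization\ngolf courses\n")
set_of_words = {"Trump Organization", "dump"}

# Strip the trailing newline lazily, then filter; nothing is read until iteration starts.
stripped = (line.rstrip("\n") for line in source)
kept = list(iter_non_matching(stripped, set_of_words))
```

Calling `list(...)` here is only for demonstration; with millions of lines you would normally iterate over the generator and write each kept line out as you go.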
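There is an algorithm designed specifically for this task. Its notable advantage is that its time complexity, O(n+m), is much lower than the O(n*m) of nested loops, where n is the number of strings to look for and m is the number of strings to be searched.

There is a good explanation of it available. There are also several implementations, but I have not looked into them.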
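The answer as it survives here does not name the algorithm. One well-known method with this kind of linear-time behaviour for multi-pattern search is Aho-Corasick, so the sketch below assumes that is what was meant. It is a minimal pure-Python version written for illustration, not one of the implementations the answer refers to, and the names `build_automaton` and `contains_any` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and match flags for the patterns."""
    goto = [{}]         # goto[state][char] -> next state
    fail = [0]          # failure link of each state
    terminal = [False]  # True if a pattern ends at this state or at one of its suffix states
    for pattern in patterns:
        if not pattern:
            continue  # skip empty patterns; they would match every string
        state = 0
        for ch in pattern:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                terminal.append(False)
            state = nxt
        terminal[state] = True
    # Breadth-first pass: depth-1 states keep failure link 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also counts as a match if its failure target is a match.
            terminal[nxt] = terminal[nxt] or terminal[fail[nxt]]
    return goto, fail, terminal


def contains_any(text, automaton):
    """Return True if text contains at least one pattern, scanning it once."""
    goto, fail, terminal = automaton
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if terminal[state]:
            return True
    return False


set_of_words = {"Donald Trump", "Trump Organization", "Donald J. Trump", "D.J. Trump", "dump", "dd"}
the_list = ['The Trump Organization', 'Trumps career', 'an odd dump truck', 'casinos']

automaton = build_automaton(set_of_words)
new = [line for line in the_list if not contains_any(line, automaton)]
```

Each line is scanned once regardless of how many substrings are in the set. Whether this beats the list comprehension in CPython is something to measure: for a small set, the built-in `in` operator runs in C and may well be faster than a pure-Python automaton.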