Python: remove values from a dictionary using a list of words
Suppose I have a list of words and a dictionary of word lists:
nottastyfruits = ['grape', 'orange', 'durian', 'pear']
fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}
I want to go through every list in the dictionary and remove the words that appear in the nottastyfruits list.
My current code is:
finalfruits = {}
for key, value in fruitGroup.items():
    fruits = []
    for fruit in value:
        if fruit not in nottastyfruits:
            fruits.append(fruit)
    finalfruits[key] = fruits
This takes a long time to run on large inputs (for example, when preprocessing a large amount of text). Is there a more efficient, faster way?
Thanks for your time.
You should make a set from your nottastyfruits list to speed up lookups, then use a dictionary comprehension:
nottastyfruits = set(['grape', 'orange', 'durian', 'pear'])
fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}
print({k: [i for i in v if i not in nottastyfruits] for k, v in fruitGroup.items()})
# {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
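The speed-up comes from how `in` works: on a list it compares the item against each element in turn (O(n)), while on a set it is a hash lookup (O(1) on average). A small timing sketch, assuming a made-up collection of 10,000 strings (`words_list`, `words_set`, and the sizes are illustrative only); absolute numbers depend on the machine:

```python
from timeit import timeit

# Hypothetical stop-word collection: 10,000 distinct strings.
words_list = ['word%d' % i for i in range(10000)]
words_set = set(words_list)

# Worst case for the list: the item is absent, so every element is compared.
missing = 'not-a-fruit'

# timeit() accepts a callable; each lambda performs one membership test.
list_time = timeit(lambda: missing in words_list, number=1000)
set_time = timeit(lambda: missing in words_set, number=1000)

print('list: %.6f s' % list_time)
print('set:  %.6f s' % set_time)
```

The two containers give the same answer to `in`; only the cost of getting it differs, and the gap grows with the size of the word list.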
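Rewriting the loops as a dictionary comprehension reduces the overhead of the explicit for loops (for example, there are no repeated fruits.append calls).
Turning nottastyfruits into a set reduces the lookup time: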
nottastyfruits = set(nottastyfruits)
finalfruits = {k: [f for f in v if f not in nottastyfruits] for k, v in fruitGroup.items()}
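One detail worth getting right is that the set must be built once, outside the comprehension. A minimal sketch wrapping this approach in a reusable function (the name `remove_words` is hypothetical, not from the thread):

```python
def remove_words(groups, unwanted):
    """Return a new dict with every word in `unwanted` dropped from each list.

    The set is built once, before the comprehension runs. Writing
    `if f not in set(unwanted)` inside the comprehension would rebuild
    the set for every single element and throw the gain away.
    """
    unwanted = set(unwanted)  # one conversion, reused for every lookup
    return {k: [f for f in v if f not in unwanted] for k, v in groups.items()}


fruitGroup = {'001': ['grape', 'apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}
nottastyfruits = ['grape', 'orange', 'durian', 'pear']

# On Python 3.7+ (insertion-ordered dicts) this prints:
# {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
print(remove_words(fruitGroup, nottastyfruits))
```

The input dictionary is left untouched, since the comprehension builds new lists.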
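If you will, one piece of low-hanging fruit is to make the not-tasty fruits a set. You can also use comprehensions to squeeze out some more performance: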
In [1]: fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
...: '002': ['apple', 'watermelon', 'pear']
...: }
In [2]: nottastyfruit = {'grape', 'orange', 'durian', 'pear'}
In [3]: finalfruits = {k:[f for f in v if f not in nottastyfruit] for k,v in fruitGroup.items()}
In [4]: finalfruits
Out[4]: {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
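The `{'grape', ...}` syntax in `In [2]` is a set literal (Python 2.7 and 3.x), equivalent to `set([...])`. One pitfall when starting from an empty collection: `{}` is an empty dict, so an empty set has to be written `set()`:

```python
# A set literal and set() over a list build the same set.
nottastyfruit = {'grape', 'orange', 'durian', 'pear'}
assert nottastyfruit == set(['grape', 'orange', 'durian', 'pear'])

# Pitfall: empty braces create a dict, not a set.
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__, type(empty_set).__name__)  # dict set
```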
Since nottastyfruits and the lists in the dictionary are all flat lists, you can use sets to get the difference between them (note that this does not keep the original order of each list and drops duplicate entries):
nottastyfruits = set(['orange', 'pear', 'grape', 'durian'])
fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'], '002': ['apple', 'watermelon', 'pear']}
for key, value in fruitGroup.items():
    fruitGroup[key] = list(set(value).difference(nottastyfruits))
print(fruitGroup)  # e.g. {'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}; order within each list may vary
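To see where set difference and the list comprehension diverge, here is a small sketch using a hypothetical `group` list that contains `'apple'` twice:

```python
nottastyfruits = {'grape', 'orange', 'durian', 'pear'}
# Hypothetical group with a duplicate entry.
group = ['apple', 'grape', 'apple', 'jackfruit', 'Longan']

via_set = list(set(group).difference(nottastyfruits))
via_comprehension = [f for f in group if f not in nottastyfruits]

print(via_comprehension)  # ['apple', 'apple', 'jackfruit', 'Longan']
# Set order is arbitrary, so sort before printing; the duplicate is gone.
print(sorted(via_set))    # ['Longan', 'apple', 'jackfruit']
```

If each list is known to be duplicate-free and order does not matter, both give the same elements; otherwise the comprehension is the faithful replacement for the original loop.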
Here is a benchmark of the different proposed solutions, plus a solution based on the filter() function:
from timeit import timeit
nottastyfruits = ['grape', 'orange', 'durian', 'pear']
fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}
def fruit_filter_original(fruit_groups, not_tasty_fruits):
    # The OP's nested loops, wrapped in a function
    final_fruits = {}
    for key, value in fruit_groups.items():
        fruits = []
        for fruit in value:
            if fruit not in not_tasty_fruits:
                fruits.append(fruit)
        final_fruits[key] = fruits
    return final_fruits
def fruit_filter_comprehension(fruit_groups, not_tasty_fruits):
    return {group: [fruit for fruit in fruits
                    if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}
def fruit_filter_set_comprehension(fruit_groups, not_tasty_fruits):
    not_tasty_fruits = set(not_tasty_fruits)
    return {group: [fruit for fruit in fruits
                    if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}
def fruit_filter_set(fruit_groups, not_tasty_fruits):
    return {group: list(set(fruits).difference(not_tasty_fruits))
            for group, fruits in fruit_groups.items()}
def fruit_filter_filter(fruit_groups, not_tasty_fruits):
    # list() makes the result a list on Python 3 too, where filter() is lazy
    return {group: list(filter(lambda fruit: fruit not in not_tasty_fruits, fruits))
            for group, fruits in fruit_groups.items()}
print(fruit_filter_original(fruitGroup, nottastyfruits))
print(fruit_filter_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set(fruitGroup, nottastyfruits))
print(fruit_filter_filter(fruitGroup, nottastyfruits))
print(timeit("fruit_filter_original(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_original, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_comprehension(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set_comprehension(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_set_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_set, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_filter(fruitGroup, nottastyfruits)", number=100000,
             setup="from __main__ import fruit_filter_filter, fruitGroup, nottastyfruits"))
As the output shows, the solutions do not all perform the same (each timing is the total in seconds for 100,000 calls):
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
2.57386991159 # fruit_filter_original
2.36822144247 # fruit_filter_comprehension
2.46125930873 # fruit_filter_set_comprehension
4.09036626702 # fruit_filter_set
3.76554637862 # fruit_filter_filter
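The comprehension-based solution is faster, but not by a large margin compared with the original code (at least with the given data).
The set-comprehension solution is also a slight improvement over the original.
The solutions based on the filter() function and on set difference are slower.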
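These timings appear to come from Python 2, where `filter()` returns a list (the printed results above show lists). On Python 3, `filter()` returns a lazy iterator, so timing it without consuming the result would measure almost no work; a fair comparison needs `list(filter(...))`. A short sketch:

```python
nottastyfruits = {'grape', 'orange', 'durian', 'pear'}
fruits = ['grape', 'apple', 'jackfruit', 'orange', 'Longan']

lazy = filter(lambda fruit: fruit not in nottastyfruits, fruits)
# On Python 3 this is a filter object: no element has been tested yet.
print(type(lazy).__name__)  # filter (Python 3) / list (Python 2)

# Wrapping it in list() forces the work, which is what a benchmark should time.
print(list(lazy))  # ['apple', 'jackfruit', 'Longan']
```

Note that a filter object can only be consumed once; a second `list(lazy)` returns an empty list on Python 3.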
Conclusion:
If you are looking for performance, the solutions by Moses Koledoye and juanpa.arrivillaga seem to be the better choice.
However, these results may differ with larger data, so it is probably a good idea to test with your real data.
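A minimal sketch for checking how list-based and set-based lookups scale, assuming synthetic data (the sizes, `random.seed(0)`, and the helper names `with_list`/`with_set` are made up for illustration; substitute your real word lists). The gap should widen as the stop-word list grows, because each list lookup is a linear scan:

```python
import random
from timeit import timeit

random.seed(0)  # reproducible synthetic data

# Hypothetical sizes: a 5,000-word vocabulary, 500 stop words,
# and 50 groups of 500 words each. Adjust to match your real data.
vocabulary = ['w%d' % i for i in range(5000)]
unwanted_list = random.sample(vocabulary, 500)
groups = {str(g): [random.choice(vocabulary) for _ in range(500)]
          for g in range(50)}


def with_list(groups, unwanted):
    # Each `in` scans the whole list when the word is absent.
    return {k: [w for w in v if w not in unwanted] for k, v in groups.items()}


def with_set(groups, unwanted):
    # Same comprehension, but lookups hit a set built once up front.
    unwanted = set(unwanted)
    return {k: [w for w in v if w not in unwanted] for k, v in groups.items()}


# Both approaches must agree before their speed is worth comparing.
assert with_list(groups, unwanted_list) == with_set(groups, unwanted_list)

for func in (with_list, with_set):
    seconds = timeit(lambda: func(groups, unwanted_list), number=3)
    print('%-10s %.3f s' % (func.__name__, seconds))
```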
Your code is incorrectly indented. Please fix it.
Using iteritems() might be more memory efficient on large dicts.
@kezzos Yes, that would be more memory efficient, but not necessarily faster. Thanks for noting.
@kezzos iteritems() does not exist in Python 3. Even though you are right for Python 2 code, since we don't know which version the OP is using, it is better to use items().
This version has been shown to be the fastest. Thanks
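For context on the items()/iteritems() exchange: on Python 3, `dict.items()` returns a view object rather than a new list, so the memory concern raised for Python 2 does not apply there. A small sketch (Python 3 only):

```python
fruitGroup = {'001': ['grape', 'apple'], '002': ['apple', 'pear']}

items = fruitGroup.items()
# On Python 3, items() returns a lightweight view, not a copied list,
# so it already has the memory behaviour of Python 2's iteritems().
print(type(items).__name__)  # dict_items

# The view reflects later changes to the dict.
fruitGroup['003'] = ['watermelon']
print(len(items))  # 3
```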