Python: remove values from a dictionary using a list of words

Suppose I have a list of words and a dictionary of lists:

 nottastyfruits = ['grape', 'orange', 'durian', 'pear']

 fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
               '002': ['apple', 'watermelon', 'pear']}
I want to go through every key in the dictionary and remove from its list any word that appears in nottastyfruits.

My current code is:

finalfruits = {}
for key, value in fruitGroup.items():
    fruits = []
    for fruit in value:
        if fruit not in nottastyfruits:
            fruits.append(fruit)
    finalfruits[key] = (fruits)
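This takes a long time to run on large text data (for example, when preprocessing a large corpus). Is there a more efficient, faster way?

Thanks for your time.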

You should turn your list of unwanted words into a set to speed up lookups, and then use a dictionary comprehension:

nottastyfruits = {'grape', 'orange', 'durian', 'pear'}

fruitGroup = {'001': ['grape', 'apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}

# items() works on both Python 2 and Python 3
print({k: [i for i in v if i not in nottastyfruits] for k, v in fruitGroup.items()})

# prints (Python 3.7+): {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
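
As a rough illustration of why the set helps: a membership test on a list scans the elements one by one, while a set uses hashing. The sketch below times both on 10,000 made-up words. The names words and probe and the sizes are assumptions for illustration, not from the question, and absolute timings will vary by machine.

```python
from timeit import timeit

# Hypothetical data: 10,000 made-up words, not taken from the question
words = ['word%d' % i for i in range(10000)]
as_list = words
as_set = set(words)

probe = 'word9999'  # worst case for the list: the last element

# Time 1,000 membership tests against each container
list_time = timeit(lambda: probe in as_list, number=1000)
set_time = timeit(lambda: probe in as_set, number=1000)

print('list lookup: %.6f s' % list_time)
print('set lookup:  %.6f s' % set_time)
```

The gap grows with the size of the unwanted-word list, so on a four-word list like the one in the question the difference may be too small to notice.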

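Flattening the nested loops into a dictionary comprehension avoids some of the overhead of the explicit for loop (such as the repeated append calls), and making nottastyfruits a set reduces the lookup time:

nottastyfruits = set(nottastyfruits)
finalfruits = {k: [f for f in v if f not in nottastyfruits] for k, v in fruitGroup.items()}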

The low-hanging fruit here, if you will, is to make nottastyfruits a set. You can also use comprehensions to squeeze out a bit more performance:

In [1]: fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
   ...:                '002': ['apple', 'watermelon', 'pear']
   ...:               }

In [2]: nottastyfruit = {'grape', 'orange', 'durian', 'pear'}

In [3]: finalfruits = {k:[f for f in v if f not in nottastyfruit] for k,v in fruitGroup.items()}

In [4]: finalfruits
Out[4]: {'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}

Since nottastyfruits and the lists in the dictionary are both flat lists, you can use sets to take the difference between the two:

nottastyfruits = {'orange', 'pear', 'grape', 'durian'}
fruitGroup = {'001': ['grape', 'apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}

# Reassigning the value of an existing key while iterating is safe
for key, value in fruitGroup.items():
    fruitGroup[key] = list(set(value).difference(nottastyfruits))

# e.g. {'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}
# (the order inside each list is not guaranteed)
print(fruitGroup)
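
One thing to keep in mind with the set-difference approach: going through a set discards the original order of each list and collapses repeated words. A small sketch of the difference, using a made-up basket list (not from the question) in which 'apple' appears twice:

```python
unwanted = {'grape', 'orange', 'durian', 'pear'}

# Hypothetical list with a repeated word, to show what each approach keeps
basket = ['apple', 'grape', 'apple', 'Longan']

by_difference = list(set(basket).difference(unwanted))
by_comprehension = [f for f in basket if f not in unwanted]

# Sorted only to get a stable display; the set itself has no defined order
print(sorted(by_difference))  # 'apple' appears once
print(by_comprehension)       # original order and both 'apple' entries kept
```

If the lists are word sequences from text preprocessing, where order and repeats usually matter, the comprehension with a set lookup is probably the safer choice.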

Here is a benchmark of the different proposed solutions, plus a solution based on the filter() function:

from timeit import timeit


nottastyfruits = ['grape', 'orange', 'durian', 'pear']

fruitGroup = {'001': ['grape','apple', 'jackfruit', 'orange', 'Longan'],
              '002': ['apple', 'watermelon', 'pear']}


def fruit_filter_original(fruit_groups, not_tasty_fruits):
    final_fruits = {}
    for key, value in fruit_groups.items():
        fruits = []
        for fruit in value:
            if fruit not in not_tasty_fruits:
                fruits.append(fruit)
        final_fruits[key] = (fruits)
    return final_fruits


def fruit_filter_comprehension(fruit_groups, not_tasty_fruits):
    return {group: [fruit for fruit in fruits
                         if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}


def fruit_filter_set_comprehension(fruit_groups, not_tasty_fruits):
    not_tasty_fruits = set(not_tasty_fruits)
    return {group: [fruit for fruit in fruits
                         if fruit not in not_tasty_fruits]
            for group, fruits in fruit_groups.items()}


def fruit_filter_set(fruit_groups, not_tasty_fruits):
    return {group: list(set(fruits).difference(not_tasty_fruits))
            for group, fruits in fruit_groups.items()}


def fruit_filter_filter(fruit_groups, not_tasty_fruits):
    # list() is needed on Python 3, where filter() returns a lazy iterator
    return {group: list(filter(lambda fruit: fruit not in not_tasty_fruits, fruits))
            for group, fruits in fruit_groups.items()}


print(fruit_filter_original(fruitGroup, nottastyfruits))
print(fruit_filter_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set_comprehension(fruitGroup, nottastyfruits))
print(fruit_filter_set(fruitGroup, nottastyfruits))
print(fruit_filter_filter(fruitGroup, nottastyfruits))


print(timeit("fruit_filter_original(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_original, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_comprehension(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set_comprehension(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_set_comprehension, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_set(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_set, fruitGroup, nottastyfruits"))
print(timeit("fruit_filter_filter(fruitGroup, nottastyfruits)", number=100000,
      setup="from __main__ import fruit_filter_filter, fruitGroup, nottastyfruits"))
As we can see, not all solutions perform equally:

{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
{'001': ['jackfruit', 'apple', 'Longan'], '002': ['watermelon', 'apple']}
{'001': ['apple', 'jackfruit', 'Longan'], '002': ['apple', 'watermelon']}
2.57386991159  # fruit_filter_original
2.36822144247  # fruit_filter_comprehension
2.46125930873  # fruit_filter_set_comprehension
4.09036626702  # fruit_filter_set
3.76554637862  # fruit_filter_filter
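The comprehension-based solution is faster, but it is not a very significant improvement over the original code (at least on the given data). The set-plus-comprehension solution is also a small improvement. The solutions based on the filter function and on set difference are slower.

Conclusion: if you are looking for performance, the solutions by Moses Koledoye and juanpa.arrivillaga seem better.
However, these results may differ for larger data, so it is probably a good idea to test with your real data.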
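
To get a feel for how the picture might change on larger inputs, here is a minimal sketch that compares the list-based and set-based comprehensions on synthetic data. Everything about the data (vocab, the sizes, the seed, the function names filter_with_list and filter_with_set) is an assumption made for illustration, so treat the numbers as indicative only.

```python
import random
from timeit import timeit

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Hypothetical data, much larger than the example in the question
vocab = ['w%d' % i for i in range(5000)]
unwanted = random.sample(vocab, 1000)
groups = {'%03d' % i: random.choices(vocab, k=200) for i in range(20)}


def filter_with_list(groups, unwanted):
    # Each membership test scans the unwanted list
    return {k: [w for w in v if w not in unwanted] for k, v in groups.items()}


def filter_with_set(groups, unwanted):
    unwanted = set(unwanted)  # converted once, outside the comprehension
    return {k: [w for w in v if w not in unwanted] for k, v in groups.items()}


# Both variants must agree before their timings are worth comparing
assert filter_with_list(groups, unwanted) == filter_with_set(groups, unwanted)

list_time = timeit(lambda: filter_with_list(groups, unwanted), number=5)
set_time = timeit(lambda: filter_with_set(groups, unwanted), number=5)
print('list: %.4f s' % list_time)
print('set:  %.4f s' % set_time)
```

With only four unwanted words, as in the benchmark above, building the set costs about as much as it saves. With a long unwanted list, the set version should pull clearly ahead.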

Your code's indentation is wrong. Please fix it.
Using iteritems() on larger dicts might be more memory-efficient.
@kezzos Yes, that would be more memory-efficient, but not necessarily faster. Thanks for noting.
@kezzos iteritems() does not exist in Python 3. Even if your point is correct for Python 2 code, we don't know which version the OP is using, so it is better to use items().
This version proved to be the fastest. Thanks.