Python: compare whether an item from list (A) exists as a sub-item in list (B)


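I have two lists, each a collection of strings, and I want to check whether an item of list(A) exists inside an item of list(B). List(A) holds criteria words and phrases that should be found in list(B).

I filled list(A) with entries such as 'innovation', 'innovative' and 'new way to go', and lemmatized them: ['innovation'], ['innovative'], ['new', 'way', 'go'].

List(B) holds the tokenized and lemmatized sentences of a text, for example ['time', 'new', 'way', 'go'].

With this setup I am trying to analyze whether, and how often, the given words and phrases occur in the text.

To match the patterns, I read that each list element itself has to be converted to a string, so that it can be checked as a substring of the strings in list(B).

The actual output is: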
word of list a:  innovation  appears in list b:  ['time', 'innovation']
word of list a:  innovative  appears in list b:  ['look', 'innovative', 'creative', 'people']

My target output is:
word of list a:  innovation  appears in list b:  ['time', 'innovation']
word of list a:  innovative  appears in list b:  ['look', 'innovative', 'creative', 'people']
word of list a: new way go appears in list b: ['time', 'go', 'new', 'way']


Converting the items of list(B) to strings, as I tried for list(A), did not help.

Thanks for your help!
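The first mistake: don't build strings from your lists of words. Use sets of words and the set methods (here: issubset).

Convert each list of words into a set of words, giving you a list of sets.
Loop over the sets of the first list (list_a) and check whether each one is contained in a set of list_b. Don't use any here, otherwise you cannot tell which set contains the current one; a plain loop is enough.

Like this: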
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'],  ['look', 'innovative', 'creative', 'people']]

list_a = [set(x) for x in list_a]
list_b = [set(x) for x in list_b]

for subset in list_a:
    for other_subset in list_b:
        if subset.issubset(other_subset):
            print("{} appears in list b: {}".format(subset,other_subset))

This prints (the order of elements inside each set may vary):
{'innovation'} appears in list b: {'time', 'innovation'}
{'innovative'} appears in list b: {'look', 'creative', 'innovative', 'people'}
{'new', 'go', 'way'} appears in list b: {'time', 'new', 'go', 'way'}
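
To illustrate why joining the words into strings falls short, here is a minimal sketch. The names `phrase` and `sentence` are made up for this illustration, and the joined-string test is only an assumption about what the original attempt looked like. A substring test depends on word order and adjacency, while `issubset` only looks at membership.

```python
# Hypothetical example data, taken from the lists above
phrase = ['new', 'way', 'go']
sentence = ['time', 'go', 'new', 'way']

# Substring test on joined strings: fails because the word order differs
substring_match = ' '.join(phrase) in ' '.join(sentence)

# Set test: succeeds because only membership matters, not order
subset_match = set(phrase).issubset(sentence)

print(substring_match)  # False
print(subset_match)     # True
```

This is also why the phrase criterion was missing from the actual output while the single-word criteria were found.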

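Now, if you want to preserve the word order but still benefit from sets for membership testing, create a list of (set, original list) tuples for list_b, because it is iterated over several times. There is no need to do the same for list_a, because it is iterated over only once: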
# list_a is now unchanged
list_b = [(set(x),x) for x in list_b]

for sublist in list_a:
    subset = set(sublist)
    for other_subset,other_sublist in list_b:
        if subset.issubset(other_subset):
            print("{} appears in list b: {}".format(sublist,other_sublist))

Result:
['innovation'] appears in list b: ['time', 'innovation']
['innovative'] appears in list b: ['look', 'innovative', 'creative', 'people']
['new', 'way', 'go'] appears in list b: ['time', 'go', 'new', 'way']

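The algorithm is still expensive: O(n**3), but not O(n**4). That is thanks to set lookups, which make testing whether one word list is contained in another O(n): each membership check on a set is O(1) on average, versus O(n) on a list.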
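
Since the question also asks how often each criterion occurs, the same subset test can be turned into a count. This is only a sketch under the assumption that "how often" means "in how many sentences of list_b"; the names `sets_b` and `counts` are hypothetical.

```python
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'], ['look', 'innovative', 'creative', 'people']]

# Build the sets for list_b once, since they are reused for every criterion
sets_b = [set(words) for words in list_b]

counts = {}
for sublist in list_a:
    subset = set(sublist)
    # Number of sentences that contain every word of this criterion
    counts[' '.join(sublist)] = sum(1 for other in sets_b if subset.issubset(other))

for criterion, n in counts.items():
    print("{} appears in {} sentence(s)".format(criterion, n))
```

A criterion that never matches, such as 'set trend' here, ends up with a count of 0 instead of being silently skipped.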
Assuming you only want a match when all the words of a list in A are contained in a list in B, you can use a for/else loop:
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'], ['look', 'innovative', 'creative', 'people'], ['way', 'go', 'time']]

for a_element in list_a:
    for b_element in list_b:
        for a_element_item in a_element:
            if a_element_item not in b_element:
                break
        else:
            print(a_element, "is in ", b_element)

Output:
['innovation'] is in  ['time', 'innovation']
['innovative'] is in  ['look', 'innovative', 'creative', 'people']
['new', 'way', 'go'] is in  ['time', 'go', 'new', 'way']
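
The same check can be written more compactly with the built-in `all`, which is true only when every word of the criterion is found. This is a sketch of an equivalent formulation, not a different algorithm; the name `matches` is made up here.

```python
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'], ['look', 'innovative', 'creative', 'people'], ['way', 'go', 'time']]

# Collect every (criterion, sentence) pair where all criterion words occur
matches = [
    (a_element, b_element)
    for a_element in list_a
    for b_element in list_b
    if all(word in b_element for word in a_element)
]

for a_element, b_element in matches:
    print(a_element, "is in ", b_element)
```

Note that ['way', 'go', 'time'] is not reported for ['new', 'way', 'go'], because 'new' is missing from it.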

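Comments:

Do you want to print when any word in A matches any word in B, or only when all the words in A are contained in B? For example, if B has the element ['go', 'time'], should it be printed as a match for ['new', 'way', 'go'], or should B match only when it contains all the items from A?

It should match only when all the words of a criterion in A appear in list B. Also, the semantics of the phrase should not get lost (the words should not be scattered across the sentence without any real connection). Thanks a lot!

I am still new to Python and don't understand why sets are needed. Sets don't have the structure of the sentence, so is it still possible to refer back to list(A)? For example: {'new', 'go', 'way'} appears in list b: {'time', 'go', 'new', 'way'}

Yes, but that requires a dictionary with a frozenset as the key and the original sentence as the value. Doable. Let me work on it for a few minutes.

Would it be possible to go further and require that the distance between the words is not too large? Just to prevent a sentence like "I am new here and do not know which way to go" from matching the criterion "new way to go"?

That looks like a completely different (and more complex) question. In the meantime I have edited the post to show a working solution that preserves the word order in the sublists.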
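
The dictionary idea mentioned in the comments could look roughly like the sketch below. It is an illustration only, not the code from the answer (which uses tuples instead); the names `sentence_by_words` and `results` are hypothetical.

```python
list_a = [['innovation'], ['innovative'], ['new', 'way', 'go'], ['set', 'trend']]
list_b = [['time', 'innovation'], ['time', 'go', 'new', 'way'], ['look', 'innovative', 'creative', 'people']]

# Map each sentence's word set to the original, ordered sentence.
# Caveat: two sentences made of the same words would share a single key.
sentence_by_words = {frozenset(sentence): sentence for sentence in list_b}

results = []
for sublist in list_a:
    subset = frozenset(sublist)
    for word_set, sentence in sentence_by_words.items():
        # <= on (frozen)sets is the subset test, same as issubset
        if subset <= word_set:
            results.append((sublist, sentence))
            print("{} appears in list b: {}".format(sublist, sentence))
```

Because of the key collision noted in the comment, the list of tuples shown in the answer is the safer choice when duplicate sentences are possible.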