Python: find which elements of one list are contained in the phrases of another list


I have two lists:

list_1=['mom','father','daughter','dog','soccer']
list_2=['beautiful mom','father day','snoop dog','Manchester United','Windows Office', 'Snoopy Dog']

I want to build a relationship showing whether each word in list_1 appears in any phrase of list_2. For example:

mom : ['beautiful mom']
father : ['father day']
daughter : []
dog : ['snoop dog', 'Snoopy Dog']
soccer : []

For each element in list_1, I need to look through list_2; if a phrase contains that element, I add the phrase to a list. I tried the following:

set(list_1).issubset(list_2)

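to check for a subset, but that only returns a single True/False. As mentioned, the output I expect is the word-to-phrases mapping shown earlier, ideally in a DataFrame.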
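For comparison, the per-element loop described in the question can be written directly as a nested comprehension. This is a sketch assuming matches should be whole words compared case-insensitively (so 'dog' matches 'Snoopy Dog'). It also shows why issubset does not help here: it compares whole list elements, and 'mom' is not equal to 'beautiful mom'.

```python
list_1 = ['mom', 'father', 'daughter', 'dog', 'soccer']
list_2 = ['beautiful mom', 'father day', 'snoop dog', 'Manchester United',
          'Windows Office', 'Snoopy Dog']

# issubset compares whole elements, so it cannot see words inside phrases
print(set(list_1).issubset(list_2))  # False

# Naive approach: for every word, scan every phrase
# (about len(list_1) * len(list_2) phrase checks).
# Assumption: match whole words, ignoring case.
res = {
    word: [phrase for phrase in list_2 if word.lower() in phrase.lower().split()]
    for word in list_1
}
print(res)
```

This is simple, but it rescans all of list_2 for every word, which is what the inverted index in the answer avoids.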

You can build an inverted index that maps individual words to the phrases in list_2 that contain them. Then just look up the words from list_1:

from collections import defaultdict

list_1=['mom','father','daughter','dog','soccer']
list_2=['beautiful mom','father day','snoop dog','Manchester United','Windows Office', 'Snoopy Dog']

# Inverted index: lower-cased word -> phrases in list_2 that contain it
index = defaultdict(list)

for v in list_2:
    for k in v.split():
        index[k.lower()].append(v)

# Look up each word from list_1; words never seen map to an empty list
res = {k:index[k.lower()] for k in list_1}

res

will be:

{'mom': ['beautiful mom'],
 'father': ['father day'],
 'daughter': [],
 'dog': ['snoop dog', 'Snoopy Dog'],
 'soccer': []}
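The same idea wrapped in a reusable function, as a sketch; the name build_word_index is hypothetical, and the extra phrase 'dog eat dog' is added only for illustration. Two small changes from the snippet above: each phrase is recorded at most once per word (so 'dog eat dog' is not listed twice under 'dog'), and lookups use dict.get, because indexing a defaultdict with a missing key inserts an empty list into it as a side effect.

```python
from collections import defaultdict


def build_word_index(phrases):
    """Map each lower-cased word to the phrases containing it (hypothetical helper)."""
    index = defaultdict(list)
    for phrase in phrases:
        # set() so a word repeated within one phrase adds that phrase only once
        for word in set(phrase.lower().split()):
            index[word].append(phrase)
    return index


list_1 = ['mom', 'father', 'daughter', 'dog', 'soccer']
# 'dog eat dog' is a hypothetical extra phrase to show de-duplication
list_2 = ['beautiful mom', 'father day', 'snoop dog', 'Manchester United',
          'Windows Office', 'Snoopy Dog', 'dog eat dog']

index = build_word_index(list_2)
# .get does not insert empty entries for words that never occur
res = {word: index.get(word.lower(), []) for word in list_1}
print(res['dog'])         # ['snoop dog', 'Snoopy Dog', 'dog eat dog']
print('soccer' in index)  # False
```

If a pandas object is wanted, as in the question, pd.Series(res) turns the dict into a Series indexed by the words of list_1.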

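It looks like you're building a term-document matrix (or TF-IDF, or something similar). Using a matrix is usually better than using lists. There are …, please take a look.

Thanks, smci. I'll look into those topics.

This doesn't seem well suited to a large number of words and long sentences (e.g. tweets). Would it be better to do something like "for each element in list_1, for each element in list_2"?

I'm not sure I'm following, @TrentonMcKinney. Your suggestion sounds like O(n²) at best. The approach above builds the index in O(n) (n being the total number of words), and then each lookup is constant time. If you don't tokenize, you also need to figure out how to keep something like mom from matching a word like moment.

I was just wondering about the case where the phrases in list_2 are fairly large, like emails or tweets, given that index creates a dict entry for every word in every phrase. I guess that case may be outside the scope of the OP's question. Thanks, Mark.

@TrentonMcKinney I think that's a reasonable concern for large inputs. It's probably more of a memory issue: the dictionary can get large, in exchange for not having to loop over list_2 multiple times. It looks like a trade-off between space and time complexity.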
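One way to soften the memory concern raised in the comments (a sketch, not part of the answer) is to index only the words that actually appear in list_1, so the dictionary never holds entries for other words. It also tokenizes with a regular expression, assuming \w+ is an acceptable definition of a word, so punctuation in tweet-like text does not break matches; whole-token matching still keeps 'mom' from matching 'moment'. The helper name and the sample tweets are hypothetical.

```python
import re


def index_wanted_words(words, phrases):
    """Index only the words we will look up (hypothetical helper)."""
    wanted = {w.lower() for w in words}
    index = {w: [] for w in wanted}
    for phrase in phrases:
        # \w+ splits on whitespace and punctuation; comparing whole tokens
        # means 'mom' does not match 'moment'
        tokens = set(re.findall(r"\w+", phrase.lower()))
        # Only tokens we care about get recorded
        for token in tokens & wanted:
            index[token].append(phrase)
    return {w: index[w.lower()] for w in words}


# Hypothetical tweet-like input
tweets = ['What a moment!', 'Happy birthday, mom!', 'My dog, again...']
res = index_wanted_words(['mom', 'dog', 'soccer'], tweets)
print(res)  # {'mom': ['Happy birthday, mom!'], 'dog': ['My dog, again...'], 'soccer': []}

# Plain substring matching would wrongly report a hit here
print('mom' in 'What a moment!')  # True
```

The work is still one pass over all tokens; the saving is that words outside list_1 never get an entry in the dictionary.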
