Python 适合快速搜索集合的数据结构。输入:标记,输出:句子
我有以下问题Python 适合快速搜索集合的数据结构。输入:标记,输出:句子,python,database,search,set,Python,Database,Search,Set,我有以下问题 >>> a = defaultdict(lambda: set()) >>> a['woman'] set([]) >>> a['woman'].add(1) >>> str(a) "defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})" >>> a['beach'].update
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
我得到1-10个标签,与一个图像相关,每个标签都有可能存在于图像中
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
输入:海滩,女人,狗,树
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
我想从数据库中检索一个已经组成的句子,这是最相关的标签
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
e、 g:
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
海滩->“在海滩的乐趣”/“在海滩上的寒冷”
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
海滩,女人->“女人在海滩”
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
海滩,女人,狗->找不到
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
取最接近的存在,但考虑概率 比如说:女人0.95,海滩0.85,狗0.7 因此,如果存在以女人+海滩(0.95,0.85)为例,然后是女人+狗和最后一个海滩+狗,顺序是越高越好,但我们不是求和
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
我曾想过使用python集,但我不确定如何使用
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
另一个选项是defaultdict:
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
db['beach']['woman']['dog'],但我也想从以下方面得到同样的结果:
db['woman']['beeach']['dog']
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
我想得到一个好的解决方案。
谢谢
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
编辑:工作解决方案
from collections import OrderedDict
list_of_keys = []
sentences = OrderedDict()
sentences[('dogs',)] = ['I like dogs','dogs are man best friends!']
sentences[('dogs', 'beach')] = ['the dog is at the beach']
sentences[('woman', 'cafe')] = ['The woman sat at the cafe.']
sentences[('woman', 'beach')] = ['The woman was at the beach']
sentences[('dress',)] = ['hi nice dress', 'what a nice dress !']
def keys_to_list_of_sets(dict_):
list_of_keys = []
for key in dict_:
list_of_keys.append(set(key))
return list_of_keys
def match_best_sentence(image_tags):
for i, tags in enumerate(list_of_keys):
if (tags & image_tags) == tags:
print(list(sentences.keys())[i])
list_of_keys = keys_to_list_of_sets(sentences)
tags = set(['beach', 'dogs', 'woman'])
match_best_sentence(tags)
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
结果:
('dogs',)
('dogs', 'beach')
('woman', 'beach')
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
此解决方案运行在有序字典的所有键上,
o(n),我希望看到任何性能改进。这主要取决于数据库的大小和关键字之间组合的数量。此外,这还取决于您最常执行的操作。
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
如果它很小,并且您需要快速的查找
操作,则可以使用带有冻结集
的字典作为键,其中包含标记和所有相关句子的值
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
比如说,
d=defaultdict(list)
# preprocessing
d[frozenset(["bob","car","red"])].append("Bob owns a red car")
# searching
d[frozenset(["bob","car","red"])] #['Bob owns a red car']
d[frozenset(["red","car","bob"])] #['Bob owns a red car']
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
对于“bob”、“car”等词的组合,根据关键字的数量和更重要的内容,您有不同的可能性。比如说
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
- 对于每个组合,您可以使用其他条目
- 您可以迭代这些键,并检查包含car和bob的键
- 在不使用DBs的情况下,最简单的方法似乎是为每个单词保留集合并进行交叉
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
更明确地说:
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
如果一个句子包含“女人”这个词,那么你就把它放在“女人”集中。同样,对于狗和海滩等,每个句子都是如此。这意味着您的空间复杂度为O(句子*平均标记),因为每个句子在数据结构中重复
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
你可能有:
>>> dogs = set(["I like dogs", "the dog is at the beach"])
>>> woman = set(["The woman sat at the cafe.", "The woman was at the beach"])
>>> beach = set(["the dog is at the beach", "The woman was at the beach", "I do not like the beach"])
>>> dogs.intersection(beach)
{'the dog is at the beach'}
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
您可以将其构建到defaultdict顶部的对象中,这样您就可以获取标记列表,并且只能与这些列表相交并返回结果
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
大致实施思路:
from collections import defaultdict
class myObj(object): #python2
def __init__(self):
self.sets = defaultdict(lambda: set())
def add_sentence(self, sentence, tags):
#how you process tags is up to you, they could also be parsed from
#the input string.
for t in tags:
self.sets[tag].add(sentence)
def get_match(self, tags):
result = self.sets(tags[0]) #this is a hack
for t in tags[1:]:
result = result.intersection(self.sets[t])
return result #this function can stand to be improved but the idea is there
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
也许这会使默认的dict和set在对象中的显示更加清晰
>>> a = defaultdict(lambda: set())
>>> a['woman']
set([])
>>> a['woman'].add(1)
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
>>> a['beach'].update([1,2,3,4])
>>> a['woman'].intersection(a['beach'])
set([1])
>>> str(a)
"defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
>a=defaultdict(lambda:set())
>>>[女人]
集合([])
>>>a[“妇女”]加上(1)
>>>str(a)
defaultdict(,{'woman':set([1])})
>>>更新([1,2,3,4])
>>>一个[女人]十字路口(一个[海滩])
集合([1])
>>>str(a)
defaultdict(,{'woman':set([1]),'beach':set([1,2,3,4]))
据我所知,您不知道选择什么数据结构来存储标记对?数据库有多大?它包含多少个句子和关键词?它是静态的还是动态变化的添加和删除新句子?静态的,没有变化。1000个句子和增长…@SamperMan我真的不知道我不知道什么。我没有一个合适的存储和检索解决方案。如果您想将[“bob”,“car”]作为查找,该怎么办?这取决于,如果空间不是问题(即少量关键字),您可以使用所有组合的其他条目,我发现这个选项是一个很好的解决方案,但我会改变一件事:我不会用数字代替句子,每个数字都是列表中一个句子的代表,即我的列表[1]=“海滩上的狗”,这样我可以节省内存。我们确实错过了概率部分。我忘了提到这个解决方案没有解决这个问题,因为我不理解你的意图。数字可以加快速度,你也可以在每个字符串周围环绕一个对象,这可能好,也可能不好,取决于你的问题。我将根据你的答案发布一个解决方案,并实现概率。我确实遇到过这样的情况:一个句子必须有beach和woman,但它本身对beach\woman无效。