Python: a data structure suited to fast set search. Input: tags, output: sentence

I have the following problem.

I get 1 to 10 tags associated with an image; each tag comes with a probability that it is actually present in the image.

Input: beach, woman, dog, tree

I want to retrieve from a database an already composed sentence, the one most relevant to the tags.

e.g.:

beach -> "fun at the beach" / "chilling on the beach"

beach, woman -> "woman at the beach"

beach, woman, dog -> nothing found

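In that case, take the closest combination that does exist, but take the probabilities into account. Say: woman 0.95, beach 0.85, dog 0.7. Then, if it exists, take woman + beach (0.95, 0.85) first, then woman + dog, and last beach + dog. The order is "higher is better", but we are not summing the probabilities.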
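To make the ordering above concrete, here is a minimal sketch under one reading of it: prefer larger existing combinations, then compare the tag probabilities sorted from high to low, position by position, without summing. The names `rank_candidates`, `probs`, and `known_combos` are hypothetical, and the input format (a dict of tag to probability) is an assumption.

```python
def rank_candidates(probs, known_combos):
    """Return the known tag combinations covered by `probs`, best first.

    probs: dict mapping tag -> probability (assumed input format).
    known_combos: iterable of frozensets for which sentences exist.
    """
    tags = set(probs)
    # Keep only combinations whose tags were all detected in the image.
    candidates = [combo for combo in known_combos if combo <= tags]

    def sort_key(combo):
        # Larger combinations first, then probabilities compared
        # lexicographically from highest to lowest (no summing).
        ordered = tuple(sorted((probs[t] for t in combo), reverse=True))
        return (len(combo), ordered)

    return sorted(candidates, key=sort_key, reverse=True)


probs = {"woman": 0.95, "beach": 0.85, "dog": 0.7}
known_combos = [
    frozenset(["beach", "dog"]),
    frozenset(["woman", "dog"]),
    frozenset(["woman", "beach"]),
    frozenset(["woman", "cafe"]),  # not covered: "cafe" was not detected
]
ranked = rank_candidates(probs, known_combos)
for combo in ranked:
    print(sorted(combo))
```

With these values the order is woman + beach, then woman + dog, then beach + dog, which matches the example in the text.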

I thought about using Python, but I'm not sure how.

Another option is defaultdict:

db['beach']['woman']['dog'], but I also want to get the same result from: db['woman']['beach']['dog']

I would like to find a good solution. Thanks.

Edit: a working solution

from collections import OrderedDict

# Map each tag combination (a tuple of tags) to its pre-composed sentences.
sentences = OrderedDict()
sentences[('dogs',)] = ['I like dogs','dogs are man best friends!']
sentences[('dogs', 'beach')] = ['the dog is at the beach']
sentences[('woman', 'cafe')] = ['The woman sat at the cafe.']
sentences[('woman', 'beach')] = ['The woman was at the beach']
sentences[('dress',)] = ['hi nice dress', 'what a nice dress !']


def keys_to_list_of_sets(dict_):
    # Turn every tuple key into a set so it can be compared with the image tags.
    return [set(key) for key in dict_]

def match_best_sentence(image_tags):
    # Print every key whose tags are all contained in image_tags.
    for i, tags in enumerate(list_of_keys):
        if tags <= image_tags:
            print(list(sentences.keys())[i])

list_of_keys = keys_to_list_of_sets(sentences)
tags = set(['beach', 'dogs', 'woman'])
match_best_sentence(tags)
Result:

('dogs',)
('dogs', 'beach')
('woman', 'beach')
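This solution iterates over all keys of the ordered dictionary, so each lookup is O(n) in the number of keys. I would like to see any performance improvement.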

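This mostly depends on the size of your database and on the number of combinations between keywords. It also depends on which operation you perform most often.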
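If it is small and you need a fast lookup operation, one possibility is a dictionary with frozensets as keys, each containing the tags, and with all the associated sentences as values.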

For example,

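from collections import defaultdict

d = defaultdict(list)
# preprocessing: key each sentence by the frozenset of its tags
d[frozenset(["bob","car","red"])].append("Bob owns a red car")

# searching: the order of the tags does not matter
d[frozenset(["bob","car","red"])]  # ['Bob owns a red car']
d[frozenset(["red","car","bob"])]  # ['Bob owns a red car']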
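For partial combinations of words such as "bob" and "car", you have different possibilities, depending on the number of keywords and on what matters more to you. For example: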

  • you could store additional entries, one for each combination
  • you could iterate over the keys and check which ones contain both car and bob
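The two options in the list above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `build_with_all_combinations` and `find_by_scanning` and the `(tags, sentence)` input format are hypothetical, not part of the answer.

```python
from collections import defaultdict
from itertools import combinations


def build_with_all_combinations(tagged_sentences):
    """Option 1: register each sentence under every non-empty subset of its tags."""
    index = defaultdict(list)
    for tags, sentence in tagged_sentences:
        unique_tags = sorted(set(tags))
        for size in range(1, len(unique_tags) + 1):
            for combo in combinations(unique_tags, size):
                index[frozenset(combo)].append(sentence)
    return index


def find_by_scanning(index, wanted):
    """Option 2: scan all keys and keep those that contain every wanted tag."""
    wanted = frozenset(wanted)
    return [s for key, found in index.items() if wanted <= key for s in found]


data = [(["bob", "car", "red"], "Bob owns a red car")]

# Option 1: more memory, constant-time lookup of any sub-combination.
full_index = build_with_all_combinations(data)
print(full_index[frozenset(["car", "bob"])])

# Option 2: one entry per sentence, linear scan at query time.
exact_index = {frozenset(tags): [sentence] for tags, sentence in data}
print(find_by_scanning(exact_index, ["bob", "car"]))
```

The trade-off is space against time: option 1 stores 2^k - 1 entries for a sentence with k tags, which is only reasonable when sentences carry few tags, while option 2 keeps the index small but pays O(n) per query.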

    • Without using a database, the simplest approach seems to be to keep a set for each word and take intersections.

      More explicitly:

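      If a sentence contains the word "woman", you put it into the "woman" set. The same goes for dog, beach, and so on, for every sentence. This means your space complexity is O(sentences * average tags per sentence), because each sentence is repeated in the data structure once per tag.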

      You might have:

      >>> dogs = set(["I like dogs", "the dog is at the beach"])
      >>> woman = set(["The woman sat at the cafe.", "The woman was at the beach"])
      >>> beach = set(["the dog is at the beach", "The woman was at the beach", "I do not like the beach"])
      >>> dogs.intersection(beach)
      {'the dog is at the beach'}
      
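      You can build this into an object on top of defaultdict, so that you can pass in a list of tags, intersect only the sets for those tags, and return the result.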

      Rough implementation idea:

      from collections import defaultdict
      class myObj(object): #python2
          def __init__(self):
              # maps each tag to the set of sentences carrying that tag
              self.sets = defaultdict(lambda: set())

          def add_sentence(self, sentence, tags):
              # how you process tags is up to you, they could also be parsed from
              # the input string.
              for t in tags:
                  self.sets[t].add(sentence)

          def get_match(self, tags):
              result = self.sets[tags[0]] #this is a hack
              for t in tags[1:]:
                  result = result.intersection(self.sets[t])

              return result #this function can stand to be improved but the idea is there
      
      Maybe this makes it clearer what the defaultdict and the sets look like inside the object:

      >>> a = defaultdict(lambda: set())
      >>> a['woman']
      set([])
      >>> a['woman'].add(1)
      >>> str(a)
      "defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1])})"
      >>> a['beach'].update([1,2,3,4])
      >>> a['woman'].intersection(a['beach'])
      set([1])
      >>> str(a)
      "defaultdict(<function <lambda> at 0x7fcb3bbf4b90>, {'woman': set([1]), 'beach': set([1, 2, 3, 4])})"
      
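      As I understand it, you don't know which data structure to choose for storing the tag-sentence pairs?
      How big is the database? How many sentences and keywords does it contain? Is it static, or does it change dynamically, with sentences being added and removed?
      Static, no changes. 1000 sentences and growing... @SamperMan I really don't know what I don't know. I don't have a proper solution for storage and retrieval.
      What if you want to use ["bob", "car"] as the lookup?
      It depends. If space is not an issue (i.e. a small number of keywords), you can use additional entries for all the combinations.
      I find this option a nice solution, but I would change one thing: I would use numbers instead of sentences, each number representing a sentence in a list, i.e. my_list[1] = "dog on the beach". That way I save memory. We did miss the probability part, though.
      I forgot to mention that this solution does not address that part, because I did not understand what you intended. Numbers could speed things up; you could also wrap an object around each string, which may or may not be a good idea depending on your problem.
      I will post a solution based on your answer and implement the probabilities. I did run into a case where a sentence must have both beach and woman, but is not valid for beach or woman on their own.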
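      The idea from the comments of storing numbers instead of sentences could look roughly like this. It is a sketch only: the class name `SentenceIndex` and its methods are hypothetical, each sentence is stored once in a list, and the per-tag sets hold list positions rather than strings.

```python
from collections import defaultdict


class SentenceIndex:
    def __init__(self):
        self.sentences = []                 # list position acts as the sentence id
        self.ids_by_tag = defaultdict(set)  # tag -> ids of sentences with that tag

    def add_sentence(self, sentence, tags):
        sentence_id = len(self.sentences)
        self.sentences.append(sentence)
        for tag in tags:
            self.ids_by_tag[tag].add(sentence_id)
        return sentence_id

    def get_match(self, tags):
        tags = list(tags)
        if not tags:
            return []
        # .get avoids creating empty entries for tags that were never indexed.
        id_sets = [self.ids_by_tag.get(tag, set()) for tag in tags]
        common = set.intersection(*id_sets)
        return [self.sentences[i] for i in sorted(common)]


index = SentenceIndex()
index.add_sentence("the dog is at the beach", ["dogs", "beach"])
index.add_sentence("The woman was at the beach", ["woman", "beach"])
index.add_sentence("I like dogs", ["dogs"])

print(index.get_match(["beach", "dogs"]))
print(index.get_match(["beach"]))
```

      Note that this sketch returns a sentence for any subset of its tags, so it does not cover the case mentioned in the last comment, where a sentence should match only when both beach and woman are present. It also leaves out the probabilities entirely.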