Python search engine: ranking the output with a weighting mechanism
I'm trying to build a semantic-search FAQ system using Elasticsearch 7.7.0 and Universal Sentence Encoder (USE4) sentence embeddings. So far I have indexed a set of questions and answers, and I can search them. Whenever there is an input, I run two searches:

1. A keyword search in Elasticsearch over the indexed data
2. A semantic search using USE4 embeddings

Now I want to combine the two to get more robust output, because the two algorithms sometimes return different results. Are there any good suggestions on how to combine them? For example, a weighting mechanism that gives more weight to the semantic search, and/or a way to re-rank the combined results. The question is how I can get the best of both.
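One common way to build such a weighting mechanism is a weighted linear combination of the two scores after rescaling each result list to [0, 1]. The rescaling matters because BM25 keyword scores are unbounded, while `cosineSimilarity(...) + 1.0` always lies in [0, 2]. The following is a minimal sketch, assuming each search has already been turned into a dict that maps a document key (for example the hit's `_id` or title) to its score. The names `min_max_normalize` and `weighted_fusion`, and the default `alpha=0.7`, are illustrative choices and not part of Elasticsearch.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores linearly to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(keyword: Dict[str, float],
                    semantic: Dict[str, float],
                    alpha: float = 0.7) -> List[Tuple[str, float]]:
    """Fuse two score maps: alpha weights the semantic score, (1 - alpha) the keyword score.

    A document found by only one search gets 0 for the other one.
    Returns (doc, fused_score) pairs sorted from best to worst.
    """
    kw = min_max_normalize(keyword)
    sem = min_max_normalize(semantic)
    # Keep a deterministic order (keyword hits first) so ties are stable.
    docs = list(kw) + [d for d in sem if d not in kw]
    fused = {d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

For example, `weighted_fusion({"a": 10.0, "b": 5.0, "c": 0.0}, {"b": 1.9, "c": 1.5})` ranks `"b"` first, because it scores well in both lists. Note that min-max scaling is relative to each returned list, so the last hit in a list gets 0 just like a missing document. Fetching more hits per search makes that effect less pronounced.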
import time
import sys
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import csv
import tensorflow as tf
import tensorflow_hub as hub


def connect2ES():
    # connect to ES on localhost on port 9200
    es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
    if es.ping():
        print('Connected to ES!')
    else:
        print('Could not connect!')
        sys.exit()
    print("*********************************************************************************")
    return es

def keywordSearch(es, q):
    # Search by keywords (full-text match on the title field)
    b = {
        'query': {
            'match': {
                "title": q
            }
        }
    }
    res = es.search(index='questions-index_quora2', body=b)
    print("Keyword Search:\n")
    for hit in res['hits']['hits']:
        print(str(hit['_score']) + "\t" + hit['_source']['title'])
    print("*********************************************************************************")
    return

# Search by vector similarity
def sentenceSimilaritybyNN(embed, es, sent):
    # USE4 returns a (1, 512) tensor; take its single row as a plain Python list
    query_vector = embed([sent]).numpy()[0].tolist()
    b = {
        "query": {
            "script_score": {
                "query": {
                    "match_all": {}
                },
                "script": {
                    # script_score does not allow negative scores, hence the + 1.0 shift
                    "source": "cosineSimilarity(params.query_vector, 'title_vector') + 1.0",
                    "params": {"query_vector": query_vector}
                }
            }
        }
    }
    res = es.search(index='questions-index_quora2', body=b)
    print("Semantic Similarity Search:\n")
    for hit in res['hits']['hits']:
        print(str(hit['_score']) + "\t" + hit['_source']['title'])
    print("*********************************************************************************")

if __name__ == "__main__":
    es = connect2ES()
    embed = hub.load("./data/USE4/")  # this is where my USE4 model is saved
    while True:
        query = input("Enter a Query:")
        start = time.time()
        if query == "END":
            break
        print("Query: " + query)
        keywordSearch(es, query)
        sentenceSimilaritybyNN(embed, es, query)
        end = time.time()
        print(end - start)
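Instead of running two separate requests per query, Elasticsearch can do the combination itself. Inside a `script_score` query the script can read `_score`, the text relevance of the wrapped query, and add a weighted cosine term to it. The sketch below assumes the same index and the same `title` and `title_vector` fields as the code above. `hybrid_query_body`, `w_kw` and `w_sem` are hypothetical names. Because BM25 is unbounded, the weights would need tuning on your own data. The `bool` query's `must: match_all` keeps every document as a candidate, so purely semantic matches are not dropped. The `should: match` clause adds BM25 score only to documents that share terms with the query. Like the original semantic search, this scores every document by brute force.

```python
from typing import Any, Dict, List


def hybrid_query_body(q: str, query_vector: List[float],
                      w_kw: float = 0.3, w_sem: float = 0.7,
                      size: int = 10) -> Dict[str, Any]:
    """Build one request body that scores documents by keyword AND embedding similarity."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        # match_all keeps all documents (and adds a constant 1.0 to _score)
                        "must": [{"match_all": {}}],
                        # the optional match clause adds BM25 score for keyword overlap
                        "should": [{"match": {"title": q}}],
                    }
                },
                "script": {
                    "source": (
                        "params.w_kw * _score + "
                        "params.w_sem * (cosineSimilarity(params.query_vector, 'title_vector') + 1.0)"
                    ),
                    "params": {"query_vector": query_vector, "w_kw": w_kw, "w_sem": w_sem},
                },
            }
        },
    }
```

It would be called with the same query vector that `sentenceSimilaritybyNN` computes, e.g. `es.search(index='questions-index_quora2', body=hybrid_query_body(query, query_vector))`. The constant 1.0 contributed by `match_all` shifts every score equally, so it does not change the ranking.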
My output looks like this:
Enter a Query:what can i watch this weekend
Query: what can i watch this weekend
Keyword Search:
9.6698 Where can I watch gonulcelen with english subtitles?
7.114256 What are some good movies to watch?
6.3105774 What kind of animal did this?
6.2754908 What are some must watch TV shows before you die?
6.0294256 What is the painting on this image?
6.0294256 What the meaning of this all life?
6.0294256 What are your comments on this picture?
5.9638205 Which is better GTA5 or Watch Dogs?
5.9269657 Can somebody explain to me how to do this problem with steps?
*********************************************************************************
Semantic Similarity Search:
1.6078881 What are some good movies to watch?
1.5065247 What are some must watch TV shows before you die?
1.502714 What are some movies that everyone needs to watch at least once in life?
1.4787409 Where can I watch gonulcelen with english subtitles?
1.4713362 What are the best things to do on Halloween?
1.4669418 Which are the best movies of 2016?
1.4554278 What are some interesting things to do when bored?
1.4307204 How can I improve my skills?
1.4261798 What are the best films that take place in one room?
1.4175651 What are the best things to learn in life?
*********************************************************************************
0.05920886993408203
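The output above shows why the raw scores are hard to compare directly. The keyword scores range from about 5.9 to 9.7, while the semantic scores sit between about 1.42 and 1.61. Reciprocal rank fusion (RRF) sidesteps the scale problem because it uses only each document's position in each list: score(d) = Σ w_i / (k + rank_i(d)). Below is a minimal sketch, assuming each search returns an ordered list of document keys. The value k = 60 is the constant commonly used with RRF. The optional per-list `weights` (for example, to favor the semantic list) are an illustrative extension, not part of any library API.

```python
from typing import Dict, List, Optional, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60,
                           weights: Optional[Sequence[float]] = None) -> List[str]:
    """Merge several ranked lists into one, best first.

    Each document earns weight / (k + rank) from every list it appears in
    (ranks start at 1); documents missing from a list earn nothing from it.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: Dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    # sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)
```

With the two result lists above and equal weights, "What are some good movies to watch?" (2nd by keyword, 1st by semantics) comes out on top, ahead of the gonulcelen question, which is 1st by keyword but only 4th by semantics.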
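I'd like a single output based on both searches, so that the results are more accurate and ranked accordingly. Please suggest an approach, or point me to good practices I can refer to on this. Thanks in advance.
This seems too broad/vague. Please see .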