NLP: knowledge-based question answering system does not give the most appropriate answers

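I am working on a project that is basically a knowledge-based question answering system. My system takes the user's query, downloads relevant documents from Wikipedia, strips all the HTML tags and extracts the plain text. It then tokenizes the documents into sentences and builds a term-document (TD) matrix, with the query also passed in as a sentence. Stemming is applied while building the TD matrix. The TD matrix is then fed to the pLSA (Probabilistic Latent Semantic Analysis) algorithm. Next, the system computes the cosine similarity between each document (sentence) vector and the query vector, and the sentences most similar to the query are displayed as answers. The problem is that it does return results, but not the most relevant ones. Where am I going wrong? Is the strategy I am following correct, or is there another algorithm that might help? Here are some questions and the answers my system returned: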
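The pipeline described above can be sketched in plain Python as follows, assuming the Wikipedia text has already been extracted and split into sentences. Everything here is illustrative: `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, the stopword list is a toy, and `rank_sentences`, `plsa` and the topic count `k` are names and defaults chosen for the example. As in the question, the query is added to the term-document matrix as one more sentence, pLSA is fit with EM, and sentences are ranked by cosine similarity between their topic mixtures P(z|d) and the query's.

```python
import math
import random
import re
from collections import Counter

# Toy stopword list (an assumption; a real system would use a fuller list).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "what", "who", "to"}


def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(sentence):
    """One column of the term-document matrix: stemmed term -> count."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return Counter(crude_stem(w) for w in words if w not in STOPWORDS)


def normalize(values):
    if not values:
        return []
    total = sum(values)
    if total == 0:  # e.g. a sentence made only of stopwords: fall back to uniform
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(docs, vocab, k, iters=50, seed=0):
    """Fit pLSA with EM and return P(z|d) for every document (sentence)."""
    rng = random.Random(seed)
    p_z_d = [normalize([rng.random() for _ in range(k)]) for _ in docs]
    p_w_z = [dict(zip(vocab, normalize([rng.random() for _ in vocab]))) for _ in range(k)]
    for _ in range(iters):
        acc_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(k)]
        acc_z_d = [[0.0] * k for _ in docs]
        for d, counts in enumerate(docs):
            for w, n in counts.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d)
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(k)]
                total = sum(post)
                if total == 0:
                    continue
                for z in range(k):
                    share = n * post[z] / total
                    acc_w_z[z][w] += share
                    acc_z_d[d][z] += share
        # M-step: re-estimate P(w|z) and P(z|d) from the expected counts
        p_w_z = [dict(zip(vocab, normalize([acc[w] for w in vocab]))) for acc in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_z_d


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_sentences(query, sentences, k=2, top=3):
    """Rank sentences by cosine similarity to the query in pLSA topic space."""
    # The query is appended as the last "document", as in the question.
    docs = [term_counts(s) for s in sentences] + [term_counts(query)]
    vocab = sorted(set().union(*docs))
    p_z_d = plsa(docs, vocab, k)
    query_vec = p_z_d[-1]
    scored = [(cosine(query_vec, p_z_d[i]), s) for i, s in enumerate(sentences)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top]]
```

Since every candidate sentence mentions the query term, their topic mixtures tend to look alike, so cosine similarity in this space mostly rewards topical overlap rather than whether a sentence actually answers the question. That would be consistent with the results shown below.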

What is photosynthesis?
ANSWER  1 :   The stroma contains stacks (grana) of thylakoids, which are the site of photosynthesis 

ANSWER  2 :   Factors leaf is the primary site of photosynthesis in plants 

ANSWER  3 :   Samuel Ruben and Martin Kamen used radioactive isotopes to determine that the oxygen liberated in photosynthesis came from the water 

ANSWER  4 :   In plants, algae and cyanobacteria, photosynthesis releases oxygen 
Another question:

What is Artificial Intelligence?
ANSWER  1 :   the problem of creating 'artificial intelligence' will substantially be solved" 

ANSWER  2 :   37 The leading-edge definition of artificial intelligence research is changing over time 

ANSWER  3 :   Stories of these creatures and their fates discuss many of the same hopes, fears and ethical concerns that are presented by artificial intelligence 

ANSWER  4 :   History of artificial intelligence and Timeline of artificial intelligence Thinking machines and artificial beings appear in Greek myths , such as Talos of Crete , the bronze robot of Hephaestus , and Pygmalion's Galatea 13 Human likenesses believed to have intelligence were built in every major civilization 
Who is a hacker?

ANSWER  1 :   19 Hackers (short stories) Helba from the  

ANSWER  2 :   16 Rafael Núñez aka RaFa was a notorious most wanted hacker by the FBI since 2001 

ANSWER  3 :   Often, this type of 'white hat' hacker is called an ethical hacker 
ANSWER  4 :   Hackers also commonly use port scanners  
Another run:

What is biology?
ANSWER  1 :   Molecular biology is the study of biology at a molecular level 

ANSWER  2 :   molecular biology studies the complex interactions of systems of biological molecules 

ANSWER  3 :   The similarities and differences between cell types are particularly relevant to molecular biology 

ANSWER  4 :   Contents History Foundations of modern biology 2 

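I think that if you stick with a purely statistical approach, it will be hard to improve your system. From a statistical NLP point of view, you are indeed doing the right thing. What you can do now is fine-tune some parameters. To do that, you have to build a training corpus that tells the system which answer is correct, and then see which values the parameters must take to produce those answers.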

That said, I don't think fine-tuning the parameters will improve precision by more than 20-30%.
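A minimal sketch of that tuning loop, assuming a small hand-labeled set in which each question comes with a list of candidate sentences and the index of the correct one. `overlap_ranker` and its `length_penalty` parameter are hypothetical stand-ins for the real system; in the question's setup, the tuned values might instead be things like the number of pLSA topics or the stopword and stemming settings. The labeled example is made up for illustration.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

# (question, candidate sentences, index of the correct answer), labeled by hand
Example = Tuple[str, List[str], int]
# A ranker takes (question, candidates, parameter) and returns candidate indices, best first.
Ranker = Callable[[str, List[str], float], List[int]]


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def overlap_ranker(question: str, candidates: List[str], length_penalty: float) -> List[int]:
    """Hypothetical ranker: number of shared words minus a penalty per candidate word."""
    q = set(tokens(question))

    def score(i: int) -> float:
        words = tokens(candidates[i])
        return len(q & set(words)) - length_penalty * len(words)

    # sorted() is stable even with reverse=True, so ties keep their original order
    return sorted(range(len(candidates)), key=score, reverse=True)


def top1_accuracy(rank: Ranker, labeled: Sequence[Example], param: float) -> float:
    """Fraction of questions whose top-ranked candidate is the labeled answer."""
    hits = sum(rank(q, cands, param)[0] == gold for q, cands, gold in labeled)
    return hits / len(labeled)


def grid_search(rank: Ranker, labeled: Sequence[Example],
                grid: Sequence[float]) -> Tuple[float, Dict[float, float]]:
    """Try every parameter value and keep the one with the best top-1 accuracy."""
    scores = {p: top1_accuracy(rank, labeled, p) for p in grid}
    best = max(grid, key=lambda p: scores[p])
    return best, scores


# Made-up training example: the gold answer (index 1) is the actual definition.
labeled = [
    ("What is photosynthesis?",
     ["the leaf is the primary site of photosynthesis in plants and algae and many other organisms",
      "photosynthesis is the process plants use to turn light into chemical energy"],
     1),
]
best, scores = grid_search(overlap_ranker, labeled, [0.0, 0.1])
```

Here both candidates share the same two words with the question, so with no penalty the ranker keeps the wrong sentence first, while a small length penalty picks the definition. A real training corpus would need many more questions for the chosen value to generalize.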


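If you want to go further, you will need a more semantic approach that represents knowledge symbolically. Indeed, this is a well-studied problem called question answering (QA); I have written a summary of QA elsewhere. In particular, all of your examples fall into the category of "definition questions". I suggest reading some papers on "TREC definition questions" carefully, or looking through them for ideas.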
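One idea that recurs in work on definition questions is to look for definitional surface patterns, such as a copula ("X is a ...") or an appositive ("X, a ..., "), with the question's target term as the subject. The sketch below is a hypothetical filter built on a few such patterns; `definition_patterns` and `definition_candidates` are illustrative names, and the pattern list is an assumption that is far from complete.

```python
import re
from typing import List


def definition_patterns(term: str) -> List[re.Pattern]:
    """A few hypothetical definitional patterns with `term` as the subject."""
    t = re.escape(term)
    return [
        # "X is a/an/the ..." at the start of the sentence
        re.compile(rf"^(?:an?\s+|the\s+)?{t}\s+(?:is|are|was|were)\s+(?:an?|the)\b", re.I),
        # "X refers to ...", "X is defined as ...", "X means ..."
        re.compile(rf"^(?:an?\s+|the\s+)?{t}\s+(?:refers to|is defined as|means)\b", re.I),
        # appositive: "..., X, a ..., ..."
        re.compile(rf"\b{t}\s*,\s*(?:an?|the)\s+[^,]+,", re.I),
    ]


def definition_candidates(term: str, sentences: List[str]) -> List[str]:
    """Keep only sentences that look like definitions of `term`."""
    patterns = definition_patterns(term)
    return [s for s in sentences if any(p.search(s.strip()) for p in patterns)]


sentences = [
    "Molecular biology is the study of biology at a molecular level",
    "Biology is the natural science that studies life and living organisms",  # made-up
    "Contents History Foundations of modern biology 2",
]
definitions = definition_candidates("biology", sentences)
```

Only the made-up second sentence survives: the first one has the right shape but defines "molecular biology", not "biology". Such a filter could be applied before the similarity ranking, or its matches could simply be boosted.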