SPARQL: How to improve the results of DBpedia Spotlight?

I am using DBpedia Spotlight to extract DBpedia resources, as follows:

import json
from SPARQLWrapper import SPARQLWrapper, JSON
import requests
import urllib.parse

## initial consts
BASE_URL = 'http://api.dbpedia-spotlight.org/en/annotate?text={text}&confidence={confidence}&support={support}'
TEXT = "Tolerance, safety and efficacy of Hedera helix extract in inflammatory bronchial diseases under clinical practice conditions: a prospective, open, multicentre postmarketing study in 9657 patients.     In this postmarketing study 9657 patients (5181 children) with bronchitis (acute or chronic bronchial inflammatory disease) were treated with a syrup containing dried ivy leaf extract. After 7 days of therapy, 95% of the patients showed improvement or healing of their symptoms. The safety of the therapy was very good with an overall incidence of adverse events of 2.1% (mainly gastrointestinal disorders with 1.5%). In those patients who got concomitant medication as well, it could be shown that the additional application of antibiotics had no benefit respective to efficacy but did increase the relative risk for the occurrence of side effects by 26%. In conclusion, it is to say that the dried ivy leaf extract is effective and well tolerated in patients with bronchitis. In view of the large population considered, future analyses should approach specific issues concerning therapy by age group, concomitant therapy and baseline conditions."
CONFIDENCE = '0.5'
SUPPORT = '10'
REQUEST = BASE_URL.format(
    text=urllib.parse.quote_plus(TEXT), 
    confidence=CONFIDENCE, 
    support=SUPPORT
)
HEADERS = {'Accept': 'application/json'}
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
all_urls = []

r = requests.get(url=REQUEST, headers=HEADERS)
response = r.json()
resources = response['Resources']
for res in resources:
    all_urls.append(res['@URI'])
print(all_urls)
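My text is as follows:

Tolerance, safety and efficacy of Hedera helix extract in inflammatory bronchial diseases under clinical practice conditions: a prospective, open, multicentre postmarketing study in 9657 patients. In this postmarketing study 9657 patients (5181 children) with bronchitis (acute or chronic bronchial inflammatory disease) were treated with a syrup containing dried ivy leaf extract. After 7 days of therapy, 95% of the patients showed improvement or healing of their symptoms. The safety of the therapy was very good with an overall incidence of adverse events of 2.1% (mainly gastrointestinal disorders with 1.5%). In those patients who got concomitant medication as well, it could be shown that the additional application of antibiotics had no benefit respective to efficacy but did increase the relative risk for the occurrence of side effects by 26%. In conclusion, it is to say that the dried ivy leaf extract is effective and well tolerated in patients with bronchitis. In view of the large population considered, future analyses should approach specific issues concerning therapy by age group, concomitant therapy and baseline conditions.

The results I get are as follows: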

['http://dbpedia.org/resource/Hedera', 
'http://dbpedia.org/resource/Helix', 
'http://dbpedia.org/resource/Bronchitis', 
'http://dbpedia.org/resource/Cough_medicine',
'http://dbpedia.org/resource/Hedera', 
'http://dbpedia.org/resource/After_7',
'http://dbpedia.org/resource/Gastrointestinal_tract',
'http://dbpedia.org/resource/Antibiotics',
'http://dbpedia.org/resource/Relative_risk',
'http://dbpedia.org/resource/Hedera',
'http://dbpedia.org/resource/Bronchitis']
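As you can see, the results are not very good.

For example, consider "Hedera helix extract" in the text above. Although DBpedia has a resource for Hedera helix (http://dbpedia.org/resource/Hedera_helix), Spotlight splits it into two URIs:
http://dbpedia.org/resource/Hedera
http://dbpedia.org/resource/Helix

For my dataset, I would like to get the longest term that exists in DBpedia as the result. What improvements can I make to get the desired output in this case?

I am happy to provide more details if needed.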

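Although I am answering this question quite late, you can use the BabelNet (Babelfy) API from Python to get DBpedia URIs that cover longer spans of text. I reproduced the problem with the code below: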

from babelpy.babelfy import BabelfyClient

# Adjacent string literals are concatenated, so the text stays one string
# with the same character offsets as in the question.
text = (
    "Tolerance, safety and efficacy of Hedera helix extract in inflammatory "
    "bronchial diseases under clinical practice conditions: a prospective, open, "
    "multicentre postmarketing study in 9657 patients.     In this postmarketing "
    "study 9657 patients (5181 children) with bronchitis (acute or chronic "
    "bronchial inflammatory disease) were treated with a syrup containing dried ivy "
    "leaf extract. After 7 days of therapy, 95% of the patients showed improvement "
    "or healing of their symptoms. The safety of the therapy was very good with an "
    "overall incidence of adverse events of 2.1% (mainly gastrointestinal disorders "
    "with 1.5%). In those patients who got concomitant medication as well, it could "
    "be shown that the additional application of antibiotics had no benefit "
    "respective to efficacy but did increase the relative risk for the occurrence "
    "of side effects by 26%. In conclusion, it is to say that the dried ivy leaf "
    "extract is effective and well tolerated in patients with bronchitis. In view "
    "of the large population considered, future analyses should approach specific "
    "issues concerning therapy by age group, concomitant therapy and baseline "
    "conditions."
)

# Instantiate BabelFy client.
params = dict()
params['lang'] = 'english'
babel_client = BabelfyClient("**Your Registration Code For API**", params)

# Babelfy sentence.
babel_client.babelfy(text)

# Get all merged entities.
print(babel_client.all_merged_entities)
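Every merged entity in the text is returned in the format shown below. You can store and process this dictionary structure further to extract the DBpedia URIs.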

{'start': 34,
'end': 45,
'text': 'Hedera helix',
'isEntity': True,
'tokenFragment': {'start': 6, 'end': 7},
'charFragment': {'start': 34, 'end': 45},
'babelSynsetID': 'bn:00021109n',
'DBpediaURL': 'http://dbpedia.org/resource/Hedera_helix',
'BabelNetURL': 'http://babelnet.org/rdf/s00021109n',
'score': 1.0,
'coherenceScore': 0.0847457627118644,
'globalScore': 0.0013494092960806407,
'source': 'BABELFY'},
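
Pulling the URIs out of that structure could look like the sketch below. It assumes each entry is a dict with a 'DBpediaURL' key, as in the sample output above, and that the key may be missing or empty for entries without a DBpedia mapping. The helper name dbpedia_uris is hypothetical and is not part of babelpy.

```python
def dbpedia_uris(entities):
    """Return the distinct, non-empty DBpedia URLs in order of first appearance.

    `entities` is assumed to be a list of dicts shaped like the sample output
    above (hypothetical helper, not part of babelpy).
    """
    seen = set()
    uris = []
    for entity in entities:
        uri = entity.get("DBpediaURL")  # may be missing or empty
        if uri and uri not in seen:
            seen.add(uri)
            uris.append(uri)
    return uris
```

With the client from the answer, this would be called as dbpedia_uris(babel_client.all_merged_entities).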

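Post-process the results, train on your own dataset, or use other tools, or even several tools. Solving this problem with a computer is not trivial in general.

@AKSW Thank you for the comment. Do you have any suggestions for other tools I could try, or for post-processing techniques I could use here? I look forward to hearing from you. Thank you very much :)

No, that is NLP, which is not my topic. Noun phrase detection followed by linking to DBpedia is what you need for your corner case. As usual, corner cases can be tricky. NLP starts with basic steps such as sentence detection, POS tagging, NP detection and so on, so any earlier error affects the later steps.

@AKSW Thank you. Sure, I will look into the areas you mentioned :)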
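
One possible form of the post-processing mentioned above, as a rough sketch rather than a tested solution: when two Spotlight annotations are separated by a single space, build the URI of the combined phrase and keep it if DBpedia knows it. It assumes each Spotlight resource carries '@URI', '@surfaceForm' and '@offset', and that resource URIs follow the http://dbpedia.org/resource/Some_phrase pattern with the first letter capitalised. The names merge_adjacent and dbpedia_resource_exists are hypothetical. The check uses a SPARQL ASK query through SPARQLWrapper, and only pairs of annotations are merged, not longer runs.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def dbpedia_resource_exists(uri, endpoint="http://dbpedia.org/sparql"):
    """Ask the endpoint whether `uri` has any rdfs:label (needs network access)."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # imported lazily on purpose

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("ASK { <%s> <%s> ?label }" % (uri, RDFS_LABEL))
    sparql.setReturnFormat(JSON)
    return bool(sparql.query().convert().get("boolean", False))


def merge_adjacent(text, resources, exists=dbpedia_resource_exists):
    """Replace two neighbouring annotations by one URI when the joined phrase exists."""
    merged = []
    i = 0
    while i < len(resources):
        cur = resources[i]
        if i + 1 < len(resources):
            nxt = resources[i + 1]
            start = int(cur["@offset"])
            cur_end = start + len(cur["@surfaceForm"])
            nxt_start = int(nxt["@offset"])
            nxt_end = nxt_start + len(nxt["@surfaceForm"])
            # Only merge annotations separated by exactly one space.
            if text[cur_end:nxt_start] == " ":
                phrase = text[start:nxt_end].replace(" ", "_")
                phrase = phrase[0].upper() + phrase[1:]
                candidate = DBPEDIA_RESOURCE + quote(phrase)
                if exists(candidate):
                    merged.append(candidate)
                    i += 2
                    continue
        merged.append(cur["@URI"])
        i += 1
    return merged
```

With the code from the question, this would be called as merge_adjacent(TEXT, response['Resources']). Passing a different exists callable, for example a lookup in a local set of known URIs, avoids one network request per candidate.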
pyspotlight may be of interest. It may not improve the recognition rate, but at least you can write less code. It also returns more results than the code above.