RDF representation of entity references in text


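Consider a sentence like:

John Smith went to Washington

On a good day, a name tagger will mark "John Smith" as a person and "Washington" as a place. Without other evidence, however, it cannot tell which of all the possible "John Smith"s in the world is meant, or even which of the various "Washington"s.

Eventually, some resolution process may decide based on other evidence. Until then, what is good practice for representing these references in RDF? Assign them unique identifiers in some namespace? Create blank tuples (e.g. "document d references some person named John Smith")? Something else? A book I have gives an example involving anonymous weather stations, but I don't quite see how that example fits with everything else it describes about RDF.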

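Assign them unique identifiers in your own namespace. If you later discover that this "Washington" is the same as an entity identified elsewhere, or something else, you can add an owl:sameAs statement to assert that.

First, you can use one of the good existing services for entity recognition.

More specifically: yes, just mint your own URIs (identifiers) for each thing and then talk about them. Here is a representation of this information in Turtle: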

@prefix : <http://yourdomain.com/data/> .
@prefix myont: <http://yourdomain.com/ontology/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

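<http://yourdomain.com/data/John_Smith#d> rdf:type foaf:Person ;
  foaf:name "John Smith"@en .

<http://yourdomain.com/data/Washington#d> rdf:type dbpedia-owl:Place ;
  rdfs:label "Washington"@en .

<http://yourdomain.com/data/John_Smith#d> myont:travelled_to <http://yourdomain.com/data/Washington#d> .

<http://yourdomain.com/some-doc#this> rdf:type foaf:Document ;
  dcterms:references <http://yourdomain.com/data/John_Smith#d>, <http://yourdomain.com/data/Washington#d> .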

If you match them up later, you can use owl:sameAs, as glenn mcdonald mentioned.
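As a rough illustration of that later linking step, the sketch below emits a single owl:sameAs statement as an N-Triples line. The helper name `same_as_triple` and the target URI under example.org are assumptions made for the example. The local URI simply reuses the yourdomain.com data namespace from the Turtle above.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri: str, resolved_uri: str) -> str:
    """Return one N-Triples line asserting that both URIs name the same thing."""
    for uri in (local_uri, resolved_uri):
        # Reject characters that cannot appear unescaped inside <...> in N-Triples.
        if any(ch in uri for ch in '<>"{}|^`\\ '):
            raise ValueError(f"not a usable IRI: {uri!r}")
    return f"<{local_uri}> <{OWL_SAME_AS}> <{resolved_uri}> ."


# Hypothetical URIs: the first is the locally minted identifier, the second
# stands in for whatever resource the resolution step eventually picked.
line = same_as_triple(
    "http://yourdomain.com/data/Washington#d",
    "http://example.org/places/washington-dc",
)
print(line)
```

Because the statement is added separately, the original mention data stays untouched and the link can be dropped again if the resolution turns out to be wrong.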

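You can either mint your own URIs as discussed above, or use blank nodes. Both approaches have pros and cons:

URIs have an external identity, so you can refer to your concepts explicitly in future queries, which can make some queries simpler. But because they have an external identity, the algorithm you use to construct the URIs becomes a critical part of your infrastructure, and it must guarantee that they are both stable and unique. This may be trivial at first, but it quickly stops being straightforward once you are dealing with multiple documents being reprocessed at different times, often in parallel, on a distributed system.

Blank nodes were designed specifically to address this problem: their uniqueness is guaranteed by their scope. However, if you need to refer to a blank node explicitly in a query, you either need a non-standard extension or you have to find some way to characterise the node.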
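To make the "stable and unique" requirement for minted URIs concrete, here is a minimal sketch that derives the URI only from what identifies the mention (document URI, character offsets, surface text), so reprocessing the same document on any node yields the same identifier. The function name `mint_mention_uri`, the entities namespace, and the choice of a truncated SHA-256 digest are all assumptions for illustration, not something the answers above prescribe.

```python
import hashlib
from urllib.parse import quote

BASE = "http://yourdomain.com/entities#"  # assumed namespace for minted URIs


def mint_mention_uri(doc_uri: str, start: int, end: int, surface: str) -> str:
    """Derive a URI from the mention itself, not from run-time state."""
    # The unit separator keeps the fields from running into each other.
    key = f"{doc_uri}\x1f{start}\x1f{end}\x1f{surface}".encode("utf-8")
    # 64 bits of the digest: collisions are unlikely but not impossible.
    digest = hashlib.sha256(key).hexdigest()[:16]
    slug = quote(surface.replace(" ", "_"), safe="")
    return f"{BASE}{slug}_{digest}"


doc = "http://yourdomain.com/doc-path/some-doc"
first_run = mint_mention_uri(doc, 0, 10, "John Smith")
second_run = mint_mention_uri(doc, 0, 10, "John Smith")
other_doc = mint_mention_uri(doc + "-2", 0, 10, "John Smith")

print(first_run == second_run)  # same mention, same URI
print(first_run == other_doc)   # same name in another document, different URI
```

Nothing here depends on a counter, a timestamp, or the host name, which is what keeps parallel and repeated runs consistent. Whether two mentions of "John Smith" in different documents are the same person is deliberately left to the later resolution step.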

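In both cases, but especially if you use blank nodes, you should include provenance statements that describe the node anyway.

@Nathan's example is a good illustration of the idea.

So an example using blank nodes might be: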

@prefix my: <http://yourdomain.com/2010/07/20/conceptmap#> .
@prefix proc: <http://yourdomain.com/2010/07/20/processing#> .
@prefix prg: <http://yourdomain.com/processors#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:1 rdf:type proc:ProcessRun ;
  proc:parser prg:tagger ;
  proc:version "1.0.2" ;
  proc:time "2010-07-03T20:35:45"^^xsd:dateTime ;
  proc:host prg:hostname-of-processing-node ;
  proc:file <http://yourdomain.com/doc-path/some-doc#line=1,;md5=md5_sum_goes_here,mime-charset_goes_here> .

_:2 rdf:type foaf:Person ;
  foaf:name "John Smith"@en ;
  proc:identifiedBy _:1 ;
  proc:atLocation <http://yourdomain.com/doc-path/some-doc#char=0,9> .

_:3 rdf:type owl:Thing ;
  foaf:name "Washington"@en ;
  proc:identifiedBy _:1 ;
  proc:atLocation <http://yourdomain.com/doc-path/some-doc#char=24,33> .

<http://yourdomain.com/some-doc#this> rdf:type foaf:Document ;
  dcterms:references _:2, _:3 .

Note the use of RFC 5147 text/plain fragment identifiers to uniquely identify the file being processed. This gives you flexibility in how you identify individual runs. The alternative is to capture all of this in the URI of the document root, or to abandon provenance entirely.

@prefix : <http://yourdomain.com/ProcessRun/parser=tagger/version=1.0.2/time=2010-07-03+20:35:45/host=hostname-of-processing-node/file=http%3A%2F%2Fyourdomain.com%2Fdoc-path%2Fsome-doc%23line%3D1%2C%3Bmd5%3Dmd5_sum_goes_here%2Cmime-charset_goes_here/> .
@prefix my: <http://yourdomain.com/2010/07/20/conceptmap#> .
@prefix proc: <http://yourdomain.com/2010/07/20/processing#> .
@prefix prg: <http://yourdomain.com/processors#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:1 rdf:type proc:ProcessRun ;
  proc:parser prg:tagger ;
  proc:version "1.0.2" ;
  proc:time "2010-07-03T20:35:45"^^xsd:dateTime ;
  proc:host prg:hostname-of-processing-node ;
  proc:file <http://yourdomain.com/doc-path/some-doc#line=1,;md5=md5_sum_goes_here,mime-charset_goes_here> .

:2 rdf:type foaf:Person ;
  foaf:name "John Smith"@en ;
  proc:identifiedBy :1 ;
  proc:atLocation <http://yourdomain.com/doc-path/some-doc#char=0,9> .

:3 rdf:type owl:Thing ;
  foaf:name "Washington"@en ;
  proc:identifiedBy :1 ;
  proc:atLocation <http://yourdomain.com/doc-path/some-doc#char=24,33> .

<http://yourdomain.com/some-doc#this> rdf:type foaf:Document ;
  dcterms:references :2, :3 .
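The `#char=` locations in these examples are RFC 5147 character ranges. A minimal sketch of how a tagger might compute them from the source text follows. The helper name `char_fragment` is hypothetical, and the sketch assumes the document is plain text already decoded to a Python string. RFC 5147 positions fall between characters, so the end offset is the same as Python's exclusive slice end, and the offsets below are computed for the sample sentence as written here.

```python
def char_fragment(doc_uri: str, text: str, mention: str, start_at: int = 0) -> str:
    """Build an RFC 5147 character-range URI for the first match of `mention`."""
    # str.index raises ValueError if the mention does not occur in the text.
    start = text.index(mention, start_at)
    # Positions count the gaps between characters, so the range end is
    # the offset just after the last character of the mention.
    end = start + len(mention)
    return f"{doc_uri}#char={start},{end}"


DOC = "http://yourdomain.com/doc-path/some-doc"
SENTENCE = "John Smith went to Washington"

person_location = char_fragment(DOC, SENTENCE, "John Smith")
place_location = char_fragment(DOC, SENTENCE, "Washington")

print(person_location)
print(place_location)
```

A real pipeline would normally take the offsets straight from the tagger output instead of searching the text again; the search is only there to keep the sketch self-contained.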

It may also be worth looking at how Apache Stanbol works.

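And create a namespace for each piece of data? I would rather create one namespace for the whole dataset, and then unique IDs for the things that are not known yet. NEE is my business; I am just trying to add RDF production.

With provenance dropped entirely and a single namespace for the whole dataset, the example reduces to:

@prefix : <http://yourdomain.com/entities#> .
@prefix my: <http://yourdomain.com/2010/07/20/conceptmap#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:filename_timestamp_1 rdf:type foaf:Person ;
  foaf:name "John Smith"@en .

:filename_timestamp_2 rdf:type owl:Thing ;
  foaf:name "Washington"@en .

<http://yourdomain.com/some-doc#this> rdf:type foaf:Document ;
  dcterms:references :filename_timestamp_1, :filename_timestamp_2 .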