DBPedia local server gives strange results for different queries

Tags: rdf, sparql, semantic-web, dbpedia, virtuoso

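I am trying to get a list of all the people on Wikipedia, with as many features as possible, for a machine learning problem.

I have set up a local DBPedia server and increased the limits of various parameters, but somehow I still cannot get the desired results.

The desired output is a CSV in the following format: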

<Person1>,<Feature1>,<Feature2>,<Feature3> .......... and so on
<Person2>,<Feature1>,<Feature2>,<Feature3> .......... and so on
<Person3>,<Feature1>,<Feature2>,<Feature3> .......... and so on
 ...and so on
Query 1:

 SELECT ?name ?birthDate WHERE {
   {
     SELECT strafter(str(?person),"http://dbpedia.org/resource/") as ?name,
            str(?birthDate) as ?birthDate
     WHERE {
       ?person a <http://dbpedia.org/ontology/Person> .
       ?person dbpedia-owl:birthDate ?birthDate .
     }
     ORDER BY ASC(?name)
   }
 }
 OFFSET 100000
 LIMIT 500

Result: [[name]][[birthDate]]

Query 2:

 SELECT strafter(str(?person),"http://dbpedia.org/resource/") as ?name,
        str(?birthDate) as ?birthDate,
        str(?birthName) as ?birthName,
        strafter(str(?occupation),"http://dbpedia.org/resource/") as ?occupation
 WHERE {
   ?person a <http://dbpedia.org/ontology/Person> .
   ?person dbpedia-owl:birthDate ?birthDate .
   ?person dbpedia-owl:birthName ?birthName .
   ?person dbpedia-owl:occupation ?occupation .
 }

But when I run this query, I get only 50000 rows, which is very few.

Query 3:

 select ?s ?p ?o { ?s a dbpedia-owl:Person ; ?p ?o }

Result: >

My virtuoso.ini file:

[Database]
DatabaseFile                    = /var/lib/virtuoso/db/virtuoso.db
ErrorLogFile                    = /var/lib/virtuoso/db/virtuoso.log
LockFile                        = /var/lib/virtuoso/db/virtuoso.lck
TransactionFile                 = /var/lib/virtuoso/db/virtuoso.trx
xa_persistent_file              = /var/lib/virtuoso/db/virtuoso.pxa
ErrorLogLevel                   = 7
FileExtend                      = 200
;MaxCheckpointRemap             = 2000
MaxCheckpointRemap              = 1362500
Striping                        = 0
TempStorage                     = TempDatabase


[TempDatabase]
DatabaseFile                    = /var/lib/virtuoso/db/virtuoso-temp.db
TransactionFile                 = /var/lib/virtuoso/db/virtuoso-temp.trx
MaxCheckpointRemap              = 2000
Striping                        = 0

[Parameters]
ServerPort                      = 1111
LiteMode                        = 0
DisableUnixSocket               = 1
DisableTcpSocket                = 0
;SSLServerPort                  = 2111
;SSLCertificate                 = cert.pem
;SSLPrivateKey                  = pk.pem
;X509ClientVerify               = 0
;X509ClientVerifyDepth          = 0
;X509ClientVerifyCAFile         = ca.pem
ServerThreads                   = 20
CheckpointInterval              = 60
O_DIRECT                        = 0
CaseMode                        = 2
MaxStaticCursorRows             = 500000000
CheckpointAuditTrail            = 0
AllowOSCalls                    = 0
SchedulerInterval               = 10
DirsAllowed                     = ., /usr/share/virtuoso/vad, /usr/local/data/datasets
ThreadCleanupInterval           = 0
ThreadThreshold                 = 10
ResourcesCleanupInterval        = 0
FreeTextBatchSize               = 100000
SingleCPU                       = 0
VADInstallDir                   = /usr/share/virtuoso/vad/
PrefixResultNames               = 0
RdfFreeTextRulesSize            = 100
IndexTreeMaps                   = 256
MaxMemPoolSize                  = 200000000
PrefixResultNames               = 0
MacSpotlight                    = 0
IndexTreeMaps                   = 64
MaxSortedTopRows                = 100000000
;;


;; Uncomment next two lines if there is 64 GB system memory free
NumberOfBuffers          = 5450000
MaxDirtyBuffers          = 4000000
;;

[HTTPServer]
ServerPort                      = 8890
ServerRoot                      = /var/lib/virtuoso/vsp
ServerThreads                   = 20
DavRoot                         = DAV
EnabledDavVSP                   = 0
HTTPProxyEnabled                = 0
TempASPXDir                     = 0
DefaultMailServer               = localhost:25
ServerThreads                   = 10
MaxKeepAlives                   = 10
KeepAliveTimeout                = 10
MaxCachedProxyConnections       = 10
ProxyConnectionCacheTimeout     = 15
HTTPThreadSize                  = 280000
HttpPrintWarningsInOutput       = 0
Charset                         = UTF-8
;HTTPLogFile                    = logs/http.log

[AutoRepair]
BadParentLinks                  = 0


[Client]
SQL_PREFETCH_ROWS               = 100
SQL_PREFETCH_BYTES              = 16000
SQL_QUERY_TIMEOUT               = 0
SQL_TXN_TIMEOUT                 = 0  
;SQL_NO_CHAR_C_ESCAPE           = 1
;SQL_UTF8_EXECS                 = 0
;SQL_NO_SYSTEM_TABLES           = 0
;SQL_BINARY_TIMESTAMP           = 1
;SQL_ENCRYPTION_ON_PASSWORD     = -1

[VDB]
ArrayOptimization               = 0
NumArrayParameters              = 10
VDBDisconnectTimeout            = 1000
KeepConnectionOnFixedThread     = 0

[Replication]
ServerName                      = db-IP-172-31-24-242
ServerEnable                    = 1
QueueMax                        = 5000000


[Striping]
Segment1                        = 100M, db-seg1-1.db, db-seg1-2.db
Segment2                        = 100M, db-seg2-1.db
;...


[Zero Config]
ServerName                      = virtuoso (IP-172-31-24-242)

[URIQA]
DynamicLocal                    = 0
DefaultHost                     = localhost:8890


[SPARQL]
;ExternalQuerySource            = 1
;ExternalXsltSource             = 1
;DefaultGraph                   = http://localhost:8890/dataspace
;ImmutableGraphs                = http://localhost:8890/dataspace
;ResultSetMaxRows               = 10000
ResultSetMaxRows                = 1000000000
;MaxQueryCostEstimationTime     = 400   ; in seconds
MaxQueryCostEstimationTime      = 4000000000000000      ; in seconds
;MaxQueryExecutionTime          = 60    ; in seconds
MaxQueryExecutionTime           = 600000000000000       ; in seconds
DefaultQuery                    = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit         = 0  ; controls inference rules loading
;PingService                    = http://rpc.pingthesemanticweb.com/
MaxSortedTopRows                = 10000000

[Plugins]
LoadPath                        = /usr/lib/virtuoso/hosting
Load1                           = plain, wikiv
Load2                           = plain, mediawiki
Load3                           = plain, creolewiki
Load4                   = plain, im

Please let me know if I am missing something trivial, but the results of these queries make no sense to me.

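It is hard to pinpoint your exact problem, because you are running several quite different queries. If you want to find the cause, the best approach is to change one small thing at a time.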

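Also: all of your queries are syntactically illegal SPARQL, which makes it hard to tell what is going wrong. In particular, the way you write the "AS" aliases is incorrect: first, each one should be enclosed in parentheses, and second, you should not alias to a variable that already exists in the query. For example, instead of something like:

str(?birthDate) as ?birthDate

you should write:

(str(?birthDate) as ?bd)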
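
To make the alias rule concrete, here is a minimal sketch of how the second query might look once every alias is parenthesised and bound to a fresh variable. It only holds the query text in a Python string. The alias names (`?name`, `?bd`, `?bn`, `?occ`) and the explicit `PREFIX` line are assumptions added for illustration, not part of the original query.

```python
# Standards-conformant rewrite of the second query, kept as plain text.
# Assumptions: the alias names (?name, ?bd, ?bn, ?occ) are invented here, and
# the PREFIX line is added so the query does not rely on server-defined prefixes.
CORRECTED_QUERY = """\
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT (STRAFTER(STR(?person), "http://dbpedia.org/resource/") AS ?name)
       (STR(?birthDate) AS ?bd)
       (STR(?birthName) AS ?bn)
       (STRAFTER(STR(?occupation), "http://dbpedia.org/resource/") AS ?occ)
WHERE {
  ?person a dbpedia-owl:Person ;
          dbpedia-owl:birthDate ?birthDate ;
          dbpedia-owl:birthName ?birthName ;
          dbpedia-owl:occupation ?occupation .
}
"""

print(CORRECTED_QUERY)
```

None of the four aliases reuses a variable that is bound inside the WHERE clause, which is the second point made above.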
Apart from that, in your first query you set the offset to 100000. Most likely, you get no answers simply because there are fewer than 100000 results.
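
The effect of an oversized OFFSET can be sketched without a server. The snippet below is a plain-Python analogy, not Virtuoso code: `apply_offset_limit` and `page_windows` are hypothetical helpers that mimic how OFFSET and LIMIT slice an ordered result list, and the 50000-element list is only a stand-in for a result set of that size.

```python
def apply_offset_limit(rows, offset, limit):
    """Mimic SPARQL OFFSET/LIMIT on an already ordered list of rows."""
    return rows[offset:offset + limit]


def page_windows(total_rows, page_size):
    """Yield (offset, limit) pairs that cover total_rows without overshooting."""
    for offset in range(0, total_rows, page_size):
        yield offset, min(page_size, total_rows - offset)


# Stand-in result set; the size is an assumption chosen for the illustration.
rows = [f"person_{i}" for i in range(50000)]

# An offset past the end of the result set selects nothing at all.
beyond = apply_offset_limit(rows, offset=100000, limit=500)
print(len(beyond))

# Paging windows derived from the real row count never run past the end.
windows = list(page_windows(len(rows), 500))
print(windows[0], windows[-1], len(windows))
```

In practice, running a `SELECT (COUNT(*) AS ?n)` over the same graph pattern first would give the value to pass as `total_rows`.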

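In your second query, you get 50000 results, which presumably reflects the actual number of persons matching your criteria. Again, the query is a bit odd in that you try to "re-bind" variables to new values with the "AS" alias.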

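Finally, your last query simply retrieves all triples about resources of type Person. It is not surprising that this result is much larger, since you impose no further constraints. Each row in the result is one property-value combination for a particular person.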
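
Because each row is a single property-value pair, producing one CSV line per person means grouping rows by subject. A rough sketch of that pivot follows. The sample triples are invented placeholders rather than real query output, `pivot_triples` and `to_csv` are hypothetical helpers, and joining multiple values with `|` is just one possible convention.

```python
import csv
import io
from collections import defaultdict


def pivot_triples(triples):
    """Group (subject, predicate, object) rows into one dict per subject."""
    people = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        people[s][p].append(o)
    return people


def to_csv(people):
    """Write one row per subject with one column per predicate seen."""
    columns = sorted({p for props in people.values() for p in props})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["person"] + columns)
    for person in sorted(people):
        props = people[person]
        # Missing properties become empty cells; repeated ones are joined.
        writer.writerow([person] + ["|".join(props.get(c, [])) for c in columns])
    return out.getvalue()


# Invented sample rows in the shape returned by `select ?s ?p ?o`.
sample = [
    ("Person1", "birthDate", "1900-01-01"),
    ("Person1", "occupation", "Feature2a"),
    ("Person1", "occupation", "Feature2b"),
    ("Person2", "birthDate", "1950-06-15"),
]

csv_text = to_csv(pivot_triples(sample))
print(csv_text)
```

Persons that lack a property simply get an empty cell, so no one is dropped the way they are when every property is a required triple pattern.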

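I recommend that you look at a basic SPARQL tutorial, as I think you may be missing some fundamentals. SPARQL takes a little getting used to, but once you understand the basics (such as what graph pattern matching actually means), you will find it much easier to write your own queries.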