DBPedia local server gives strange results for different queries
I am trying to get a list of all people on Wikipedia, each with as many features as possible, for a machine learning problem. I have set up a local DBPedia server and have raised the limits on various parameters, but somehow I still cannot get the desired results. The desired output is a CSV in the following format:
<Person1>,<Feature1>,<Feature2>,<Feature3> .......... and so on
<Person2>,<Feature1>,<Feature2>,<Feature3> .......... and so on
<Person3>,<Feature1>,<Feature2>,<Feature3> .......... and so on
...and
...so
...on
So I tried this query:
SELECT ?name ?birthDate WHERE {
  {
    SELECT strafter(str(?person),"http://dbpedia.org/resource/") as ?name, str(?birthDate) as ?birthDate WHERE {
      ?person a <http://dbpedia.org/ontology/Person> .
      ?person dbpedia-owl:birthDate ?birthDate .
    }
    ORDER BY ASC(?name)
  }
}
OFFSET 100000
LIMIT 500
Result:
[[name]][[birthDate]]
But when I run this query, I get only 50000 rows, which is very few:
SELECT strafter(str(?person),"http://dbpedia.org/resource/") as ?name, str(?birthDate) as ?birthDate, str(?birthName) as ?birthName, strafter(str(?occupation),"http://dbpedia.org/resource/") as ?occupation WHERE {
  ?person a <http://dbpedia.org/ontology/Person> .
  ?person dbpedia-owl:birthDate ?birthDate .
  ?person dbpedia-owl:birthName ?birthName .
  ?person dbpedia-owl:occupation ?occupation .
}
Query:
select ?s ?p ?o { ?s a dbpedia-owl:Person ; ?p ?o }
Result:
>
My virtuoso.ini file:
[Database]
DatabaseFile = /var/lib/virtuoso/db/virtuoso.db
ErrorLogFile = /var/lib/virtuoso/db/virtuoso.log
LockFile = /var/lib/virtuoso/db/virtuoso.lck
TransactionFile = /var/lib/virtuoso/db/virtuoso.trx
xa_persistent_file = /var/lib/virtuoso/db/virtuoso.pxa
ErrorLogLevel = 7
FileExtend = 200
;MaxCheckpointRemap = 2000
MaxCheckpointRemap = 1362500
Striping = 0
TempStorage = TempDatabase
[TempDatabase]
DatabaseFile = /var/lib/virtuoso/db/virtuoso-temp.db
TransactionFile = /var/lib/virtuoso/db/virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping = 0
[Parameters]
ServerPort = 1111
LiteMode = 0
DisableUnixSocket = 1
DisableTcpSocket = 0
;SSLServerPort = 2111
;SSLCertificate = cert.pem
;SSLPrivateKey = pk.pem
;X509ClientVerify = 0
;X509ClientVerifyDepth = 0
;X509ClientVerifyCAFile = ca.pem
ServerThreads = 20
CheckpointInterval = 60
O_DIRECT = 0
CaseMode = 2
MaxStaticCursorRows = 500000000
CheckpointAuditTrail = 0
AllowOSCalls = 0
SchedulerInterval = 10
DirsAllowed = ., /usr/share/virtuoso/vad, /usr/local/data/datasets
ThreadCleanupInterval = 0
ThreadThreshold = 10
ResourcesCleanupInterval = 0
FreeTextBatchSize = 100000
SingleCPU = 0
VADInstallDir = /usr/share/virtuoso/vad/
PrefixResultNames = 0
RdfFreeTextRulesSize = 100
IndexTreeMaps = 256
MaxMemPoolSize = 200000000
PrefixResultNames = 0
MacSpotlight = 0
IndexTreeMaps = 64
MaxSortedTopRows = 100000000
;;
;; Uncomment next two lines if there is 64 GB system memory free
NumberOfBuffers = 5450000
MaxDirtyBuffers = 4000000
;;
[HTTPServer]
ServerPort = 8890
ServerRoot = /var/lib/virtuoso/vsp
ServerThreads = 20
DavRoot = DAV
EnabledDavVSP = 0
HTTPProxyEnabled = 0
TempASPXDir = 0
DefaultMailServer = localhost:25
ServerThreads = 10
MaxKeepAlives = 10
KeepAliveTimeout = 10
MaxCachedProxyConnections = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize = 280000
HttpPrintWarningsInOutput = 0
Charset = UTF-8
;HTTPLogFile = logs/http.log
[AutoRepair]
BadParentLinks = 0
[Client]
SQL_PREFETCH_ROWS = 100
SQL_PREFETCH_BYTES = 16000
SQL_QUERY_TIMEOUT = 0
SQL_TXN_TIMEOUT = 0
;SQL_NO_CHAR_C_ESCAPE = 1
;SQL_UTF8_EXECS = 0
;SQL_NO_SYSTEM_TABLES = 0
;SQL_BINARY_TIMESTAMP = 1
;SQL_ENCRYPTION_ON_PASSWORD = -1
[VDB]
ArrayOptimization = 0
NumArrayParameters = 10
VDBDisconnectTimeout = 1000
KeepConnectionOnFixedThread = 0
[Replication]
ServerName = db-IP-172-31-24-242
ServerEnable = 1
QueueMax = 5000000
[Striping]
Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
Segment2 = 100M, db-seg2-1.db
;...
[Zero Config]
ServerName = virtuoso (IP-172-31-24-242)
[URIQA]
DynamicLocal = 0
DefaultHost = localhost:8890
[SPARQL]
;ExternalQuerySource = 1
;ExternalXsltSource = 1
;DefaultGraph = http://localhost:8890/dataspace
;ImmutableGraphs = http://localhost:8890/dataspace
;ResultSetMaxRows = 10000
ResultSetMaxRows = 1000000000
;MaxQueryCostEstimationTime = 400 ; in seconds
MaxQueryCostEstimationTime = 4000000000000000 ; in seconds
;MaxQueryExecutionTime = 60 ; in seconds
MaxQueryExecutionTime = 600000000000000 ; in seconds
DefaultQuery = select distinct ?Concept where {[] a ?Concept} LIMIT 100
DeferInferenceRulesInit = 0 ; controls inference rules loading
;PingService = http://rpc.pingthesemanticweb.com/
MaxSortedTopRows = 10000000
[Plugins]
LoadPath = /usr/lib/virtuoso/hosting
Load1 = plain, wikiv
Load2 = plain, mediawiki
Load3 = plain, creolewiki
Load4 = plain, im
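Please let me know in case I am missing something trivial, but the results of these queries make no sense to me.

It is hard to determine what your exact problem is, because you are running several quite different queries. If you want to find the cause, the best approach is to make small changes.
Also: all of your queries are syntactically illegal SPARQL, which makes it hard to tell what is going wrong. In particular, the way you write the "AS" aliases is incorrect: first, each one should be enclosed in parentheses, and second, you should not alias to a variable that is already in use. For example, instead of something like:
str(?birthDate) as ?birthDate
you should write:
(str(?birthDate) as ?bd)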
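As a rough sketch of that fix applied to the whole first query, the Python snippet below builds a version with parenthesised aliases that bind to fresh variable names. The helper name `build_birthdate_query`, the explicit `dbo:` prefix declaration, and the choice of `?bd` as the new variable are assumptions made for illustration, not something taken from the question.

```python
# Hypothetical helper: returns a syntactically valid variant of the first query.
DBO = "http://dbpedia.org/ontology/"
RESOURCE_NS = "http://dbpedia.org/resource/"


def build_birthdate_query(limit: int = 500, offset: int = 0) -> str:
    """Build a SPARQL 1.1 query whose aliases are parenthesised and non-clashing."""
    # Doubled braces are literal braces inside the f-string.
    return f"""PREFIX dbo: <{DBO}>
SELECT ?name ?bd WHERE {{
  {{
    SELECT (STRAFTER(STR(?person), "{RESOURCE_NS}") AS ?name)
           (STR(?birthDate) AS ?bd)
    WHERE {{
      ?person a dbo:Person ;
              dbo:birthDate ?birthDate .
    }}
    ORDER BY ASC(?name)
  }}
}}
OFFSET {offset}
LIMIT {limit}"""


print(build_birthdate_query(limit=500, offset=0))
```

Starting with `offset=0` makes it easy to see whether the pattern matches anything at all before paging further.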
Beyond that, in the first query you set the offset to 100000. Most likely you get no answers simply because there are fewer than 100000 results.
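One way to avoid guessing an offset is to page from zero and stop at the first short page. The sketch below assumes a caller-supplied `fetch_page(offset, limit)` function (hypothetical; it would wrap whatever client you use to send the query to the local endpoint). A list of fake rows stands in for the server here, so nothing in it reflects real DBPedia counts.

```python
from typing import Callable, Iterator, List, Sequence

Row = Sequence[str]


def iter_all_rows(fetch_page: Callable[[int, int], List[Row]],
                  page_size: int = 500) -> Iterator[Row]:
    """Yield every row by advancing OFFSET from 0 until a page comes back short."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # last (possibly empty) page reached
            return
        offset += page_size


# Stand-in for a real endpoint: 1203 made-up rows, sliced like OFFSET/LIMIT.
fake_rows: List[Row] = [(f"Person_{i}", "1900-01-01") for i in range(1203)]


def fake_fetch(offset: int, limit: int) -> List[Row]:
    return fake_rows[offset:offset + limit]


print(sum(1 for _ in iter_all_rows(fake_fetch)))  # prints 1203
print(len(fake_fetch(100000, 500)))               # prints 0: offset is past the end
```

The second print mirrors the situation described above: an offset larger than the result set yields an empty page rather than an error.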
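In the second query you get 50000 results, which presumably accurately reflects the actual number of persons that match your criteria. Again, the query is a bit odd in that you use the "AS" alias to "rebind" variables to new values.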
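To see why adding required properties shrinks the result, it can help to model the pattern outside SPARQL. The Python sketch below uses a made-up three-person dataset and a hypothetical `match_all` helper; it is only a model of how a basic graph pattern joins its triple patterns, not a description of how Virtuoso evaluates queries.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Made-up miniature dataset: property values per person (illustrative only).
people: Dict[str, Dict[str, List[str]]] = {
    "Person_A": {"birthDate": ["1900-01-01"], "birthName": ["A. Example"],
                 "occupation": ["Writer", "Painter"]},
    "Person_B": {"birthDate": ["1950-05-05"], "occupation": ["Singer"]},
    "Person_C": {"birthDate": ["1980-08-08"]},
}


def match_all(data: Dict[str, Dict[str, List[str]]],
              required: Sequence[str]) -> List[Tuple[str, ...]]:
    """Mimic a basic graph pattern: one row per combination of required values."""
    rows: List[Tuple[str, ...]] = []
    for person, props in data.items():
        value_lists = [props.get(name, []) for name in required]
        # product() yields nothing when any required property has no value,
        # so a person missing one property contributes no rows at all.
        for combo in product(*value_lists):
            rows.append((person, *combo))
    return rows


print(len(match_all(people, ["birthDate"])))                             # prints 3
print(len(match_all(people, ["birthDate", "birthName", "occupation"])))  # prints 2
```

Both rows of the second result belong to Person_A, once per occupation, so a row count is not necessarily a count of distinct persons.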
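Finally, the last query simply retrieves all triples about resources of type Person. It is no surprise that this result is much larger, since you impose no further constraints. Each row in the result is one property-value combination for a particular person.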
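Since those rows are already (person, property, value) triples, one possible route to the CSV layout from the question is to pivot them on the client side. The following Python sketch assumes the triples have been fetched into memory as tuples; the function name, the "|" separator for multi-valued properties, and the sample rows are all illustrative choices rather than anything prescribed by DBPedia or Virtuoso.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def triples_to_csv(triples: Iterable[Triple]) -> str:
    """Pivot (subject, predicate, object) rows into one CSV line per subject."""
    values: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        values[s][p].append(o)
    # One column per predicate seen anywhere in the input, in sorted order.
    columns = sorted({p for props in values.values() for p in props})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["person"] + columns)
    for s in sorted(values):
        # Multi-valued properties are joined with "|"; missing ones stay empty.
        writer.writerow([s] + ["|".join(values[s].get(p, [])) for p in columns])
    return out.getvalue()


# Small hand-written sample standing in for real query results.
sample: List[Triple] = [
    ("Ada_Lovelace", "birthDate", "1815-12-10"),
    ("Ada_Lovelace", "occupation", "Mathematician"),
    ("Ada_Lovelace", "occupation", "Writer"),
    ("Alan_Turing", "birthDate", "1912-06-23"),
]
print(triples_to_csv(sample))
```

For the full dataset the in-memory dictionary may become large, so processing the triples in batches sorted by subject would be a natural variation.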
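I recommend you look at a basic SPARQL tutorial, as I think you may be missing some fundamentals. SPARQL takes a little getting used to, but once you understand the basics (such as what graph pattern matching actually means), you will find writing your own queries much easier.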