How can I unpack dmoz URLs from the RDF dump using Python and rdflib?
I am trying to open the dmoz RDF dump file with rdflib, but I get this error message:
Traceback (most recent call last):
  File "/media/_dev_/ODP_RDF_get_links.py", line 4, in <module>
    result = g.parse("data/content.rdf")
  File "/usr/local/lib/python2.7/dist-packages/rdflib/graph.py", line 1033, in parse
    parser.parse(source, self, **args)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 577, in parse
    self._parser.parse(source)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 210, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 352, in end_element_ns
    self._cont_handler.endElementNS(pair, None)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 160, in endElementNS
    self.current.end(name, qname)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 331, in node_element_end
    self.error("Repeat node-elements inside property elements: %s"%"".join(name))
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 185, in error
    raise ParserError(info + message)
file:///media/_dev_/data/content.rdf:5:12: Repeat node-elements inside property elements: http://dmoz.org/rdf/catid
I need to be able to read the file and extract all links in the World category.
Thanks for your help.
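Edit:
PS: I found this, so it seems that developing a custom script is necessary to work with this dump.
What RDF did you get? It looks like a parse error, so it may not actually be valid RDF.
If there is another way, that would be great. That may be true; it is the content.rdf.u8.gz file from rdf.dmoz.org/rdf, but this file is the only way to get links from ODP!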
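# Script that raises the ParserError shown in the traceback
import rdflib
g = rdflib.Graph()
# rdflib's RDF/XML parser rejects the dump at this call
result = g.parse("data/content.rdf")
print("graph has %s statements." % len(g))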
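Since rdflib rejects the dump as invalid RDF/XML, one possible workaround is to treat the file as plain XML and stream through it with the standard library instead of building a graph. The following is only a minimal sketch under stated assumptions, not a tested solution for the full dump: it assumes that each link appears as a top-level `ExternalPage` element with an `about` attribute and a `topic` child holding a path such as `Top/World/...`. The helper names `local_name` and `iter_links` and the `SAMPLE` data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hypothetical sample in the layout this sketch ASSUMES the dump uses.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
<Topic r:id="Top/World"><catid>2</catid></Topic>
<ExternalPage about="http://example.org/a">
  <d:Title>A</d:Title><topic>Top/World/Deutsch</topic>
</ExternalPage>
<ExternalPage about="http://example.org/b">
  <d:Title>B</d:Title><topic>Top/Arts</topic>
</ExternalPage>
</RDF>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def iter_links(source, prefix="Top/World"):
    """Yield (url, topic) for ExternalPage entries at or below `prefix`.

    `source` is a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "ExternalPage":
            # Look up 'about' by local name so a namespaced attribute also matches.
            url = next(
                (v for k, v in elem.attrib.items() if local_name(k) == "about"),
                None,
            )
            topic = next(
                (c.text for c in elem if local_name(c.tag) == "topic"),
                None,
            )
            if url and topic and (topic == prefix or topic.startswith(prefix + "/")):
                yield url, topic
        if name in ("ExternalPage", "Topic"):
            # Assumed to be top-level entries: drop them to keep memory flat.
            root.clear()


if __name__ == "__main__":
    for url, topic in iter_links(io.BytesIO(SAMPLE)):
        print(url, topic)
```

To try this on the real file, something like `iter_links(gzip.open("data/content.rdf.u8.gz", "rb"))` (after `import gzip`) should avoid decompressing the dump to disk first. The real dump may also contain characters that a strict XML parser rejects; this sketch does not handle that case.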