Python metadata harvesting

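I am trying to fetch the data on this site using a metadata harvesting package.

I tried the example from the pyoai site, but without success. When I test it, I get an error. The code is: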

from oaipmh.client import Client
from oaipmh.metadata import MetadataRegistry, oai_dc_reader

URL = 'http://uni.edu/ir/oaipmh'
registry = MetadataRegistry()
registry.registerReader('oai_dc', oai_dc_reader)
client = Client(URL, registry)

for record in client.listRecords(metadataPrefix='oai_dc'):
    print record
This is the stack trace:

Traceback (most recent call last):
  File "/Users/arashsaidi/PycharmProjects/get-new-DUO/get-files.py", line 8, in <module>
    for record in client.listRecords(metadataPrefix='oai_dc'):
  File "/Users/arashsaidi/.virtualenvs/lbk/lib/python2.7/site-packages/oaipmh/common.py", line 115, in method
    return obj(self, **kw)
  File "/Users/arashsaidi/.virtualenvs/lbk/lib/python2.7/site-packages/oaipmh/common.py", line 110, in __call__
    return bound_self.handleVerb(self._verb, kw)
  File "/Users/arashsaidi/.virtualenvs/lbk/lib/python2.7/site-packages/oaipmh/client.py", line 65, in handleVerb
    kw, self.makeRequestErrorHandling(verb=verb, **kw))    
  File "/Users/arashsaidi/.virtualenvs/lbk/lib/python2.7/site-packages/oaipmh/client.py", line 273, in makeRequestErrorHandling
    raise error.XMLSyntaxError(kw)
oaipmh.error.XMLSyntaxError: {'verb': 'ListRecords', 'metadataPrefix': 'oai_dc'}
I need to access all the files on the page linked above and generate an additional file containing some metadata.

Any suggestions?

The link from the pyoai site appears to be dead, since it returns a 404.
However, you should be able to get data from your site like this:

from oaipmh.client import Client
from oaipmh.metadata import MetadataRegistry, oai_dc_reader

URL = 'https://www.duo.uio.no/oai/request'
registry = MetadataRegistry()
registry.registerReader('oai_dc', oai_dc_reader)
client = Client(URL, registry)

# identify info
identify = client.identify()
print("Repository name: {0}".format(identify.repositoryName()))
print("Base URL: {0}".format(identify.baseURL()))
print("Protocol version: {0}".format(identify.protocolVersion()))
print("Granularity: {0}".format(identify.granularity()))
print("Compression: {0}".format(identify.compression()))
print("Deleted record: {0}".format(identify.deletedRecord()))

# list records
records = client.listRecords(metadataPrefix='oai_dc')
for record in records:
    # do something with the record
    pass

# list metadata formats
formats = client.listMetadataFormats()
for f in formats:
    # do something with f
    pass
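
To fill in the "do something with the record" placeholder: as far as I know, pyoai's listRecords() yields (header, metadata, about) tuples, and metadata can be None (for example for deleted records). Here is a minimal sketch that flattens each tuple into a plain dict. The names record_to_dict and harvest are my own, and picking only title and creator is an arbitrary choice for illustration.

```python
def record_to_dict(record):
    """Flatten one pyoai record tuple into a plain dict."""
    header, metadata, _about = record
    # metadata may be None, e.g. when the record is flagged as deleted
    fields = metadata.getMap() if metadata is not None else {}
    return {
        "identifier": header.identifier(),
        "datestamp": header.datestamp(),
        "deleted": header.isDeleted(),
        # oai_dc fields are lists because an element may repeat
        "title": fields.get("title", []),
        "creator": fields.get("creator", []),
    }


def harvest(url):
    """Yield one dict per oai_dc record from the repository at `url`."""
    # Imported here so record_to_dict stays usable without pyoai installed.
    from oaipmh.client import Client
    from oaipmh.metadata import MetadataRegistry, oai_dc_reader

    registry = MetadataRegistry()
    registry.registerReader("oai_dc", oai_dc_reader)
    client = Client(url, registry)
    for record in client.listRecords(metadataPrefix="oai_dc"):
        yield record_to_dict(record)
```

Something like `for row in harvest('https://www.duo.uio.no/oai/request'): print(row)` would then print one dict per record, assuming the endpoint answers with valid XML.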

I ended up using the Sickle package, which I found to be better documented and easier to use:

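This code fetches all the sets and then retrieves every record from each set. With more than 30,000 records to process, this seemed like the best solution, and working set by set gives more control. Hope this helps someone else. I don't know why libraries use OAI; to me it doesn't seem like a good way to organize data.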

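        # Excerpt from a method body; needs `from sickle import Sickle` at module level.
        # self.spec_list and self.write_file_and_metadata() are presumably defined
        # elsewhere in the same class and are not shown here.
        sickle = Sickle('http://www.duo.uio.no/oai/request')
        sets = sickle.ListSets()  # gets all sets
        for recs in sets:
            # iterating a set yields (field name, list of values) pairs
            for rec in recs:
                if rec[0] == 'setSpec':
                    try:
                        print('{} {}'.format(rec[1][0], self.spec_list[rec[1][0]]))
                        records = sickle.ListRecords(metadataPrefix='xoai', set=rec[1][0], ignore_deleted=True)
                        self.write_file_and_metadata()
                    except Exception as e:
                        # simple exception handling if not possible to retrieve record
                        print('Exception: {}'.format(e))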
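
Since the snippet above depends on a class that isn't shown, here is a standalone sketch of the same set-by-set idea. It is not the original code: harvest_by_set, write_metadata and main are names I made up, the output file name is hypothetical, and I default to oai_dc rather than xoai. The client argument only needs to behave like a Sickle instance (ListSets() yielding objects with a setSpec attribute, and ListRecords(...) yielding records with header.identifier and metadata).

```python
import json


def harvest_by_set(client, metadata_prefix="oai_dc"):
    """Yield (set_spec, record) pairs, working through one set at a time."""
    for oai_set in client.ListSets():
        spec = oai_set.setSpec
        try:
            records = client.ListRecords(
                metadataPrefix=metadata_prefix, set=spec, ignore_deleted=True
            )
            for record in records:
                yield spec, record
        except Exception as exc:
            # e.g. a set with no matching records; skip it and keep going
            print("Skipping set {}: {}".format(spec, exc))


def write_metadata(pairs, path):
    """Write one JSON object per line and return how many were written."""
    count = 0
    with open(path, "w") as out:
        for spec, record in pairs:
            row = {
                "set": spec,
                "identifier": record.header.identifier,
                "metadata": record.metadata,
            }
            out.write(json.dumps(row) + "\n")
            count += 1
    return count


def main():
    # Third-party dependency (pip install sickle); imported here so the
    # helpers above can be exercised without it.
    from sickle import Sickle

    sickle = Sickle("http://www.duo.uio.no/oai/request")
    total = write_metadata(harvest_by_set(sickle), "metadata.jsonl")
    print("Wrote {} records".format(total))
```

Calling main() would run the full harvest against the live endpoint, so with 30,000+ records expect it to take a while. Catching errors per set means one bad set does not stop the rest.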