Python 2.7: querying JSON data from the CMS NPI registry
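I've been banging on this for a few days and probably need a wake-up call!

CMS (the Centers for Medicare & Medicaid Services) provides an API for looking up physician information by an individual's NPI (National Provider Identifier). There is a lot available there, including monthly megafile downloads, but I don't need any of that. I only need to issue one query at a time for a single, pre-qualified NPI (low volume) and return a few values from the retrieved record.

Here is a sample query for a randomly selected NPI; it is the URL used in the code below. If you run it in a browser window, you will see the resulting JSON data wrapped in some header/footer HTML.

I can dump the entire query result and print it in several different ways, but I cannot select and print specific data elements such as the name, address, or phone number. Running the query in a browser shows the raw output, and the snippet below prints a sanitized version of the result. Any ideas?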
import urllib
from bs4 import BeautifulSoup
import json

def main():
    url = "https://npiregistry.cms.hhs.gov/api/resultsDemo2/?number=1881761864&pretty=on"
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html,"lxml")
    for script in soup(["script", "style"]):
        script.extract()
    practitioner_rec = soup.get_text()
    # strip out the html to retain the JSON record
    lines = (line.strip() for line in practitioner_rec.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    practitioner_rec = '\n'.join(chunk for chunk in chunks if chunk)
    # get a count of lines generated by the query, valid queries are greater than 3 lines long
    linect = practitioner_rec.count('\n') +1
    if linect == 3:
        VALID_NPI="FALSE"
        VALID_MD="FALSE"
    else: # approx. 69 lines of output here
        # possible issue with JSON formatting here
        # In particular, the line
        # "result_count":1, "results":[
        # since result count will always be 1, discard it
        practitioner_rec = practitioner_rec.replace('"result_count":1, ', '')
        print(practitioner_rec)
        practitioner_data = json.loads(practitioner_rec)
        VALID_NPI="TRUE"
        VALID_MD="TRUE"
    '''
    none of these constructs works to print the provider name
    print ['result_count']['results']['basic']['name'],"name"
    print result_count['results']['basic']['name'],"name"
    print practitioner_data['results']['basic']['name'],"name"
    print results['basic']['name'],"name"
    print ['basic']['name'],"name"
    print basic['name'],"name"
    print results[2]['basic']['name'],"name"
    print results['basic']['name'],"name"
    this works, but not useful if I can't pick values out
    print(json.dumps(practitioner_data))
    print "VALID_NPI is ",VALID_NPI
    print "VALID_MD is ",VALID_MD
    return [VALID_NPI,VALID_MD]
    '''

if __name__ == '__main__':
    main()
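A note on why the subscript attempts in the commented block fail: `results` is a JSON array, which `json.loads` decodes to a Python list, so it needs an integer index before any key lookup. The following is only a minimal sketch of that point. The sample payload is hand-written and its name value is invented; only the `result_count` / `results` / `basic` / `name` shape is taken from the post.

```python
import json

# Hypothetical sample shaped like the response described in the post;
# the name value is made up for illustration.
SAMPLE = '''{
"result_count":1, "results":[
{"basic": {"name": "DOE JOHN"}}
]}'''

data = json.loads(SAMPLE)

# "results" decodes to a list, so subscripting it with a string key fails.
try:
    data['results']['basic']
    failed = False
except TypeError:
    failed = True

# Index the list first, then walk the nested dicts.
name = data['results'][0]['basic']['name']
print(name)
```

The sample also parses with the `"result_count":1, ` fragment left in place, which suggests that stripping it is not required for `json.loads` to succeed on a payload of this shape.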
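The head-banging is over. I was a JSON newbie; now I'm at least initiated. Here is a short snippet in case anyone else wants to query the CMS NPI JSON data and pull results out of it. No commercial API is needed. The Bloom one I saw referenced does not appear to be active, and others require registration and tracking data just to access public data. Here is the code for accessing a single field.

The Red Gate site SimpleTalk.com describes several ways to do this: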
import urllib
from bs4 import BeautifulSoup
import json

def main():
    '''
    NOTES:
    1. the pretty switch works when set to either true OR on
    2. a failed NPI search produces 3-line output like this --
    {
    "result_count":0, "results":[
    ]}
    '''
    # valid NPI
    url = "https://npiregistry.cms.hhs.gov/api/resultsDemo2/?number=1881761864&pretty=on"
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html,"lxml")
    # remove HTML from output, producing just a JSON record
    practitioner_rec = soup.text
    # count lines generated by the query; valid queries are more than 3 lines long
    linect = practitioner_rec.count('\n') +1
    #print "there are ", linect," lines in the input file" # only for testing
    if linect == 3:
        VALID_NPI="FALSE"
        VALID_MD="FALSE"
    else:
        # query produces a single result, with approx. 60+ lines of output
        # JSON data is a little squirrelly, so we drop the result_count entry before parsing
        practitioner_rec = practitioner_rec.replace('"result_count":1, ', '')
        # print(practitioner_rec) # only for testing
        provider_dict = json.loads(practitioner_rec)
        # "results" is a list, so take element 0 before reading "basic"
        provider_info = provider_dict['results'][0]['basic']
        print "name:", str(provider_info['name']) # str() drops the unicode u'' prefix
        VALID_NPI="TRUE"
        VALID_MD="TRUE"
    print "VALID_NPI is ",VALID_NPI
    print "VALID_MD is ",VALID_MD
    return [VALID_NPI,VALID_MD]

if __name__ == '__main__':
    main()
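As a possible variation, the failed-lookup case could be detected from the parsed data itself rather than by counting output lines. The sketch below assumes the response shape shown in the notes above; `provider_name` is a hypothetical helper, and both sample strings are hand-written stand-ins rather than real registry output.

```python
import json

def provider_name(raw_json):
    """Return the first result's basic name, or None when nothing matched.

    Hypothetical helper; assumes the result_count/results/basic/name
    shape described in the post.
    """
    data = json.loads(raw_json)
    results = data.get('results', [])
    if not results:
        # An empty "results" list corresponds to the 3-line failed-search output.
        return None
    return results[0].get('basic', {}).get('name')

# Hand-written stand-ins for a successful and a failed lookup.
FOUND = '{"result_count":1, "results":[{"basic": {"name": "DOE JOHN"}}]}'
MISSING = '''{
"result_count":0, "results":[
]}'''

print(provider_name(FOUND))
print(provider_name(MISSING))
```

Checking the list directly avoids depending on how the `pretty` switch lays out its line breaks. The same function could be fed the text extracted by BeautifulSoup in the code above.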