Python: Creating a DataFrame from an API call [python, numpy, pandas, python-requests]


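I'm building an API to retrieve Census data, but I'm having trouble formatting the output. My question is really one of two questions:

1) How can I improve the API call so that the output is prettier (ideally a DataFrame)?

or

2) How can I manipulate the list I currently get back so that it ends up in a DataFrame?

Here is what I have so far: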

import requests
import pandas as pd
import numpy as np

mytoken = "numbersandletters"
# this is my API key, so unfortunately I can't provide it

def state_data(token, variables, year = 2010, state = "*", survey = "sf1"):
    state = [str(i) for i in state]
    # make sure the input for state (integers) are strings
    variables = ",".join(variables) # squish all the variables into one string
    year = str(year)
    combine = ["http://api.census.gov/data/", year, "/", survey, "?key=", token, "&get=", variables, "&for=state:"]
    # make a list of all the components to construct a URL
    incomplete_url = "".join(combine) # the URL without the state tacked on to the end
    complete_url = map(lambda i: incomplete_url + i, state) # now the state is tacked on to the end; one URL per state or for "*"
    r = []
    r = map(lambda i: requests.get(i), complete_url)
    # make an API call to each complete_url
    data = map(lambda i: i.json(), r)
    print r
    print data
    print type(data)
    df = pd.DataFrame(data)
    print df

An example call to the function is below, followed by its output:

state_data(token = mytoken, state = [47, 48, 49, 50], variables = ["P0010001", "P0010001"])
results in:

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]


[[[u'P0010001', u'P0010001', u'state'], [u'6346105', u'6346105', u'47']], 
[[u'P0010001', u'P0010001', u'state'], [u'25145561', u'25145561', u'48']], 
[[u'P0010001', u'P0010001', u'state'], [u'2763885', u'2763885', u'49']], 
[[u'P0010001', u'P0010001', u'state'], [u'625741', u'625741', u'50']]]

<type 'list'>
                         0                         1
0  [P0010001, P0010001, state]    [6346105, 6346105, 47]
1  [P0010001, P0010001, state]  [25145561, 25145561, 48]
2  [P0010001, P0010001, state]    [2763885, 2763885, 49]
3  [P0010001, P0010001, state]      [625741, 625741, 50]
FWIW, similar code in R is below. I'm translating a library I wrote in R into Python:

state.data = function(token, state = "*", variables, year = 2010, survey = "sf1"){
  state = as.character(state)
  variables = paste(variables, collapse = ",")
  year = as.character(year)
  my.url = matrix(paste("http://api.census.gov/data/", year, "/", survey, "?key=", token,
                    "&get=",variables, "&for=state:", state, sep = ""), ncol = 1)

  process.url = apply(my.url, 1, function(x)   process.api.data(fromJSON(file=url(x))))
  rbind.dat = data.frame(rbindlist(process.url))
  rbind.dat = rbind.dat[, c(tail(seq_len(ncol(rbind.dat)), 1), seq_len(ncol(rbind.dat) - 1))] 
  rbind.dat
}

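So you have a duplicated field, which serves no purpose, and your result will show only one copy of it.

However, you can simply pass a list/iterable of dict objects to the pd.DataFrame constructor to get the result: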

vals = [[[...]]]  # the data you provided in your example
df = pd.DataFrame(dict(zip(*v)) for v in vals)
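
As a sketch of the line above, here it is applied to the four responses printed in the question (copied in as plain strings). Each response is a [header, row] pair, so zip(*v) pairs every column name with its value. Because P0010001 was requested twice, the dict keeps a single copy, and the values stay strings unless they are converted afterwards.

```python
import pandas as pd

# The four parsed JSON responses shown in the question: each is [header, row].
vals = [
    [["P0010001", "P0010001", "state"], ["6346105", "6346105", "47"]],
    [["P0010001", "P0010001", "state"], ["25145561", "25145561", "48"]],
    [["P0010001", "P0010001", "state"], ["2763885", "2763885", "49"]],
    [["P0010001", "P0010001", "state"], ["625741", "625741", "50"]],
]

# zip(*v) yields (column name, value) pairs; dict() collapses the duplicate key.
records = [dict(zip(*v)) for v in vals]
df = pd.DataFrame(records)
print(df)
```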

Suppose this is your data:

data = [["P0010001","PCO0020019","state"], ["4779736","1204","01"], ["710231","53","02"], ["6392017","799","04"], ["2915918","924","05"], ["37253956","6244","06"], ["5029196","955","08"], ["3574097","1266","09"], ["897934","266","10"], ["601723","170","11"], ["18801310","4372","12"], ["9687653","1629","13"], ["1360301","251","15"], ["1567582","320","16"], ["12830632","3713","17"]]
Then this works:

df = pd.DataFrame(data[1:], columns=data[0])

So you need to figure out how to get your data into that form. All I did was pass a list of lists (data[1:], the rows) and a list (data[0], the column names).
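
When each state is fetched separately, as in the question, the parsed result is a list of such header-plus-rows payloads rather than a single one. A minimal sketch of combining them, assuming every payload has its column names in the first row. The helper name payloads_to_frame is made up for illustration, and the sample payloads are the question's values trimmed to one copy of the variable:

```python
import pandas as pd


def payloads_to_frame(payloads):
    """Hypothetical helper: merge [header, row, row, ...] payloads into one DataFrame."""
    # Build one small frame per payload, using its first row as column names.
    frames = [pd.DataFrame(p[1:], columns=p[0]) for p in payloads]
    # Stack the frames and renumber the index 0..n-1.
    return pd.concat(frames, ignore_index=True)


# Illustrative payloads: values from the question, duplicate variable removed.
payloads = [
    [["P0010001", "state"], ["6346105", "47"]],
    [["P0010001", "state"], ["25145561", "48"]],
]

df = payloads_to_frame(payloads)
# The API returns strings, so convert counts explicitly if numbers are needed.
df["P0010001"] = df["P0010001"].astype(int)
print(df)
```

Building one frame per payload keeps the header handling identical to the single-payload case; pd.concat then only has to stack frames that already share the same columns.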

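Can you give an example of what the raw data looks like (what you retrieve from complete_url)? If it's JSON, maybe you can use pd.read_json?

I added a print r, with its output. r is a list.

Yes, that's a different problem, and an easy one: df = pd.DataFrame(data[1:], columns=data[0])

OK, I'll give it a try. I added my R code to the original question to show why I'm doing it this way. That might help give some context.

Hmm. Every example I've tried gives this error: ValueError: Shape of passed values is (0, 0), indices imply (3, 0). Should I iterate over the length of the original data set?

I edited the answer. You'll have to inspect your data to see why it isn't in the right form.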