Python: loading data into pandas



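I'm trying to pull the license information for pip packages from PyPI and then load it into a pandas DataFrame. I've previously done an example that loaded a list comprehension into a DataFrame, but I can't figure this one out.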

Here is what I have written so far:

from requests import get
import pandas as pd
import pip

url = 'https://pypi.python.org/pypi'

# packages_list = ['numpy','twisted']

# Note: pip.get_installed_distributions() only exists in pip < 10
installed_packages = pip.get_installed_distributions()
installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
     for i in installed_packages])

packages = []
licenses = []
summarys = []

for index, package in enumerate(installed_packages_list):
    package = package.split("==")[0]
    full_url = url + '/' + package + '/json'
    #print 'url is ' + full_url
    page = get(full_url).json()

    #print 'Package: ' + package + ', license is:' + page['info']['license'] + '. ' + page['info']['summary']
    packages.append(package)
    licenses.append(page['info']['license'])
    summarys.append(page['info']['summary'])

print(packages)

pd_packages = pd.DataFrame(
    {
    "packages":[packages],
    "licenses":[licenses],
    "summarys":[summarys]
    })

print(pd_packages)
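On current pip releases the first step of this script fails, because pip 10 and later no longer expose get_installed_distributions() at the top level. A minimal sketch of building the same sorted "name==version" list with the standard library's importlib.metadata (Python 3.8+) instead, assuming that substitution is acceptable; the function name list_installed_packages is hypothetical:

```python
from importlib.metadata import distributions


def list_installed_packages():
    """Return sorted 'name==version' strings for installed distributions.

    Hypothetical stand-in for pip.get_installed_distributions(); names are
    lowercased to roughly match pip's old `key` attribute.
    """
    entries = set()  # a set drops duplicates if a package appears twice on sys.path
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing/broken metadata
            entries.add("%s==%s" % (name.lower(), dist.version))
    return sorted(entries)


installed_packages_list = list_installed_packages()
print(installed_packages_list[:5])
```

The rest of the loop can then split each entry on "==" exactly as before.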

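I think the problem comes from how the DataFrame (pd_packages) is created. packages, licenses and summarys are already lists, so wrapping each one in [] turns it into a list of lists, which explains the output described in the comments below.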
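A small illustration of the difference, using made-up placeholder values rather than real PyPI data:

```python
import pandas as pd

# Placeholder data, not fetched from PyPI.
packages = ["pkg-a", "pkg-b", "pkg-c"]
licenses = ["MIT", "BSD", "Apache"]

# Wrapping each list in [] gives every column exactly one element: the whole list.
nested = pd.DataFrame({"packages": [packages], "licenses": [licenses]})
print(nested.shape)  # (1, 2): a single row whose cells are Python lists

# Passing the lists directly gives one row per package.
flat = pd.DataFrame({"packages": packages, "licenses": licenses})
print(flat.shape)  # (3, 2)
```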

So instead of this:

pd_packages = pd.DataFrame(
    {
    "packages":[packages],
    "licenses":[licenses],
    "summarys":[summarys]
    })
try this:

pd.DataFrame(
    {
    "packages":packages,
    "licenses":licenses,
    "summarys":summarys
    })
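Dropping the brackets only works when the three variables really are lists. If every value in the dict is a scalar (for example, a single string), pandas cannot infer how many rows to create and raises the "If using all scalar values, you must pass an index" error; empty lists, by contrast, are accepted. A sketch of these cases with placeholder values:

```python
import pandas as pd

scalars = {"packages": "pkg-a", "licenses": "MIT"}

# All-scalar dict without an index: pandas cannot infer a row count.
msg = None
try:
    pd.DataFrame(scalars)
except ValueError as exc:
    msg = str(exc)
print(msg)  # If using all scalar values, you must pass an index

# The same scalars with an explicit index produce a single row.
one_row = pd.DataFrame(scalars, index=[0])

# Empty lists are accepted and give an empty frame with both columns.
empty = pd.DataFrame({"packages": [], "licenses": []})
print(one_row.shape, empty.shape)  # (1, 2) (0, 2)
```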
Alternatively, build one row per package and pass the list of rows to the DataFrame constructor:

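import requests
import pandas as pd

def get_pkg_info(pkg, url_pat='https://pypi.python.org/pypi/{}/json'):
    # Fetch one package's metadata from PyPI's JSON API
    r = requests.get(url_pat.format(pkg))
    if r.status_code != requests.codes.ok:
        # Unknown package or failed request: keep the row, with no data
        return [pkg, None, None]
    d = r.json()
    if d and 'info' in d:
        return [pkg, d['info'].get('license'), d['info'].get('summary')]
    else:
        return [pkg, None, None]

# installed_packages_list is built as in the question
data = [get_pkg_info(x.split('==')[0]) for x in installed_packages_list]

# Each inner list becomes one row
df = pd.DataFrame(data, columns=['package','license','summary'])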
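Because get_pkg_info mixes the HTTP call with the JSON handling, it is hard to check offline. A minimal sketch that splits out the parsing step, so the row-building logic can be tried on hand-written payloads; the function name parse_pkg_info and the sample payloads are hypothetical:

```python
import pandas as pd


def parse_pkg_info(pkg, d):
    """Turn one already-decoded PyPI JSON payload into a table row.

    Same logic as get_pkg_info: a missing 'info' block, license or summary
    becomes None instead of raising KeyError.
    """
    if d and "info" in d:
        info = d["info"]
        return [pkg, info.get("license"), info.get("summary")]
    return [pkg, None, None]


# Hand-written payloads standing in for real PyPI responses.
samples = {
    "pkg-a": {"info": {"license": "MIT", "summary": "Example package"}},
    "pkg-b": {"info": {"summary": "Payload without a license field"}},
    "pkg-c": {},
}

rows = [parse_pkg_info(name, payload) for name, payload in samples.items()]
df = pd.DataFrame(rows, columns=["package", "license", "summary"])
print(df)
```

get_pkg_info could then call parse_pkg_info(pkg, r.json()) after the status check.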
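Comments:

What's the problem? It shows something like 0 [MIT, MPL-2.0, LGPL, UNKNOWN, BSD-like, BSD, … packages \ 0 [beautifulsoup4, bs4, certifi, chardet, get, i… Summary 0 [Screen-scraping library, Dummy package for Be… I'm trying to get this data in table form and dump it to a CSV with pandas.

How can I view it as a nice table? Or is there a better way to rewrite the above?

Thanks Bob. That's what I had before I added [] around the names… I got the error "If using all scalar values, you must pass an index". That's why I added the [].

That's strange. I wouldn't expect that error even if the lists were empty.

Demo: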
In [166]: pd.options.display.max_rows = 15

In [167]: df = pd.DataFrame(data, columns=['package','license','summary'])

In [168]: df
Out[168]:
                package       license                                            summary
0             alabaster          None        A configurable sidebar-enabled Sphinx theme
1       anaconda-client       UNKNOWN         Anaconda Cloud command line client library
2    anaconda-navigator   Proprietary
3      anaconda-project          None                                               None
4            asn1crypto           MIT  Fast ASN.1 parser and serializer with definiti...
5               astroid          LGPL  A abstract syntax tree for Python with inferen...
6               astropy           BSD         Community-developed python astronomy tools
..                  ...           ...                                                ...
216              xarray        Apache          N-D labeled arrays and datasets in Python
217                xlrd           BSD  Library for developers to extract data from Mi...
218          xlsxwriter           BSD     A Python module for creating Excel XLSX files.
219             xlwings  BSD 3-clause  Make Excel fly: Interact with Excel from Pytho...
220                xlwt           BSD  Library to create spreadsheet files compatible...
221           xmltodict           MIT  Makes working with XML feel like you are worki...
222               yapsy           BSD                          Yet another plugin system

[223 rows x 3 columns]
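The comments also asked how to view the whole result as a table and dump it to CSV. A sketch using standard pandas calls; the three rows are copied from the demo output above as a stand-in for the real df, and the file name pypi_licenses.csv is just an example:

```python
import pandas as pd

# Stand-in for the real df built from PyPI data (rows taken from the demo output).
df = pd.DataFrame(
    [
        ["alabaster", None, "A configurable sidebar-enabled Sphinx theme"],
        ["astropy", "BSD", "Community-developed python astronomy tools"],
        ["yapsy", "BSD", "Yet another plugin system"],
    ],
    columns=["package", "license", "summary"],
)

# to_string() renders every row instead of the truncated "..." console view.
print(df.to_string(index=False))

# Write the table to CSV without the numeric index column.
df.to_csv("pypi_licenses.csv", index=False)

# Read it back to confirm the round trip (missing licenses come back as NaN).
check = pd.read_csv("pypi_licenses.csv")
print(check.shape)
```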