Python dictionary to DataFrame?

python python-3.x pandas dataframe

I'm following Wes McKinney's data analysis book, where he builds a similar code structure, but with lists rather than np arrays:

bills_dict = {'bills':np.random.randn(5,1)*5 + 50,
             'tips':(np.random.randn(5,1)*5 + 50)*(np.random.uniform(0.1,0.3,(5,1))),
             'dinner_time':np.reshape(np.random.choice(['Dinner','Lunch'],5),(5,1)),
             'smoker':np.reshape(np.random.choice(['Yes','No'],5),(5,1))}

This step works fine, but when I try to convert it to a DataFrame:

df_bills3 = pd.DataFrame(bills_dict)

all hell breaks loose:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-111-83469cc92eef> in <module>
----> 1 df_bills3 = pd.DataFrame(bills_dict)

~\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    433             )
    434         elif isinstance(data, dict):
--> 435             mgr = init_dict(data, index, columns, dtype=dtype)
    436         elif isinstance(data, ma.MaskedArray):
    437             import numpy.ma.mrecords as mrecords

~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
    252             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    253         ]
--> 254     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    255 
    256 

~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
     62     # figure out the index, if necessary
     63     if index is None:
---> 64         index = extract_index(arrays)
     65     else:
     66         index = ensure_index(index)

~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
    353 
    354         if not indexes and not raw_lengths:
--> 355             raise ValueError("If using all scalar values, you must pass an index")
    356 
    357         if have_series:

ValueError: If using all scalar values, you must pass an index

It keeps complaining. What should I do?

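The main problem is that you are passing a dictionary of matrices (2-D arrays of shape (5, 1)) rather than 1-D arrays. Make each column a 1-D array, not a matrix.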
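As a minimal sketch of that diagnosis, the shapes can be inspected and each value reduced to one dimension before building the frame. The names `columns_2d` and `columns_1d` and the two-column data are illustrative assumptions, not taken from the question:

```python
import numpy as np
import pandas as pd

# Illustrative data shaped like the question's: every value is a (5, 1) array.
columns_2d = {
    'bills': np.random.randn(5, 1) * 5 + 50,
    'smoker': np.random.choice(['Yes', 'No'], 5).reshape(5, 1),
}

# Record the shapes: each value has two dimensions, not one.
shapes_before = {name: arr.shape for name, arr in columns_2d.items()}

# ravel() returns a 1-D array, so each entry becomes a valid column.
columns_1d = {name: arr.ravel() for name, arr in columns_2d.items()}
df = pd.DataFrame(columns_1d)

print(shapes_before)  # every shape is (5, 1)
print(df.shape)       # (5, 2)
```

Building a new dictionary leaves the original 2-D arrays untouched, which may be convenient if they are still needed elsewhere.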

An alternative to the default DataFrame constructor:

bills_dict = {'bills': np.random.randn(5)*5 + 50,
              'tips': (np.random.randn(5)*5 + 50) * np.random.uniform(0.1, 0.3, 5),
              'dinner_time': np.random.choice(['Dinner', 'Lunch'], 5),
              'smoker': np.random.choice(['Yes', 'No'], 5)}
pd.DataFrame.from_dict(bills_dict, orient='columns')

Evidently the problem is that the array stored under each dictionary key is two-dimensional. You can flatten them like this:

bills_dict = {'bills':np.random.randn(5,1)*5 + 50,
             'tips':(np.random.randn(5,1)*5 + 50)*(np.random.uniform(0.1,0.3,(5,1))),
             'dinner_time':np.reshape(np.random.choice(['Dinner','Lunch'],5),(5,1)),
             'smoker':np.reshape(np.random.choice(['Yes','No'],5),(5,1))}

for k, v in bills_dict.items():
    bills_dict[k] = v.flatten()
    
pd.DataFrame(bills_dict)

Alternatively, you can change how you create the dictionary, so that its values never need flattening:

bills_dict_2 = {
    'bills': np.random.randn(5)*5 + 50,
    'tips': (np.random.randn(5)*5 + 50) * (np.random.uniform(0.1, 0.3, 5)),
    'dinner_time': np.random.choice(['Dinner','Lunch'], 5),
    'smoker': np.random.choice(['Yes','No'], 5)
}

pd.DataFrame(bills_dict_2)

All of these numpy arrays have shape (5, 1), and that is the problem. bills_dict['bills'].flatten().shape  # (5,)

for key in bills_dict.keys(): bills_dict[key] = bills_dict[key].flatten()
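For comparison, here is a short sketch of three standard NumPy ways to get from shape (5, 1) to shape (5,). The array `column` is a stand-in for any one value of `bills_dict`, not data from the question:

```python
import numpy as np

# A (5, 1) array, like each value in bills_dict.
column = np.arange(5.0).reshape(5, 1)

flat_copy = column.flatten()        # always returns a new 1-D copy
flat_view = column.ravel()          # returns a view when the memory layout allows it
flat_reshaped = column.reshape(-1)  # -1 lets NumPy infer the length

print(flat_copy.shape, flat_view.shape, flat_reshaped.shape)  # (5,) (5,) (5,)

# Writing through the view changes the original; the copy is unaffected.
flat_view[0] = 99.0
print(column[0, 0], flat_copy[0])  # 99.0 0.0
```

Any of the three yields a valid DataFrame column. `flatten()` is the safest choice when the original array must not be modified through the result.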

You want to use

Note: for DataFrame.from_dict(data, orient='columns', dtype=None), orient defaults to 'columns', so there is no need to specify it when the dictionary keys are the column names.
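To illustrate the note about `orient`, the following sketch builds the same small dictionary three ways. The values in `data` are made-up numbers chosen only for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-D columns, three rows each.
data = {'bills': np.array([48.0, 52.5, 50.1]),
        'tips': np.array([7.2, 9.9, 6.4])}

by_columns = pd.DataFrame.from_dict(data)                  # orient='columns' is the default
explicit = pd.DataFrame.from_dict(data, orient='columns')  # same result, spelled out
by_rows = pd.DataFrame.from_dict(data, orient='index')     # keys become row labels instead

print(by_columns.shape)  # (3, 2)
print(by_rows.shape)     # (2, 3)
```

With `orient='index'` each key labels a row, so it only makes sense when the dictionary is organized by record rather than by column.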

df.shape

Does just adding

index=np.arange(5)

work? Print bills_dict to make sure a DataFrame can be created from it.

import numpy as np
import pandas as pd
bills_dict = {'bills':np.random.randn(5,1)*5 + 50,
             'tips':(np.random.randn(5,1)*5 + 50)*(np.random.uniform(0.1,0.3,(5,1))),
             'dinner_time':np.reshape(np.random.choice(['Dinner','Lunch'],5),(5,1)),
             'smoker':np.reshape(np.random.choice(['Yes','No'],5),(5,1))}
bills_dict['bills']
type(bills_dict['bills'])
# numpy.ndarray
bills_dict['bills'].shape        # (5, 1)
bills_dict['tips'].shape         # (5, 1)
bills_dict['dinner_time'].shape  # (5, 1)
bills_dict['smoker'].shape       # (5, 1)
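The inspection above could be folded into a small guard that runs before the DataFrame is built. This is only a sketch: `frame_from_column_arrays` is a hypothetical helper written for this example, not a pandas function.

```python
import numpy as np
import pandas as pd

def frame_from_column_arrays(columns):
    """Hypothetical helper: build a DataFrame from (n,) or (n, 1) arrays."""
    cleaned = {}
    for name, values in columns.items():
        arr = np.asarray(values)
        if arr.ndim == 2 and arr.shape[1] == 1:
            arr = arr[:, 0]  # drop the trailing length-1 axis
        if arr.ndim != 1:
            raise ValueError(f"column {name!r} has shape {arr.shape}, expected 1-D")
        cleaned[name] = arr
    return pd.DataFrame(cleaned)

# The dictionary from the question, with (5, 1) values.
bills_dict = {'bills': np.random.randn(5, 1)*5 + 50,
              'tips': (np.random.randn(5, 1)*5 + 50) * np.random.uniform(0.1, 0.3, (5, 1)),
              'dinner_time': np.reshape(np.random.choice(['Dinner', 'Lunch'], 5), (5, 1)),
              'smoker': np.reshape(np.random.choice(['Yes', 'No'], 5), (5, 1))}

df_bills = frame_from_column_arrays(bills_dict)
print(df_bills.shape)  # (5, 4)
```

Raising on anything that is not a single column keeps a wide 2-D array from being silently reshaped into one long column.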