
Python: Fill a dict with NA values to allow conversion to a DataFrame


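I have a dict that holds computed values for different time intervals, which means the value lists start on different dates. For example, the data I have might look like this: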

Date      col1    col2    col3    col4    col5
01-01-15  5       12      1      -15      10
01-02-15  7       0       9       11      7
01-03-15          6       1       2       18
01-04-15          9       8       10
01-05-15         -4               7
01-06-15         -11             -1
01-07-15          6               
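where each header is a key and each column of values is the value for that key (I am using defaultdict(list)). When I try to run pd.DataFrame.from_dict(d), I understandably get an error stating that all arrays must be the same length. Is there a simple way to pad the lists with NaN so that the output ends up as the following DataFrame?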

Date      col1    col2    col3    col4    col5
01-01-15  5       12      1      -15      10
01-02-15  7       0       9       11      7
01-03-15  NaN     6       1       2       18
01-04-15  NaN     9       8       10      NaN
01-05-15  NaN    -4       NaN     7       NaN
01-06-15  NaN    -11      NaN    -1       NaN
01-07-15  NaN     6       NaN     NaN     NaN
Or do I have to do this manually for each list?

Here is the code to recreate the dictionary:

import pandas as pd
from collections import defaultdict

d = defaultdict(list)
d["Date"].extend([
    "01-01-15", 
    "01-02-15", 
    "01-03-15", 
    "01-04-15", 
    "01-05-15",
    "01-06-15",
    "01-07-15"
])
d["col1"].extend([5, 7])
d["col2"].extend([12, 0, 6, 9, -4, -11, 6])
d["col3"].extend([1, 9, 1, 8])
d["col4"].extend([-15, 11, 2, 10, 7, -1])
d["col5"].extend([10, 7, 18])
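One way to get this kind of padding, sketched here under the assumption that wrapping each list in a pd.Series is acceptable: pandas aligns the Series on their integer index and fills the missing positions with NaN. The name my_dict and its values are a small illustrative stand-in for d.

```python
import pandas as pd

# Small stand-in for the real dict: lists of unequal length
my_dict = {'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3]}

# Each list becomes a Series; the DataFrame constructor aligns them on the
# union of their indexes and leaves NaN where a shorter Series has no value
df = pd.DataFrame({k: pd.Series(v) for k, v in my_dict.items()})
print(df)
```

Column b becomes float because NaN cannot be stored in an integer column, while a keeps its integer dtype.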
Output -

   a    b
0  1  1.0
1  2  2.0
2  3  3.0
3  4  NaN
4  5  NaN

Another option is to convert with from_dict using orient='index' and then transpose:

my_dict = {'a' : [1, 2, 3, 4, 5], 'b': [1, 2, 3]}
df = pd.DataFrame.from_dict(my_dict, orient='index').T
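Note that if your columns have different types, for example floats in one column and strings in another, you may run into problems with dtype, because the transposed columns can come back as object.

Resulting output: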

     a    b
0  1.0  1.0
1  2.0  2.0
2  3.0  3.0
3  4.0  NaN
4  5.0  NaN
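The dtype caveat can be illustrated with a short sketch. The dict mixed is a made-up example, and using infer_objects() to recover the numeric dtype is one possible remedy, not something the answer itself prescribes.

```python
import pandas as pd

# Hypothetical input: one numeric column and one string column
mixed = {"a": [1.5, 2.5, 3.5], "b": ["x", "y"]}

df = pd.DataFrame.from_dict(mixed, orient="index").T
# After the transpose the numeric column is typically no longer float
print(df.dtypes)

# infer_objects() re-inspects object columns and restores a better dtype
# where all values allow it
fixed = df.infer_objects()
print(fixed.dtypes)
```

If only some columns need converting, pd.to_numeric applied to those columns is an alternative.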

Here is an approach using masking -

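import numpy as np

# list() makes keys and values indexable on Python 3 as well as Python 2
K = list(d.keys())
V = list(d.values())

# Separate the value columns from 'Date', which becomes the index
mask = ~np.isin(K, 'Date')
K1 = [K[i] for i, item in enumerate(V) if mask[i]]
V1 = [V[i] for i, item in enumerate(V) if mask[i]]

# mask[i, j] is True where column i has a value at row j
lens = np.array([len(item) for item in V1])
mask = lens[:, None] > np.arange(lens.max())

# Start from an all-NaN array and fill only the valid positions
out_arr = np.full(mask.shape, np.nan)
out_arr[mask] = np.concatenate(V1)
df = pd.DataFrame(out_arr.T, columns=K1, index=d['Date'])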
Sample run (from a Python 2 session, where d.keys() and d.values() return lists) -

In [612]: d.keys()
Out[612]: ['col4', 'col5', 'col2', 'col3', 'col1', 'Date']

In [613]: d.values()
Out[613]: 
[[-15, 11, 2, 10, 7, -1],
 [10, 7, 18],
 [12, 0, 6, 9, -4, -11, 6],
 [1, 9, 1, 8],
 [5, 7],
 ['01-01-15',
  '01-02-15',
  '01-03-15',
  '01-04-15',
  '01-05-15',
  '01-06-15',
  '01-07-15']]

In [614]: df
Out[614]: 
          col4  col5  col2  col3  col1
01-01-15   -15    10    12     1     5
01-02-15    11     7     0     9     7
01-03-15     2    18     6     1   NaN
01-04-15    10   NaN     9     8   NaN
01-05-15     7   NaN    -4   NaN   NaN
01-06-15    -1   NaN   -11   NaN   NaN
01-07-15   NaN   NaN     6   NaN   NaN
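To make the broadcasting step of the masking approach concrete, here is a small self-contained sketch. The helper name pad_lists and the toy input are illustrative assumptions, not part of the answer above.

```python
import numpy as np

def pad_lists(lists, fill=np.nan):
    """Return (padded, mask): one row per list, right-padded with `fill`."""
    lens = np.array([len(item) for item in lists])
    # A column of lengths compared with a row of positions broadcasts to a
    # (n_lists, max_len) boolean grid: True where a list has a value
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, fill, dtype=float)
    # Boolean assignment fills the True cells in row-major order, which
    # matches the order of the concatenated values
    out[mask] = np.concatenate(lists)
    return out, mask

padded, mask = pad_lists([[5, 7], [12, 0, 6, 9], [1, 9, 1]])
print(mask)
print(padded)
```

Each row of padded corresponds to one input list, so transposing it gives the column layout a DataFrame expects.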
Comments:

Can you add code that recreates the sample dict? Also, by NA, do you mean NaNs?

If you do some of the legwork and share the code that @Divakar is referring to, you will get an answer from us much more easily.

Just added. Yes, I meant NaNs. Sorry, I have been spending too much time in Excel.

There are some good answers here, but I think this one is the best. As a follow-up, is there a simple way to prepend the NaNs rather than appending them to the end?

@hashcode55 Yes, for the originally posted example the values in the lists were one level deeper. It had to be updated for the newly posted sample. Thanks!

Using itertools (Python 3):

import itertools
pd.DataFrame(list(itertools.zip_longest(*d.values())), columns=d.keys()).sort_index(axis=1)
Out[728]: 
   col1  col2  col3  col4  col5
0   5.0    12   1.0 -15.0  10.0
1   7.0     0   9.0  11.0   7.0
2   NaN     6   1.0   2.0  18.0
3   NaN     9   8.0  10.0   NaN
4   NaN    -4   NaN   7.0   NaN
5   NaN   -11   NaN  -1.0   NaN
6   NaN     6   NaN   NaN   NaN
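A follow-up comment asks whether the NaNs can be prepended instead of appended. The thread does not answer it, so the following is only a minimal sketch under the assumption that plain list padding is acceptable. The helper left_pad and the dict data are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

def left_pad(values, length, fill=np.nan):
    """Return a new list of `length` items with `fill` placed before `values`."""
    return [fill] * (length - len(values)) + list(values)

# Hypothetical input with lists of unequal length
data = {"col1": [5, 7], "col2": [12, 0, 6], "col3": [1]}

# Pad every list on the left up to the length of the longest one
longest = max(len(v) for v in data.values())
df = pd.DataFrame({k: left_pad(v, longest) for k, v in data.items()})
print(df)
```

With this layout the last row of every column holds that column's final value, which suits series that end on the same date but start on different ones.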