
Python: populate a DataFrame by row number


Question

I have a dictionary like this:

d = {
'a': [['a', 0], ['b', 1], ['a', 2]],
'b': [['d', 0], ['d', 1], ['d', 3]],
'c': [['f', 2], ['g', 3], ['h', 4]]
}
The dictionary has the form:
column: [[value, row number], ...]

I want to convert it into a DataFrame that looks like this:

      a     b     c
0     a     d  None
1     b     d  None
2     a  None     f
3  None     d     g
4  None  None     h
Attempt

The only way I can think of is to build a new dictionary:

new = {
'a': ['a', 'b', 'a', None, None],
'b': ['d', 'd', None, 'd', None],
'c': [None, None, 'f', 'g', 'h']
}
That dictionary can then be used to create the DataFrame like this:

df = pd.DataFrame(new)

I could write a loop to build it, but that is rather tedious, and I wonder whether there is a better way. Please advise.
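
For reference, the loop-based approach described above might look roughly like the following. This is only a sketch in plain Python: the name n_rows is introduced here for illustration, and it assumes row numbers are non-negative integers starting at 0.

```python
d = {
    'a': [['a', 0], ['b', 1], ['a', 2]],
    'b': [['d', 0], ['d', 1], ['d', 3]],
    'c': [['f', 2], ['g', 3], ['h', 4]],
}

# Number of rows needed: the largest row number seen anywhere, plus one.
n_rows = max(row for pairs in d.values() for _, row in pairs) + 1

new = {}
for col, pairs in d.items():
    column = [None] * n_rows   # start with every cell empty
    for value, row in pairs:
        column[row] = value    # place each value at its row number
    new[col] = column
```

The resulting new dictionary has one equal-length list per column, so it can be passed straight to pd.DataFrame.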

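I don't think putting this straight into pandas will give much of a speedup, because you will be working with object columns. So it is best to reshape the dictionary of lists outside of pandas. The best way is to convert each list into a dictionary, since pandas converts a dict of dicts into a DataFrame very nicely: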

In [ ]: new_d = {col_name:{row_num: value for value, row_num in col_data} for col_name, col_data in d.items()}
   ...: pd.DataFrame(new_d)
Out[ ]: 
     a    b    c
0    a    d  NaN
1    b    d  NaN
2    a  NaN    f
3  NaN    d    g
4  NaN  NaN    h
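
One thing the dict-of-dicts conversion does not do by itself is create rows for row numbers that no column mentions. If that matters, a possible follow-up is to reindex over the full range. The sketch below assumes rows should run from 0 to the largest row number without gaps; n_rows is a name introduced here for illustration.

```python
import pandas as pd

d = {
    'a': [['a', 0], ['b', 1], ['a', 2]],
    'b': [['d', 0], ['d', 1], ['d', 3]],
    'c': [['f', 2], ['g', 3], ['h', 4]],
}

# Same conversion as above: {column: {row_number: value}}.
new_d = {col: {row: value for value, row in pairs} for col, pairs in d.items()}

# Assumption: the frame should have every row from 0 up to the largest row number.
n_rows = max(row for pairs in d.values() for _, row in pairs) + 1

# reindex puts the rows in 0..n_rows-1 order and adds all-missing rows where needed.
df = pd.DataFrame(new_d).reindex(range(n_rows))
```

For the example data every row number from 0 to 4 already occurs, so reindex only fixes the row order here.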

Restructure the dictionary as follows:

d2 = {colname: {sublist[1]: sublist[0] for sublist in listoflists} \
      for colname, listoflists in d.items()}
df = pd.DataFrame(d2)
df.where(pd.notnull(df), None)  # replace np.nan with None
     a    b    c
0    a    d  None
1    b    d  None
2    a  None    f
3  None    d    g
4  None  None    h
You can use pivot(), after a slight rearrangement:

data = [(key,ix,val) for key, pair in d.items() for val, ix in pair]
df = pd.DataFrame(data).pivot(index=1, columns=0, values=2)
Output:

0     a     b     c
1                  
0     a     d  None
1     b     d  None
2     a  None     f
3  None     d     g
4  None  None     h
Note: to remove the column and index names, use:

df.index.name = ""
df.columns.name = ""
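
The same pivot idea can also be written with named columns, which some may find easier to read than the positional labels 0, 1 and 2. This is a sketch, not the answerer's code: the labels 'col', 'row' and 'value' are made up for illustration, and rename_axis is used here as an alternative way to drop the axis names.

```python
import pandas as pd

d = {
    'a': [['a', 0], ['b', 1], ['a', 2]],
    'b': [['d', 0], ['d', 1], ['d', 3]],
    'c': [['f', 2], ['g', 3], ['h', 4]],
}

# Flatten to one (column, row number, value) record per cell.
records = [(col, row, value) for col, pairs in d.items() for value, row in pairs]

# 'col', 'row' and 'value' are illustrative labels, not names from the question.
long_df = pd.DataFrame(records, columns=['col', 'row', 'value'])

# Pivot the long table into the wide layout, then clear the axis names.
df = (long_df
      .pivot(index='row', columns='col', values='value')
      .rename_axis(index=None, columns=None))
```

Cells with no record are left missing; whether they display as NaN or None can depend on the pandas version.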

Changing your dict is one way, but here is another solution:

df=pd.DataFrame(d)
df1=pd.concat([df[x].apply(pd.Series).set_index(1) for x in df.columns],axis=1)
df1.columns=df.columns
df1
Out[477]: 
     a    b    c
1               
0    a    d  NaN
1    b    d  NaN
2    a  NaN    f
3  NaN    d    g
4  NaN  NaN    h
If you do want to change your dict:

d1={k:{t[1]:t[0] for t in v} for k,v in d.items()}
d1
Out[479]: 
{'a': {0: 'a', 1: 'b', 2: 'a'},
 'b': {0: 'd', 1: 'd', 3: 'd'},
 'c': {2: 'f', 3: 'g', 4: 'h'}}