Dictionary from a Python DataFrame: keys from one column, values from multiple rows and columns, excluding NaN

python python-2.7 pandas dictionary


I want to create a dictionary from a pd.DataFrame in which id is the key and all value_x columns supply the values, excluding NaN.

The DataFrame newdf:

     id    name  value_1  value_2  value_3
0    ant   jay   10.2     3.5      4.7
1    ant   ann   5.7      10.2     NaN
2    bee   will  7.4      NaN      NaN
3    bee   dave  12.4     1.3      6.9
4    bee   ed    0.8      NaN      NaN
5    cat   kit   NaN      NaN      5.2
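To make the example reproducible, the frame above can be rebuilt as follows. This is only a sketch of the data shown in the table; the name df is added as an alias because the answer code uses df rather than newdf.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; np.nan marks the missing values.
newdf = pd.DataFrame(
    {
        'id': ['ant', 'ant', 'bee', 'bee', 'bee', 'cat'],
        'name': ['jay', 'ann', 'will', 'dave', 'ed', 'kit'],
        'value_1': [10.2, 5.7, 7.4, 12.4, 0.8, np.nan],
        'value_2': [3.5, 10.2, np.nan, 1.3, np.nan, np.nan],
        'value_3': [4.7, np.nan, np.nan, 6.9, np.nan, 5.2],
    },
    columns=['id', 'name', 'value_1', 'value_2', 'value_3'],
)

# Alias (an assumption for convenience): the answer refers to the frame as df.
df = newdf
print(newdf)
```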

The expected result is a dictionary keyed by id whose values are ordered row by row.

I am trying to use .to_dict(), but it does not work:

newdf.groupby('id').apply(newdf.iloc[:,-3:].to_dict())
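One likely reason the attempt fails is that apply expects a callable, while the line above passes the dict that to_dict() has already returned. As a hedged alternative that avoids apply altogether, the groups can be iterated directly. The helper name value_cols is introduced here for illustration and is not part of the original post.

```python
import math

import numpy as np
import pandas as pd

newdf = pd.DataFrame({
    'id': ['ant', 'ant', 'bee', 'bee', 'bee', 'cat'],
    'name': ['jay', 'ann', 'will', 'dave', 'ed', 'kit'],
    'value_1': [10.2, 5.7, 7.4, 12.4, 0.8, np.nan],
    'value_2': [3.5, 10.2, np.nan, 1.3, np.nan, np.nan],
    'value_3': [4.7, np.nan, np.nan, 6.9, np.nan, 5.2],
})

# Hypothetical helper name: the columns that supply the dictionary values.
value_cols = ['value_1', 'value_2', 'value_3']

# For each id, flatten the value columns row by row and skip NaN entries.
result = {
    key: [v for v in group[value_cols].values.ravel().tolist()
          if not math.isnan(v)]
    for key, group in newdf.groupby('id')
}
print(result)
```

Flattening with ravel() walks the values row by row, which matches the ordering asked for in the question.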

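Use set_index and stack to reshape the value columns, then group by id and convert the result with to_dict (df below is the question's newdf).

Details: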

print (df.set_index('id').iloc[:, -3:].stack())

id          
ant  value_1    10.2
     value_2     3.5
     value_3     4.7
     value_1     5.7
     value_2    10.2
bee  value_1     7.4
     value_1    12.4
     value_2     1.3
     value_3     6.9
     value_1     0.8
cat  value_3     5.2
dtype: float64

If ordering matters and pandas 0.21.0 or later is available, an OrderedDict can be produced:

from collections import OrderedDict

d = (df.set_index('id')
       .iloc[:, -3:]
       .stack()
       .groupby(level=0)
       .apply(tuple)
       .to_dict(into=OrderedDict))
print (d)

OrderedDict([('ant', (10.2, 3.5, 4.7, 5.7, 10.2)), 
             ('bee', (7.4, 12.4, 1.3, 6.9, 0.8)), 
             ('cat', (5.2,))])
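The approach above relies on stack dropping NaN. That behavior may depend on the pandas version, since newer releases introduce a different stack implementation, so a defensive variant adds an explicit dropna(). This is a sketch under that assumption, not the answer's original code; value_cols is a name introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['ant', 'ant', 'bee', 'bee', 'bee', 'cat'],
    'name': ['jay', 'ann', 'will', 'dave', 'ed', 'kit'],
    'value_1': [10.2, 5.7, 7.4, 12.4, 0.8, np.nan],
    'value_2': [3.5, 10.2, np.nan, 1.3, np.nan, np.nan],
    'value_3': [4.7, np.nan, np.nan, 6.9, np.nan, 5.2],
})

# Hypothetical helper name: select the value columns by label instead of position.
value_cols = ['value_1', 'value_2', 'value_3']

d = (df.set_index('id')[value_cols]
       .stack()            # long format: one row per (id, column) pair
       .dropna()           # explicit, in case stack() keeps NaN entries
       .groupby(level=0)   # level 0 of the index is id
       .apply(tuple)
       .to_dict())
print(d)
```

Selecting the columns by name rather than with iloc[:, -3:] also keeps the snippet working if further columns are appended to the frame later.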

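Is there a memory issue that prevents you from creating a DF without the values you don't want?

@roganjosh I'm not quite sure I understand your question, but none of the errors I get are memory-related.

I'm suggesting that you create a DF with the rows you don't want removed. If you also want to keep the original data, you may need two DataFrames in memory.

Be sure to test each answer on your data to see which approach works best. Just saying, stack is slow on large data.

Let me ask: does groupby(level=0) mean grouping by the index that was set last (id)?

I remove the NaNs with stack, so I need to create the index from id and then select the columns to reshape, e.g. with iloc.

I just tried OrderedDict and got TypeError: to_dict() got an unexpected keyword argument 'into'. Is this because of my pandas version?

Yes, it is a new feature in the latest pandas version, 0.21.0 - see the second point.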
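On the groupby(level=0) question from the comments: after set_index('id') and stack, the result has a two-level index whose first level (position 0) is id and whose second level holds the former column names. The sketch below, with names chosen for illustration, checks that grouping by position and grouping by name give the same groups.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['ant', 'ant', 'bee', 'bee', 'bee', 'cat'],
    'name': ['jay', 'ann', 'will', 'dave', 'ed', 'kit'],
    'value_1': [10.2, 5.7, 7.4, 12.4, 0.8, np.nan],
    'value_2': [3.5, 10.2, np.nan, 1.3, np.nan, np.nan],
    'value_3': [4.7, np.nan, np.nan, 6.9, np.nan, 5.2],
})

# dropna() is added so the counts below do not depend on how stack() treats NaN.
stacked = df.set_index('id').iloc[:, -3:].stack().dropna()

# Level 0 is named 'id'; level 1 (the former column labels) has no name.
print(stacked.index.names)

# Grouping by level position and by level name selects the same level.
by_position = stacked.groupby(level=0).size()
by_name = stacked.groupby(level='id').size()
print(by_position.equals(by_name))
```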

# Plain dict version: no `into` argument, so it also runs on pandas older than 0.21.0.
d = df.set_index('id').iloc[:, -3:].stack().groupby(level=0).apply(tuple).to_dict()
print (d)
{'bee': (7.4, 12.4, 1.3, 6.9, 0.8), 'cat': (5.2,), 'ant': (10.2, 3.5, 4.7, 5.7, 10.2)}
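For code that has to run on pandas versions both with and without the into argument, one possible sketch is to try to_dict(into=OrderedDict) first and fall back to building the OrderedDict by hand when the TypeError from the comments appears. The fallback is an assumption about how one might work around older versions, not something taken from the original answer.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['ant', 'ant', 'bee', 'bee', 'bee', 'cat'],
    'name': ['jay', 'ann', 'will', 'dave', 'ed', 'kit'],
    'value_1': [10.2, 5.7, 7.4, 12.4, 0.8, np.nan],
    'value_2': [3.5, 10.2, np.nan, 1.3, np.nan, np.nan],
    'value_3': [4.7, np.nan, np.nan, 6.9, np.nan, 5.2],
})

# One tuple of non-NaN values per id, in row order.
grouped = (df.set_index('id')
             .iloc[:, -3:]
             .stack()
             .dropna()
             .groupby(level=0)
             .apply(tuple))

try:
    # Available from pandas 0.21.0 onward.
    d = grouped.to_dict(into=OrderedDict)
except TypeError:
    # Older pandas: to_dict() has no `into`, so pair keys and values manually.
    d = OrderedDict(zip(grouped.index, grouped.values))

print(d)
```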
