Python: converting a DataFrame to a dictionary

python python-3.x pandas dictionary dataframe

I have a pandas DataFrame as follows:

import pandas as pd
df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})
df

It looks like:

    a       b   c   d
0   red     0   0   1
1   yellow  0   1   0
2   blue    1   0   0

I want to convert it to a dictionary so that I get:

red     d
yellow  c
blue    b

The dataset is quite large, so please avoid any iterative approach. I haven't figured out a solution yet. Any help is much appreciated.

Hope this works:

import pandas as pd
df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})

# idxmax(axis=1) returns, for each row, the label of the column holding the largest value
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

newdf = df[["a","e"]]

print (newdf.to_dict(orient='index'))

Output:

{0: {'a': 'red', 'e': 'd'}, 1: {'a': 'yellow', 'e': 'c'}, 2: {'a': 'blue', 'e': 'b'}}
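
The dictionary above is keyed by row position, while the question asks for a mapping from colour to column label. A minimal sketch of one way to get that shape from the same two columns, assuming the values in column a are unique (the name mapping is illustrative, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

# Label of the column holding the 1 in each row.
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

# Pair each colour with its label; zip walks both columns in row order.
mapping = dict(zip(df['a'], df['e']))
print(mapping)
```

If column a contained repeated values, later rows would overwrite earlier ones in the resulting dict.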

You can try this:

df = df.set_index('a')
df.where(df > 0).stack().reset_index().drop(0, axis=1)


    a   level_1
0   red     d
1   yellow  c
2   blue    b

First, if you really want to convert this to a dictionary, it is best to make the values you want as keys the index of the DataFrame:

df.set_index('a', inplace=True)

This looks like:

        b  c  d
a              
red     0  0  1
yellow  0  1  0
blue    1  0  0

Your data appears to be one-hot encoded. You first have to reverse that, which idxmax(axis=1) does by returning, for each row, the label of the column holding the 1. The result looks like:

a
red       d
yellow    c
blue      b
dtype: object
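
A minimal sketch of the one-hot reversal step, assuming it is the same idxmax(axis=1) call that appears in the one-liner at the end of this answer. The names indexed and series are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})
indexed = df.set_index('a')   # same effect as set_index('a', inplace=True)

# For each row, idxmax(axis=1) returns the label of the column holding the
# largest value, which for a one-hot row is the column containing the 1.
series = indexed.idxmax(axis=1)
print(series)
```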

Almost there! Now call to_dict() on the resulting values (this is where setting column a as the index helps). The result looks like:

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

I think this is what you are looking for. As a one-liner:

df.set_index('a').idxmax(axis=1).to_dict()
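
The one-liner relies on every row holding exactly one 1. idxmax does not check this: for a row with several 1s it returns the first matching column, and for an all-zero row it returns the first column. A minimal sketch of a guard under that assumption, where the helper name onehot_to_dict is hypothetical:

```python
import pandas as pd

def onehot_to_dict(df, key='a'):
    """Map each value of `key` to the label of the single column equal to 1."""
    indexed = df.set_index(key)
    ones_per_row = indexed.eq(1).sum(axis=1)   # number of 1s in each row
    if not ones_per_row.eq(1).all():
        raise ValueError("every row must contain exactly one 1")
    return indexed.idxmax(axis=1).to_dict()

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})
result = onehot_to_dict(df)
print(result)
```

The check is vectorized as well, so it does not reintroduce a Python-level loop over rows.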

You can use pandas to_dict with list as the argument to convert the dataframe to a dict. Then iterate over the resulting dict and get the column label whose value is 1.

>>> {k: df.columns[1:][v.index(1)] for k, v in df.set_index('a').T.to_dict('list').items()}
{'yellow': 'c', 'blue': 'b', 'red': 'd'}

You need dot and zip:

dict(zip(df.a,df.iloc[:,1:].dot(df.iloc[:,1:].columns)))
Out[508]: {'blue': 'b', 'red': 'd', 'yellow': 'c'}
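
Why the dot product yields a label: multiplying a string by 0 gives an empty string and multiplying it by 1 gives the string itself, so summing the products across a one-hot row leaves only the label of the column that holds the 1. A plain-Python sketch of that arithmetic for a single row (the helper row_dot is hypothetical and only mimics the calculation; it is not how pandas implements dot):

```python
def row_dot(values, labels):
    """Sum of value * label over one row, using string repetition and concatenation."""
    result = ''
    for value, label in zip(values, labels):
        result += value * label   # 0 * 'b' == '' and 1 * 'd' == 'd'
    return result

labels = ['b', 'c', 'd']
print(row_dot([0, 0, 1], labels))   # the 'red' row
print(row_dot([0, 1, 0], labels))   # the 'yellow' row
print(row_dot([1, 0, 0], labels))   # the 'blue' row
```

In this sketch a row with two 1s would produce the two labels concatenated, so the trick also depends on each row holding exactly one 1.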

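Set column a as the index, then look through the rows of df and find the column label whose value is 1, then use to_dict to convert the result to a dictionary. Here is the code: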

df.set_index('a').apply(lambda row:row[row==1].index[0],axis=1).to_dict()

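Alternatively, set the index to a, then use idxmax to find the label of the largest value in each row, then use to_dict to convert it to a dictionary. (Older pandas versions returned the label from Series.argmax as well; current versions return the integer position, so idxmax is used here.)

df.set_index('a').apply(lambda row: row.idxmax(), axis=1).to_dict()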

In both cases, the result is
{'blue': 'b', 'red': 'd', 'yellow': 'c'}

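P.S. I iterate over the rows of df by passing axis=1 to apply and then run the command on each row; this is available out of the box in pandas.

Possible duplicate of a question on subsetting data.
Are there two 1s in a row?
@tai: there is only one 1 in each row.
Yes, I am using Python 2.7, and the question is tagged 3.x.
The output doesn't look like what the OP wants.
It seems I forgot to use the axis column. I also checked on Python 3 and it works fine.
@bhushan, thanks for your answer, but the output is incorrect.
I'd like a different explanation.
I like the simple steps you took in your solution, but this is an iterative step, which is slow for my large dataset.
Could be just df.set_index('a').dot(df.columns[1:]).to_dict()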