Python: group by multiple columns and multiple indexes, then convert to a dictionary

python pandas

I have a CSV file (I removed the commas and all other characters):

I want to group by Team, Col3, and Col4 and convert the result to a dictionary whose values are the ID and Value pairs. It should look like this:

{
  (t1, c2, x1): [[3, 0.124],[1, 0.23]],
  (t2, c11, x10): [[6, 1.342], [5, 75.2], [9, 34.97]],
  (t3, c5, x2): [[7, 0.654], [4, 123.3]]
}

My current code is:

df = pd.read_csv('file.csv')
gk = dict(df.groupby(['team', 'Col3', 'Col4']).apply(list))

This code returns the keys I want (as tuples), but the value for each key is just the list of column names, like this:

(t1, c2, x1): ['ID', 'team', 'Col3', 'Col4', 'Value']

How can I make the dictionary values contain only the ID and Value columns?
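A likely reason for the output above is that calling list on a DataFrame iterates over its column labels rather than its rows. The following is a minimal sketch of that behavior; the two-row frame is hypothetical and exists only for illustration.

```python
import pandas as pd

# Hypothetical two-row frame, used only to show what list() does to a DataFrame.
df = pd.DataFrame({"ID": [3, 1], "Value": [0.124, 0.23]})

# Iterating over a DataFrame yields its column labels, not its rows.
column_labels = list(df)
print(column_labels)  # ['ID', 'Value']

# Row values have to be requested explicitly, for example via the underlying array.
rows = df.values.tolist()
print(rows)  # [[3.0, 0.124], [1.0, 0.23]]
```

Since apply(list) receives each group as a DataFrame, every group ends up with the same list of column names.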

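If it is acceptable for the integers in ID to be converted to floats, select the two columns after the groupby and use a custom lambda function that converts each group to a NumPy array and then to a list; finally, convert the result to a dictionary: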

gk = (df.groupby(['Team', 'Col3', 'Col4'])[['ID','Value']]
        .apply(lambda x: x.to_numpy().tolist())
        .to_dict())
print (gk)
{('t1', 'c2', 'x1'): [[3.0, 0.124], [1.0, 0.23]], ('t2', 'c11', 'x10'): [[6.0, 1.3419999999999999], [5.0, 75.2], [9.0, 34.97]], ('t3', 'c5', 'x2'): [[7.0, 0.654], [4.0, 123.3]]}
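The original CSV is not shown, so the following self-contained sketch rebuilds a sample frame from the expected output in the question (these rows are an assumption) and shows why the IDs turn into floats: combining an integer column and a float column into one NumPy array forces a common dtype.

```python
import pandas as pd

# Hypothetical sample rows, reconstructed from the expected output in the question.
df = pd.DataFrame({
    "ID": [3, 1, 6, 5, 9, 7, 4],
    "Team": ["t1", "t1", "t2", "t2", "t2", "t3", "t3"],
    "Col3": ["c2", "c2", "c11", "c11", "c11", "c5", "c5"],
    "Col4": ["x1", "x1", "x10", "x10", "x10", "x2", "x2"],
    "Value": [0.124, 0.23, 1.342, 75.2, 34.97, 0.654, 123.3],
})

# An int64 column and a float64 column share one array, so the dtype is float64.
combined_dtype = df[["ID", "Value"]].to_numpy().dtype
print(combined_dtype)  # float64

# Same approach as in the answer: select the two columns, then convert each group.
gk = (df.groupby(["Team", "Col3", "Col4"])[["ID", "Value"]]
        .apply(lambda x: x.to_numpy().tolist())
        .to_dict())
print(gk[("t1", "c2", "x1")])  # [[3.0, 0.124], [1.0, 0.23]]
```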

Alternatively, use zip on the two columns inside the custom function; in that case the types are not changed:

gk = (df.groupby(['Team', 'Col3', 'Col4'])
        .apply(lambda x: list(zip(x['ID'], x['Value'])))
        .to_dict())
print (gk)
{('t1', 'c2', 'x1'): [(3, 0.124), (1, 0.23)], ('t2', 'c11', 'x10'): [(6, 1.3419999999999999), (5, 75.2), (9, 34.97)], ('t3', 'c5', 'x2'): [(7, 0.654), (4, 123.3)]}
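The zip version produces tuples, while the question asks for inner lists. One possible variation, sketched here as an assumption rather than as part of the answer, is to iterate over the groups in a dictionary comprehension and turn each pair into a list. The sample rows are hypothetical and reconstructed from the expected output in the question.

```python
import pandas as pd

# Hypothetical sample rows, reconstructed from the expected output in the question.
df = pd.DataFrame({
    "ID": [3, 1, 6, 5, 9, 7, 4],
    "Team": ["t1", "t1", "t2", "t2", "t2", "t3", "t3"],
    "Col3": ["c2", "c2", "c11", "c11", "c11", "c5", "c5"],
    "Col4": ["x1", "x1", "x10", "x10", "x10", "x2", "x2"],
    "Value": [0.124, 0.23, 1.342, 75.2, 34.97, 0.654, 123.3],
})

# Iterating over a groupby yields (key, sub-frame) pairs; with three grouping
# columns each key is a 3-tuple. tolist() returns plain Python ints and floats,
# so the IDs stay integers.
gk = {
    key: [list(pair) for pair in zip(group["ID"].tolist(), group["Value"].tolist())]
    for key, group in df.groupby(["Team", "Col3", "Col4"])
}
print(gk[("t1", "c2", "x1")])  # [[3, 0.124], [1, 0.23]]
```

This avoids apply altogether, which may be easier to read when the per-group logic is simple.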

Try dict(df.groupby(['team', 'Col3', 'Col4'])['ID', 'Value'].apply(list))

I just tried it, and it no longer shows all of the column names. Now it only shows 'ID' and 'Value', like this:

(t1, c2, x1): ['ID', 'Value']

This works! Thank you very much. I can't accept the answer for another 7 minutes, but I will.
