Python: How to convert a DataFrame to nested NamedTuples with an aggregation level

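I am looking for a way to create nested named tuples from a DataFrame. The object `d` is the expected output. I am not sure whether I have to do the aggregation directly in Pandas first and then convert the result to `NamedTuple`.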

from typing import NamedTuple
from typing import List
import pandas as pd

if __name__ == "__main__":
    data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
    People = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

    names = list(People[["Name"]].itertuples(name="Names", index=False))
    postal_codes = list(
        People[["PostalCode"]].itertuples(name="PostalCode", index=False)
    )

    # ...
    # ... The code below produces the expected output (the names of the NamedTuple classes do not matter)

    PeopleName = NamedTuple("PeopleName", [("Name", str)])
    PeoplePC = NamedTuple("PeoplePC", [("PostalCode", str)])
    Demography = NamedTuple(
        "Demography", [("names", List[PeopleName]), ("postalcodes", PeoplePC)]
    )

    d = [
        Demography(
            [PeopleName(Name="tom"), PeopleName(Name="juli")],
            PeoplePC(PostalCode="ab 11"),
        ),
        Demography([PeopleName(Name="nick")], PeoplePC(PostalCode="ab 22"),),
    ]

You can group by `PostalCode` and apply a function (`to_nested_tuple` below) to each group to build the nested tuples:

from typing import NamedTuple, List

import pandas as pd

data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

PeopleName = NamedTuple("PeopleName", [("Name", str)])
PeoplePC = NamedTuple("PeoplePC", [("PostalCode", str)])
Demography = NamedTuple("Demography", [("names", List[PeopleName]), ("postalcodes", PeoplePC)])


def to_nested_tuple(k, g):
    peoples = list(g['Name'].to_frame().itertuples(name='Person', index=False))
    return Demography(peoples, PeoplePC(k))


d = [to_nested_tuple(*item) for item in people.groupby('PostalCode')]

print(d)
Output:

[Demography(names=[Person(Name='tom'), Person(Name='juli')], postalcodes=PeoplePC(PostalCode='ab 11')), Demography(names=[Person(Name='nick')], postalcodes=PeoplePC(PostalCode='ab 22'))]
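
On the question of whether to aggregate in Pandas first: that also seems workable. The following is only a sketch of that alternative, not part of the answer above. It collects the names per postal code with `groupby(...).agg(list)` and then builds the tuples from the resulting Series. The variable names `names_by_pc` and `d_alt` are made up for illustration.

```python
from typing import List, NamedTuple

import pandas as pd


class PeopleName(NamedTuple):
    Name: str


class PeoplePC(NamedTuple):
    PostalCode: str


class Demography(NamedTuple):
    names: List[PeopleName]
    postalcodes: PeoplePC


data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

# Aggregate in pandas first: one Python list of names per postal code.
names_by_pc = people.groupby("PostalCode")["Name"].agg(list)

# Then convert each (postal code, list of names) pair into the nested tuples.
d_alt = [
    Demography([PeopleName(n) for n in names], PeoplePC(pc))
    for pc, names in names_by_pc.items()
]

print(d_alt)
```

Here the inner tuples are real `PeopleName` instances rather than the `Person` tuples generated by `itertuples`, which may matter if the type names are checked elsewhere.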

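- This code assumes that only one attribute is retrieved from the DataFrame. What are the options for retrieving multiple fields, for example `…g[['firstname','lastname']]…`? Correct me if I am wrong, but that would not produce a `Series`. Thanks.

- If you need multiple fields, drop the `to_frame()` call. Does that make sense?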
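
To illustrate the last comment, here is a minimal sketch of the multi-field case. The `FirstName` and `LastName` columns and their values are hypothetical, since the original data only has `Name`. Selecting a list of columns already returns a DataFrame, so `itertuples` can be called on it directly without `to_frame()`.

```python
from typing import NamedTuple

import pandas as pd

# Hypothetical data: FirstName/LastName are illustrative columns, not from the question.
data = [
    ["tom", "smith", "ab 11"],
    ["nick", "jones", "ab 22"],
    ["juli", "brown", "ab 11"],
]
people = pd.DataFrame(data, columns=["FirstName", "LastName", "PostalCode"])


class PeoplePC(NamedTuple):
    PostalCode: str


class Demography(NamedTuple):
    names: list
    postalcodes: PeoplePC


def to_nested_tuple(key, group):
    # group[[...]] with a list of columns is a DataFrame, so no to_frame() is needed.
    persons = list(
        group[["FirstName", "LastName"]].itertuples(name="Person", index=False)
    )
    return Demography(persons, PeoplePC(key))


d = [to_nested_tuple(key, group) for key, group in people.groupby("PostalCode")]

print(d)
```

Each `Person` tuple then carries one field per selected column, for example `d[0].names[0].LastName`.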