Python pandas: grouping people into households and generating descriptors
My problem can be simplified to having two dataframes. Dataframe 1 contains people and the household they live in:
Person ID | Household ID
1 1
2 2
3 2
4 3
5 1
Dataframe 2 contains the individual characteristics of each person:
Person ID | Age | Workstatus | Education
1 20 Working High
2 29 Working Medium
3 31 Unemployed Low
4 45 Unemployed Medium
5 30 Working Medium
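The goal is to group together the people who share a Household ID, so that I can generate descriptors for each household, such as "average age of the people in the household", "average education level", and so on.
I tried: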
df1.groupby['Household ID']
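But I am not sure where to go from there, or how to do this the pandas way. The real dataset is very large, so processing it with plain lists takes too long.
The ideal output is: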
Household ID | Avg Age of persons | Education
1 25 High/med
2 25.7 High/High
3 28 Low/Low
We can use .map to look up each person's household ID, then groupby with named aggregation:
df3 = (
df2.assign(houseID=df2["Person ID"].map(df1.set_index("Person ID")["Household ID"]))
.groupby("houseID")
.agg(avgAgeOfPerson=("Age", "mean"), Education=("Education", "/".join))
)
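For readers who want to run the .map approach end to end, here is a minimal self-contained sketch. It rebuilds the two sample dataframes from the question; the intermediate name person_to_household is introduced here for readability and is not part of the original answer. It also assumes each Person ID appears only once in df1, since .map needs an unambiguous lookup.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"Person ID": [1, 2, 3, 4, 5],
                    "Household ID": [1, 2, 2, 3, 1]})
df2 = pd.DataFrame({"Person ID": [1, 2, 3, 4, 5],
                    "Age": [20, 29, 31, 45, 30],
                    "Workstatus": ["Working", "Working", "Unemployed",
                                   "Unemployed", "Working"],
                    "Education": ["High", "Medium", "Low", "Medium", "Medium"]})

# Lookup Series: index = Person ID, values = Household ID
# (assumes Person ID is unique in df1)
person_to_household = df1.set_index("Person ID")["Household ID"]

df3 = (
    df2.assign(houseID=df2["Person ID"].map(person_to_household))
    .groupby("houseID")
    .agg(avgAgeOfPerson=("Age", "mean"),       # mean age per household
         Education=("Education", "/".join))    # e.g. "High/Medium"
)
print(df3)
```

With the sample data, household 1 gets a mean age of 25.0 and the education string "High/Medium", because persons 1 and 5 both map to it.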
You can merge the two datasets and then group by household ID:
import pandas as pd
df1 = pd.DataFrame([[1, 1], [2, 2], [3, 2], [4, 3], [5, 1]], columns=['Person ID', 'Household ID'])
df2 = pd.DataFrame([[1, 20, 'Working', 'High'], [2, 29, 'Working', 'Medium'], [3, 31, 'Unemployed', 'Low'], [4, 45, 'Unemployed', 'Medium'], [5, 30, 'Working', 'Medium']], columns=['Person ID', 'Age', 'Workstatus', 'Education'])
# Left join: keep every row of df1 and attach each person's characteristics
merged = pd.merge(df1, df2, on='Person ID', how='left')
# One row per household: mean age and the list of education levels
merged.groupby('Household ID').agg({'Age': 'mean', 'Education': list})
Result:
Age Education
Household ID
1 25 [High, Medium]
2 30 [Medium, Low]
3 45 [Medium]
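What is your ideal output?
I have added the ideal output to the question.
That is exactly what I wanted. Thanks, I will look into the .map function.
Think of it as a left join; if you come from SQL, it is similar to a JOIN.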
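To make the left-join comparison concrete, the sketch below looks up household IDs once with .map and once with a left merge, then prints both. Person 6 is a hypothetical extra row added here (it is not in the question's data) to show what happens when a person has no household record: both approaches keep the row and fill the household with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"Person ID": [1, 2, 3, 4, 5],
                    "Household ID": [1, 2, 2, 3, 1]})
# Person 6 is hypothetical: present here but missing from df1
df2 = pd.DataFrame({"Person ID": [1, 2, 3, 4, 5, 6],
                    "Age": [20, 29, 31, 45, 30, 52]})

# Lookup via .map: unmatched keys become NaN
via_map = df2["Person ID"].map(df1.set_index("Person ID")["Household ID"])

# Same lookup as a left join: every row of df2 is kept
via_merge = df2.merge(df1, on="Person ID", how="left")["Household ID"]

print(via_map.tolist())
print(via_merge.tolist())
```

Both lists should agree row for row, with a missing value in the last position. The .map form only brings over one column, which is convenient when that is all the aggregation needs.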