Python pandas: grouping people into households and generating descriptives


My problem can be reduced to having two DataFrames.

DataFrame 1 contains the people and the household each one lives in:

Person ID | Household ID
1           1
2           2
3           2
4           3
5           1
DataFrame 2 contains each person's individual characteristics:

Person ID | Age  |  Workstatus  | Education
1           20      Working      High
2           29      Working      Medium
3           31     Unemployed    Low
4           45     Unemployed    Medium
5           30      Working      Medium
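The goal is to group together the people who share a Household ID so that I can generate descriptives about each household, such as "average age of the persons in the household", "average education level", and so on.

I tried: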

df1.groupby('Household ID')
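But I am not sure where to go from there, or how to do this the pandas way. The real dataset is very large, so working through Python lists takes too long.

The ideal output is: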

Household ID | Avg Age of persons | Education
1               25                   High/med
2               25.7                 High/High
3               28                   Low/Low


We can use .map to look up each person's Household ID, then groupby with named aggregation:

df3 = (
    df2.assign(houseID=df2["Person ID"].map(df1.set_index("Person ID")["Household ID"]))
    .groupby("houseID")
    .agg(avgAgeOfPerson=("Age", "mean"), Education=("Education", "/".join))
)
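For readers who want to run the snippet above end to end, here is a minimal self-contained sketch. It rebuilds the two sample DataFrames from the question; the intermediate name person_to_household is only introduced here for readability and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame(
    {"Person ID": [1, 2, 3, 4, 5], "Household ID": [1, 2, 2, 3, 1]}
)
df2 = pd.DataFrame(
    {
        "Person ID": [1, 2, 3, 4, 5],
        "Age": [20, 29, 31, 45, 30],
        "Workstatus": ["Working", "Working", "Unemployed", "Unemployed", "Working"],
        "Education": ["High", "Medium", "Low", "Medium", "Medium"],
    }
)

# Lookup Series: index is Person ID, values are Household ID
# (illustrative name, not from the original answer)
person_to_household = df1.set_index("Person ID")["Household ID"]

df3 = (
    # Attach the household to each person via the lookup
    df2.assign(houseID=df2["Person ID"].map(person_to_household))
    .groupby("houseID")
    # Named aggregation: output column = (input column, aggregation)
    .agg(avgAgeOfPerson=("Age", "mean"), Education=("Education", "/".join))
)
print(df3)
```

With the sample data, household 1 (persons 1 and 5) gets an average age of 25.0 and the education string High/Medium.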


You can merge the two datasets and then group by Household ID:

import pandas as pd

df1 = pd.DataFrame([[1,1],[2,2],[3,2],[4,3],[5,1]], columns=['Person ID', 'Household ID'])

df2 = pd.DataFrame([[1,20,'Working','High'],[2,29,'Working','Medium'],[3,31,'Unemployed','Low'],[4,45,'Unemployed','Medium'],[5,30,'Working','Medium']], columns=['Person ID','Age','Workstatus','Education'])

# Left join: keep every person in df1 and attach their attributes from df2
merged = pd.merge(df1, df2, on='Person ID', how='left')

# One row per household: mean age and the list of education levels
merged.groupby('Household ID').agg({'Age': 'mean', 'Education': list})

Result:

              Age       Education
Household ID                     
1              25  [High, Medium]
2              30   [Medium, Low]
3              45        [Medium]
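The question also asks for something like an "average education level", which neither a list nor a joined string provides. One possible approach, sketched below, is to map the education labels to numbers before aggregating. The scores in education_score (Low=1, Medium=2, High=3) and the output column names are assumptions made for illustration; they do not come from the question or the answers.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame(
    {"Person ID": [1, 2, 3, 4, 5], "Household ID": [1, 2, 2, 3, 1]}
)
df2 = pd.DataFrame(
    {
        "Person ID": [1, 2, 3, 4, 5],
        "Age": [20, 29, 31, 45, 30],
        "Workstatus": ["Working", "Working", "Unemployed", "Unemployed", "Working"],
        "Education": ["High", "Medium", "Low", "Medium", "Medium"],
    }
)

merged = df1.merge(df2, on="Person ID", how="left")

# ASSUMPTION: an ordinal encoding chosen only for this example
education_score = {"Low": 1, "Medium": 2, "High": 3}

summary = (
    merged.assign(edu_score=merged["Education"].map(education_score))
    .groupby("Household ID")
    .agg(
        avg_age=("Age", "mean"),
        avg_edu_score=("edu_score", "mean"),
        # Fraction of household members whose Workstatus is "Working"
        share_working=("Workstatus", lambda s: (s == "Working").mean()),
        n_persons=("Person ID", "count"),
    )
)
print(summary)
```

Whether averaging an ordinal score is meaningful depends on the analysis; the mode or the highest level in the household may be a better fit in some cases.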

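What is your ideal output?

I have added the ideal output to the question.

This is exactly what I want. Thanks, I will look into the .map function.

Think of it as a join: if you come from SQL, it is similar to a JOIN.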
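To make the comparison with a join concrete, the sketch below looks up the household for each person in two ways, once with .map and once with a left merge, and checks that they agree on the sample data. It assumes each Person ID appears only once in df1; the variable names via_map and via_merge are illustrative.

```python
import pandas as pd

# Sample data copied from the question (only the columns needed here)
df1 = pd.DataFrame(
    {"Person ID": [1, 2, 3, 4, 5], "Household ID": [1, 2, 2, 3, 1]}
)
df2 = pd.DataFrame(
    {"Person ID": [1, 2, 3, 4, 5], "Age": [20, 29, 31, 45, 30]}
)

# Lookup with .map: a Series indexed by Person ID acts like a dictionary
via_map = df2["Person ID"].map(df1.set_index("Person ID")["Household ID"])

# The same lookup expressed as a left join on Person ID
via_merge = df2.merge(df1, on="Person ID", how="left")["Household ID"]

# Both approaches give the same household for every person
print(via_map.tolist() == via_merge.tolist())
```

The .map form adds a single column without pulling in the rest of df1, while merge is the more general tool when several columns are needed.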