Python 如何连接/连接这三个数据帧
我有三个数据框df_男,df_女,df_变性 示例数据帧Python 如何连接/连接这三个数据帧,python,python-3.x,pandas,dataframe,Python,Python 3.x,Pandas,Dataframe,我有三个数据框df_男,df_女,df_变性 示例数据帧 df_Male continent avg_count_country avg_age Asia 55 5 Africa 65 10 Europe 75 8 df_Female continent avg_count_country avg_age Asia
df_Male
continent avg_count_country avg_age
Asia 55 5
Africa 65 10
Europe 75 8
df_Female
continent avg_count_country avg_age
Asia 50 7
Africa 60 12
Europe 70 0
df_Transgender
continent avg_count_country avg_age
Asia 30 6
Africa 40 11
America 80 10
现在我像下面这样连接
frames = [df_Male, df_Female, df_Transgender]
df = pd.concat(frames, keys=['Male', 'Female', 'Transgender'])
正如你所看到的,美国
出现在df_变性人
中,同样明智的欧洲也出现在df_男性
和df_女性
因此,我必须以某种方式对其进行浓缩,使其看起来像下图,但不是手动的,因为可能有大量的行
continent avg_count_country avg_age
Male 0 Asia 55 5
1 Africa 65 10
2 Europe 75 8
3 America 0 0
Female 0 Asia 50 7
1 Africa 60 12
2 Europe 70 0
3 America 0 0
Transgender 0 Asia 30 6
1 Africa 40 11
2 America 80 10
3 Europe 0 0
因此,对于其他大陆
值avg\u count\u country
和avg\u age
应为0可以在连接之前添加一个“性别”列
我们使用withgroupby
来计算笛卡尔积。这也会带来性能方面的好处
df = pd.concat([df_Male.assign(gender='Male'),
df_Female.assign(gender='Female'),
df_Transgender.assign(gender='Transgender')])
for col in ['gender', 'continent']:
df[col] = df[col].astype('category')
res = df.groupby(['gender', 'continent']).first().fillna(0).astype(int)
print(res)
avg_count_country avg_age
gender continent
Female Africa 60 12
America 0 0
Asia 50 7
Europe 70 0
Male Africa 65 10
America 0 0
Asia 55 5
Europe 75 8
Transgender Africa 40 11
America 80 10
Asia 30 6
Europe 0 0
可以在连接之前添加“性别”列
我们使用withgroupby
来计算笛卡尔积。这也会带来性能方面的好处
df = pd.concat([df_Male.assign(gender='Male'),
df_Female.assign(gender='Female'),
df_Transgender.assign(gender='Transgender')])
for col in ['gender', 'continent']:
df[col] = df[col].astype('category')
res = df.groupby(['gender', 'continent']).first().fillna(0).astype(int)
print(res)
avg_count_country avg_age
gender continent
Female Africa 60 12
America 0 0
Asia 50 7
Europe 70 0
Male Africa 65 10
America 0 0
Asia 55 5
Europe 75 8
Transgender Africa 40 11
America 80 10
Asia 30 6
Europe 0 0
你可以重新编制一点索引
from itertools import product
# Get rid of that number in the index, not sure why you'd need it
df.index = df.index.droplevel(-1)
# Add continents to the index
df = df.set_index('continent', append=True)
# Determine product of indices
ids = list(product(df.index.get_level_values(0).unique(), df.index.get_level_values(1).unique()))
# Reindex and fill missing with 0
df = df.reindex(ids).fillna(0).reset_index(level=-1)
df
现在是:
continent avg_count_country avg_age
Male Asia 55.0 5.0
Male Africa 65.0 10.0
Male Europe 75.0 8.0
Male America 0.0 0.0
Female Asia 50.0 7.0
Female Africa 60.0 12.0
Female Europe 70.0 0.0
Female America 0.0 0.0
Transgender Asia 30.0 6.0
Transgender Africa 40.0 11.0
Transgender Europe 0.0 0.0
Transgender America 80.0 10.0
如果需要其他数值索引,则可以执行以下操作:
df.groupby(df.index).cumcount()
对每组中的值进行编号 您可以重新编制一点索引
from itertools import product
# Get rid of that number in the index, not sure why you'd need it
df.index = df.index.droplevel(-1)
# Add continents to the index
df = df.set_index('continent', append=True)
# Determine product of indices
ids = list(product(df.index.get_level_values(0).unique(), df.index.get_level_values(1).unique()))
# Reindex and fill missing with 0
df = df.reindex(ids).fillna(0).reset_index(level=-1)
df
现在是:
continent avg_count_country avg_age
Male Asia 55.0 5.0
Male Africa 65.0 10.0
Male Europe 75.0 8.0
Male America 0.0 0.0
Female Asia 50.0 7.0
Female Africa 60.0 12.0
Female Europe 70.0 0.0
Female America 0.0 0.0
Transgender Asia 30.0 6.0
Transgender Africa 40.0 11.0
Transgender Europe 0.0 0.0
Transgender America 80.0 10.0
如果需要其他数值索引,则可以执行以下操作:
df.groupby(df.index).cumcount()
对每组中的值进行编号 对@jpp的答案稍加修改,就可以避免手动操作索引:
df = pd.concat([df_Male.assign(gender='Male'),
df_Female.assign(gender='Female'),
df_Transgender.assign(gender='Transgender')])
df.pivot('gender', 'continent').fillna(0).stack().astype(int)
avg_count_country avg_age
gender continent
Female Africa 60 12
America 0 0
Asia 50 7
Europe 70 0
Male Africa 65 10
America 0 0
Asia 55 5
Europe 75 8
Transgender Africa 40 11
America 80 10
Asia 30 6
Europe 0 0
对@jpp的答案稍加修改,即可避免手动操作索引:
df = pd.concat([df_Male.assign(gender='Male'),
df_Female.assign(gender='Female'),
df_Transgender.assign(gender='Transgender')])
df.pivot('gender', 'continent').fillna(0).stack().astype(int)
avg_count_country avg_age
gender continent
Female Africa 60 12
America 0 0
Asia 50 7
Europe 70 0
Male Africa 65 10
America 0 0
Asia 55 5
Europe 75 8
Transgender Africa 40 11
America 80 10
Asia 30 6
Europe 0 0
如何在此数据帧上创建堆叠条形图。X轴男性、女性、变性人和Y轴的总数已经达到。如何在此数据帧上创建堆叠条形图。X轴男性、女性、变性人和Y轴的总数已经达到。