Python: how to merge/concatenate these three dataframes

python python-3.x pandas dataframe

I have three dataframes: df_Male, df_Female and df_Transgender.

Sample dataframes:

df_Male

continent   avg_count_country   avg_age
  Asia          55                5
  Africa        65                10
  Europe        75                8

df_Female

continent   avg_count_country   avg_age
  Asia          50                7
  Africa        60                12
  Europe        70                0

df_Transgender

continent   avg_count_country   avg_age
  Asia          30                6
  Africa        40                11
  America       80                10

Now I concatenate them as follows:

frames = [df_Male, df_Female, df_Transgender]
df = pd.concat(frames, keys=['Male', 'Female', 'Transgender'])
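To make the question reproducible, here is a minimal sketch that rebuilds the three sample frames (data copied from the tables above) and runs the same pd.concat call. It shows that keys= only adds an outer index level. Rows for continents missing from a group are not created.

```python
import pandas as pd

# Sample data copied from the tables in the question
df_Male = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                        'avg_count_country': [55, 65, 75],
                        'avg_age': [5, 10, 8]})
df_Female = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                          'avg_count_country': [50, 60, 70],
                          'avg_age': [7, 12, 0]})
df_Transgender = pd.DataFrame({'continent': ['Asia', 'Africa', 'America'],
                               'avg_count_country': [30, 40, 80],
                               'avg_age': [6, 11, 10]})

frames = [df_Male, df_Female, df_Transgender]
df = pd.concat(frames, keys=['Male', 'Female', 'Transgender'])

# keys= adds the outer index level; each group still has only its own 3 rows
print(df)
print(len(df))  # 9: no America rows for Male/Female, no Europe row for Transgender
```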

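As you can see, America appears only in df_Transgender, and likewise Europe appears in df_Male and df_Female but not in df_Transgender. So I need to combine them so that the result looks like the output below. It has to be automatic rather than done by hand, because there may be a large number of rows: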

              continent  avg_count_country  avg_age
Male        0      Asia                 55        5
            1    Africa                 65       10
            2    Europe                 75        8
            3    America                 0        0
Female      0      Asia                 50        7
            1    Africa                 60       12
            2    Europe                 70        0
            3    America                 0        0
Transgender 0      Asia                 30        6
            1    Africa                 40       11
            2    America                80       10
            3    Europe                 0         0

That is, for each continent missing from a group, avg_count_country and avg_age should be 0.

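You can add a "gender" column before concatenating.

We then use categorical data with groupby to compute the Cartesian product. This also has performance benefits.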

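import pandas as pd

df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])

# Categorical columns let groupby emit every (gender, continent) combination
for col in ['gender', 'continent']:
    df[col] = df[col].astype('category')

# observed=False keeps the unobserved combinations (filled with NaN by first());
# pass it explicitly because newer pandas versions deprecate the old default
res = df.groupby(['gender', 'continent'], observed=False).first().fillna(0).astype(int)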

print(res)

                       avg_count_country  avg_age
gender      continent                            
Female      Africa                    60       12
            America                    0        0
            Asia                      50        7
            Europe                    70        0
Male        Africa                    65       10
            America                    0        0
            Asia                      55        5
            Europe                    75        8
Transgender Africa                    40       11
            America                   80       10
            Asia                      30        6
            Europe                     0        0
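The Cartesian product comes from grouping by categorical columns with observed=False. In that case pandas emits one group for every combination of categories, including combinations with no rows, and first() returns NaN for those. The minimal sketch below uses a small hypothetical frame (toy data, not from the question) to contrast observed=False with observed=True. As far as I know, pandas 2.1 and later warn when the implicit default is relied on for categorical groupers, which is why the argument is passed explicitly.

```python
import pandas as pd

# Hypothetical toy data: group 'b' never occurs together with 'y'
toy = pd.DataFrame({'g': ['a', 'a', 'b'],
                    'c': ['x', 'y', 'x'],
                    'v': [1, 2, 3]})
for col in ['g', 'c']:
    toy[col] = toy[col].astype('category')

# observed=False: every (g, c) pair of categories gets a row; ('b', 'y') is NaN
full = toy.groupby(['g', 'c'], observed=False)['v'].first()
# observed=True: only the pairs that actually occur in the data
seen = toy.groupby(['g', 'c'], observed=True)['v'].first()

print(full)                         # 4 rows
print(seen)                         # 3 rows
print(full.fillna(0).astype(int))   # missing pair becomes 0
```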

You can do a bit of reindexing:

from itertools import product

# Get rid of that number in the index, not sure why you'd need it
df.index = df.index.droplevel(-1)
# Add continents to the index
df = df.set_index('continent', append=True)

# Determine product of indices
ids = list(product(df.index.get_level_values(0).unique(), df.index.get_level_values(1).unique()))

# Reindex and fill missing with 0
df = df.reindex(ids).fillna(0).reset_index(level=-1)

df

df is now:

            continent  avg_count_country  avg_age
Male             Asia               55.0      5.0
Male           Africa               65.0     10.0
Male           Europe               75.0      8.0
Male          America                0.0      0.0
Female           Asia               50.0      7.0
Female         Africa               60.0     12.0
Female         Europe               70.0      0.0
Female        America                0.0      0.0
Transgender      Asia               30.0      6.0
Transgender    Africa               40.0     11.0
Transgender    Europe                0.0      0.0
Transgender   America               80.0     10.0

If you need the extra numeric index, you can use

df.groupby(df.index).cumcount()

to number the values within each group.
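Here is a self-contained sketch of this answer end to end, assuming the sample frames from the question. It also shows one way (my assumption, not part of the original answer) to put the cumcount() counter back as a second index level with set_index(..., append=True). Rows are numbered in the order produced by the product, so Transgender's America row ends up last.

```python
from itertools import product

import pandas as pd

# Sample frames from the question
df_Male = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                        'avg_count_country': [55, 65, 75], 'avg_age': [5, 10, 8]})
df_Female = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                          'avg_count_country': [50, 60, 70], 'avg_age': [7, 12, 0]})
df_Transgender = pd.DataFrame({'continent': ['Asia', 'Africa', 'America'],
                               'avg_count_country': [30, 40, 80], 'avg_age': [6, 11, 10]})

df = pd.concat([df_Male, df_Female, df_Transgender],
               keys=['Male', 'Female', 'Transgender'])

# Same steps as the answer: drop the numeric level, index by (gender, continent)
df.index = df.index.droplevel(-1)
df = df.set_index('continent', append=True)
ids = list(product(df.index.get_level_values(0).unique(),
                   df.index.get_level_values(1).unique()))
df = df.reindex(ids).fillna(0).reset_index(level=-1)

# Number rows within each gender (0..3) and append the counter as an index level
df = df.set_index(df.groupby(level=0).cumcount(), append=True)
print(df)
```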

A slight modification of @jpp's answer avoids manipulating the index by hand:

import pandas as pd

df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])

# pivot's arguments are keyword-only since pandas 2.0
df.pivot(index='gender', columns='continent').fillna(0).stack().astype(int)

                       avg_count_country  avg_age
gender      continent
Female      Africa                    60       12
            America                    0        0
            Asia                      50        7
            Europe                    70        0
Male        Africa                    65       10
            America                    0        0
            Asia                      55        5
            Europe                    75        8
Transgender Africa                    40       11
            America                   80       10
            Asia                      30        6
            Europe                     0        0
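To see why this works, look at the intermediate wide frame. The sketch below uses the question's sample data. pivot creates one column per (measure, continent) pair, so missing combinations show up as NaN. fillna(0) fills them, and stack() moves continent from the columns back into the row index. Depending on your pandas version, stack() may emit a FutureWarning about its new implementation. The result here is the same either way because no NaN remain.

```python
import pandas as pd

# Sample frames from the question
df_Male = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                        'avg_count_country': [55, 65, 75], 'avg_age': [5, 10, 8]})
df_Female = pd.DataFrame({'continent': ['Asia', 'Africa', 'Europe'],
                          'avg_count_country': [50, 60, 70], 'avg_age': [7, 12, 0]})
df_Transgender = pd.DataFrame({'continent': ['Asia', 'Africa', 'America'],
                               'avg_count_country': [30, 40, 80], 'avg_age': [6, 11, 10]})

df = pd.concat([df_Male.assign(gender='Male'),
                df_Female.assign(gender='Female'),
                df_Transgender.assign(gender='Transgender')])

# Wide layout: rows = gender, columns = (measure, continent); gaps become NaN
wide = df.pivot(index='gender', columns='continent')
print(wide.shape)  # 3 genders x (2 measures * 4 continents)

# Fill the gaps, then move 'continent' back into the row index
res = wide.fillna(0).stack().astype(int)
print(res)
```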

How can I create a stacked bar chart from this dataframe, with Male, Female and Transgender on the X axis and the totals on the Y axis?