Python: Merging and visualizing datasets with pandas


For example, I have cleaned the following dataframes and they are ready to be merged:

DataFrame1: average income per year

Country | Year 1  | Year 2  | Year 3
  A     |   50    |   30    |   20
  B     |   70    |   20    |   90
  C     |   10    |   20    |   30

DataFrame2: fertility rate

Country | Year 1 | Year 2 | Year 3
   A    |   1.5  |   2    |  2.5
   B    |   2    |   2    |   3
   C    |   1    |   1    |   4

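Basically, I am trying to show the relationship between DataFrame1 and DataFrame2 over the years with matplotlib. But I can't seem to merge them, because they share the same column headers (the years). Also, when I try to put the years on the X axis, I can't find a chart type in matplotlib that lets me compare these data. Any advice would be great. The values above are only a sample, because the real dataset is very large. Is there simply too much data?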

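Consider generating a separate plot per country with a secondary axis, since you are tracking two metrics on different scales: income and fertility rate. For this setup, you need to use pandas.melt() to reshape each dataframe from wide format to long format. Then iterate over the countries and filter the dataframes for each one.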

Data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Year 1': [50, 70, 10],
                    'Year 2': [30, 20, 20],
                    'Year 3': [20, 90, 30]})

df1 = df1.melt(id_vars='Country', value_name='Income', var_name='Year')

df2 = pd.DataFrame({'Country': ['A', 'B', 'C'],
                    'Year 1': [1.5, 2, 1],
                    'Year 2': [2.0, 2, 1],
                    'Year 3': [2.5, 3, 4]})

df2 = df2.melt(id_vars='Country', value_name='Fertility', var_name='Year')
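
Once both frames are in long format, the merging problem from the question should go away: the year is now a value in a `Year` column rather than a column header that both frames share, so the two frames can be joined on the `Country` and `Year` keys. The following is only a minimal sketch using the sample values from the question; the names `income_wide`, `fertility_wide`, `income`, `fertility` and `merged` are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample values from the question, in wide format (one column per year).
income_wide = pd.DataFrame({'Country': ['A', 'B', 'C'],
                            'Year 1': [50, 70, 10],
                            'Year 2': [30, 20, 20],
                            'Year 3': [20, 90, 30]})
fertility_wide = pd.DataFrame({'Country': ['A', 'B', 'C'],
                               'Year 1': [1.5, 2, 1],
                               'Year 2': [2.0, 2, 1],
                               'Year 3': [2.5, 3, 4]})

# Wide -> long: one row per (Country, Year) pair.
income = income_wide.melt(id_vars='Country', var_name='Year', value_name='Income')
fertility = fertility_wide.melt(id_vars='Country', var_name='Year', value_name='Fertility')

# Join on both keys, so each row carries both metrics for one country and year.
merged = income.merge(fertility, on=['Country', 'Year'], how='inner')
print(merged.sort_values(['Country', 'Year']))
```

With a single merged frame, each row holds both the income and the fertility rate for one country in one year, which may be more convenient for a large dataset than keeping two separate frames in sync.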

Plotting

years = df1['Year'].unique()

for c in df1['Country'].unique():
    fig, ax1 = plt.subplots(figsize=(10, 4))

    # Second y-axis on the same x-axis, since the two metrics use different scales
    ax2 = ax1.twinx()
    df1[df1['Country'] == c].plot(kind='line', x='Year', y='Income', ax=ax1, color='g', legend=False)
    df2[df2['Country'] == c].plot(kind='line', x='Year', y='Fertility', ax=ax2, color='b', legend=False)

    ax1.set_title('Country ' + c)
    ax1.set_xlabel('Years')
    ax1.set_ylabel('Average Income Per Year')
    ax2.set_ylabel('Fertility Rate')

    # Combine the lines from both axes into a single legend
    lines = ax1.get_lines() + ax2.get_lines()
    ax1.legend(lines, [l.get_label() for l in lines], loc='upper left')

    # One tick per year, labelled with the year names
    ax1.set_xticks(np.arange(len(years)))
    ax1.set_xticklabels(years)

    plt.show()
    plt.close(fig)  # release this figure before drawing the next country
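
If opening one window per country becomes unwieldy, the same secondary-axis idea can be arranged as a row of panels in a single figure. This is only a sketch under stated assumptions: `plot_country_grid` is a hypothetical helper written for this example (not a pandas or matplotlib function), it expects long-format frames with the `Country`, `Year`, `Income` and `Fertility` columns used above, and it assumes the rows for each country are already in year order.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Long-format sample data, equivalent to the melted frames above.
years = ['Year 1', 'Year 2', 'Year 3']
income_long = pd.DataFrame({'Country': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
                            'Year': years * 3,
                            'Income': [50, 30, 20, 70, 20, 90, 10, 20, 30]})
fertility_long = pd.DataFrame({'Country': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
                               'Year': years * 3,
                               'Fertility': [1.5, 2, 2.5, 2, 2, 3, 1, 1, 4]})


def plot_country_grid(income, fertility):
    """Hypothetical helper: one panel per country, income on the left
    y-axis and fertility rate on the right y-axis."""
    countries = income['Country'].unique()
    fig, axes = plt.subplots(1, len(countries),
                             figsize=(5 * len(countries), 4), squeeze=False)
    for ax1, c in zip(axes[0], countries):
        inc = income[income['Country'] == c]
        fer = fertility[fertility['Country'] == c]
        positions = list(range(len(inc)))  # numeric x positions, one per year

        ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
        ax1.plot(positions, inc['Income'].to_numpy(), color='g')
        ax2.plot(positions, fer['Fertility'].to_numpy(), color='b')

        ax1.set_title('Country ' + c)
        ax1.set_xlabel('Years')
        ax1.set_ylabel('Average Income Per Year')
        ax2.set_ylabel('Fertility Rate')
        ax1.set_xticks(positions)
        ax1.set_xticklabels(inc['Year'])
    fig.tight_layout()
    return fig, axes


fig, axes = plot_country_grid(income_long, fertility_long)
```

Calling `plt.show()` afterwards displays the figure, and `fig.savefig(...)` writes it to a file instead. For a very large number of countries, a single row will get too wide, so the grid shape would need adjusting.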