Python/Pandas - Combining 3 datasets into one column chart


I'm doing some basic data analysis, and I'm struggling to create a column chart when there are 3 datasets.

Here is my data:

datasetArgentina = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}

datasetColumbia = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['1500 ','1600', '1500' ,'1600' ,'1500', '1200' ,'1300', '1400' ,'1400', '1500' ,'1500' ,'1500' ,'1600' ,'1500', '1500', '1400', '1400']}

datasetBrazil = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}
Any suggestions on turning this into one big column chart, with each country shown in a different color?

Here is my poor attempt at combining the datasets and printing them:

import pandas as pd
import matplotlib.pyplot as plt

df4 = pd.DataFrame.from_dict(datasetArgentina)
df5 = pd.DataFrame.from_dict(datasetColumbia)
df6 = pd.DataFrame.from_dict(datasetBrazil)

df7 = pd.merge(df4, df5, on='Year')
df8 = pd.merge(df6, df7, on='Year', how='left')
print(df7)
print(df8)

plt.bar(df8['Year'], df8['Mortality'])
plt.title('South America')
plt.xticks(df8['Year'], rotation=90)
plt.xlabel('Year')
plt.ylabel('Mortality')
plt.tight_layout()
plt.show()
Any help would be great.

Output:

df7:
      Mortality_x  Year Mortality_y
0        11000  2000       1500 
1        10000  2001        1600
2        10000  2002        1500
3        10000  2003        1600
4        10000  2004        1500
5         9300  2005        1200
6         8900  2006        1300
7         8700  2007        1400
8         9000  2008        1400
9         8600  2009        1500
10        8300  2010        1500
11        8100  2011        1500
12        7800  2012        1600
13        8000  2013        1500
14        7500  2014        1500
15        7500  2015        1400
16        7300  2016        1400
df8:
      Mortality  Year Mortality_x Mortality_y
0      11000  2000       11000       1500 
1      10000  2001       10000        1600
2      10000  2002       10000        1500
3      10000  2003       10000        1600
4      10000  2004       10000        1500
5       9300  2005        9300        1200
6       8900  2006        8900        1300
7       8700  2007        8700        1400
8       9000  2008        9000        1400
9       8600  2009        8600        1500
10      8300  2010        8300        1500
11      8100  2011        8100        1500
12      7800  2012        7800        1600
13      8000  2013        8000        1500
14      7500  2014        7500        1500
15      7500  2015        7500        1400
16      7300  2016        7300        1400
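The Mortality_x / Mortality_y names come from pd.merge's default suffixes ('_x', '_y'), which it adds when both frames have a non-key column with the same name. A minimal sketch, assuming the three dicts from the question, is to rename each Mortality column to its country before merging, so the merged frame is labelled by country. It also converts the strings to integers; the strip() call is there because of the trailing space in '1500 '.

```python
from functools import reduce

import pandas as pd

years = [str(y) for y in range(2000, 2017)]  # "2000" ... "2016", as in the question
datasetArgentina = {'Year': years, 'Mortality': ['11000', '10000', '10000', '10000', '10000', '9300', '8900', '8700', '9000',
                                                 '8600', '8300', '8100', '7800', '8000', '7500', '7500', '7300']}
datasetColumbia = {'Year': years, 'Mortality': ['1500 ', '1600', '1500', '1600', '1500', '1200', '1300', '1400', '1400',
                                                '1500', '1500', '1500', '1600', '1500', '1500', '1400', '1400']}
# The question's Brazil data is identical to Argentina's
datasetBrazil = {'Year': years, 'Mortality': list(datasetArgentina['Mortality'])}

frames = []
for country, data in [('Argentina', datasetArgentina), ('Columbia', datasetColumbia), ('Brazil', datasetBrazil)]:
    df = pd.DataFrame(data)
    # Strip stray whitespace such as '1500 ' before converting to integers
    df['Mortality'] = df['Mortality'].str.strip().astype(int)
    # Name the value column after the country so merge never needs _x/_y suffixes
    frames.append(df.rename(columns={'Mortality': country}))

# Chain the merges on the shared Year key: Year, Argentina, Columbia, Brazil
wide = reduce(lambda left, right: pd.merge(left, right, on='Year'), frames)
print(wide.head())
```

Passing explicit suffixes to pd.merge would also work for a single merge, but renaming first scales to any number of countries without chained suffixes.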

Use concat to join the data frames, then groupby + plot to group the result by country and plot each group:

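import pandas as pd
import matplotlib.pyplot as plt

# Stack the three frames; the keys become the outer index level (one per country)
df = pd.concat(
       [df4, df5, df6], keys=['Argentina', 'Columbia', 'Brazil']
)

# Convert the string columns to integers, then draw one bar plot per country
df.astype(int).groupby(level=0).plot.bar(x='Year', y='Mortality');
plt.show()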

This gives you a separate plot for each group (country).
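If you want all three countries in one chart (one bar per country per year, each country in its own color), a minimal sketch using only pandas and matplotlib is to reshape the stacked data into wide format, with one column per country, and call DataFrame.plot.bar once. The block rebuilds df4, df5 and df6 so it runs on its own; the Agg backend and the output file name south_america.png are assumptions for a non-interactive run.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; drop this line in Jupyter or a GUI session
import matplotlib.pyplot as plt
import pandas as pd

years = [str(y) for y in range(2000, 2017)]
argentina = ['11000', '10000', '10000', '10000', '10000', '9300', '8900', '8700', '9000',
             '8600', '8300', '8100', '7800', '8000', '7500', '7500', '7300']
colombia = ['1500 ', '1600', '1500', '1600', '1500', '1200', '1300', '1400', '1400',
            '1500', '1500', '1500', '1600', '1500', '1500', '1400', '1400']

# Same shape as df4/df5/df6 in the question (Brazil reuses Argentina's values, as in the question)
df4 = pd.DataFrame({'Year': years, 'Mortality': argentina})
df5 = pd.DataFrame({'Year': years, 'Mortality': colombia})
df6 = pd.DataFrame({'Year': years, 'Mortality': argentina})

# Long format: one row per (year, country)
long_df = pd.concat(
    [df4.assign(Country='Argentina'), df5.assign(Country='Columbia'), df6.assign(Country='Brazil')],
    ignore_index=True,
)
# Strip stray whitespace such as '1500 ' and convert to integers
long_df['Mortality'] = long_df['Mortality'].str.strip().astype(int)

# Wide format: one row per year, one column per country (columns sorted alphabetically)
wide = long_df.pivot(index='Year', columns='Country', values='Mortality')

# One call draws each column as its own colored bar series, with a legend
ax = wide.plot.bar(figsize=(12, 5), rot=90, title='South America')
ax.set_ylabel('Mortality')
plt.tight_layout()
plt.savefig('south_america.png')  # hypothetical output file; use plt.show() interactively
```

Passing stacked=True to plot.bar instead stacks the three countries into a single bar per year, which is the other reading of "one column chart" raised in the comments.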

You can use seaborn's factorplot, like this:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter/IPython magic; omit it in a plain .py script
%matplotlib inline

datasetArgentina = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}

datasetColumbia = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['1500 ','1600', '1500' ,'1600' ,'1500', '1200' ,'1300', '1400' ,'1400', '1500' ,'1500' ,'1500' ,'1600' ,'1500', '1500', '1400', '1400']}

datasetBrazil = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}


df4 = pd.DataFrame(datasetArgentina)
df5 = pd.DataFrame(datasetColumbia)
df6 = pd.DataFrame(datasetBrazil)
Additional code:

# add a country column to each dataframe
df4['country'] = 'Argentina'
df5['country'] = 'Columbia'
df6['country'] = 'Brazil'

# Combine all dataframes
df = pd.concat([df4,df5,df6])
# convert Mortality from strings to float
df['Mortality'] = df['Mortality'].astype(float)

# Note: seaborn 0.9 renamed factorplot to catplot and its size parameter to height
sns.factorplot(data=df, hue='country', x='Year', y='Mortality', kind='bar', ci=None, aspect=3, size=7);
plt.xticks(rotation=45);
Result (plot image not preserved; see the seaborn factorplot documentation for more details):
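Since the asker's original code only used pandas/matplotlib/numpy, here is a minimal sketch of the same grouped chart drawn with plain matplotlib, as an alternative to seaborn: each country's bars are shifted by a fixed offset around the year positions. The bar width of 0.27, the default color cycle, the Agg backend and the file name south_america_mpl.png are assumptions, not part of the answer above.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; drop this line in Jupyter or a GUI session
import matplotlib.pyplot as plt
import numpy as np

years = [str(y) for y in range(2000, 2017)]
mortality = {
    'Argentina': [11000, 10000, 10000, 10000, 10000, 9300, 8900, 8700, 9000,
                  8600, 8300, 8100, 7800, 8000, 7500, 7500, 7300],
    'Columbia': [1500, 1600, 1500, 1600, 1500, 1200, 1300, 1400, 1400,
                 1500, 1500, 1500, 1600, 1500, 1500, 1400, 1400],
}
# The question's Brazil data is identical to Argentina's
mortality['Brazil'] = list(mortality['Argentina'])

x = np.arange(len(years))  # one slot per year on the x axis
width = 0.27               # bar width; three bars fit inside each slot

fig, ax = plt.subplots(figsize=(12, 5))
for i, (country, values) in enumerate(mortality.items()):
    # Offsets of -width, 0 and +width centre each group of three bars on its year
    ax.bar(x + (i - 1) * width, values, width, label=country)

ax.set_xticks(x)
ax.set_xticklabels(years, rotation=90)
ax.set_xlabel('Year')
ax.set_ylabel('Mortality')
ax.set_title('South America')
ax.legend()
fig.tight_layout()
fig.savefig('south_america_mpl.png')  # hypothetical output file; use plt.show() interactively
```

The offset arithmetic is the part seaborn's hue argument (and pandas' plot.bar on a wide frame) handles for you; doing it by hand keeps the dependency list to matplotlib and numpy.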


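What are df4, df5, df6 and df7? What happened to dfs 1 to 3?

Hi Coldspeed, I removed some code because df1 to df3 are used to read the data from CSV files, and I wanted to be concise about what I need help with.

OK, can you explain the expected output for the given data?

OK, I've edited the post to show the output of df7 and df8. If there's a simple way to combine them, I'm thinking of scrapping my code, because the output shows "Mortality, Mortality_x, Mortality_y" whereas I want them labelled by country.

When you plot the column chart, do you mean bars stacked by country?

Hi mate, what you've done is absolutely amazing. I'll definitely look into seaborn and factorplot, since my initial code only used pandas/matplotlib/numpy. Thanks again!

Great. Happy coding.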