Python/Pandas - Combining 3 datasets into one bar chart
I'm doing some basic data analysis and am struggling to create a bar chart from 3 datasets. Here is my data:
datasetArgentina = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}
datasetColumbia = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['1500 ','1600', '1500' ,'1600' ,'1500', '1200' ,'1300', '1400' ,'1400', '1500' ,'1500' ,'1500' ,'1600' ,'1500', '1500', '1400', '1400']}
datasetBrazil = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}
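Any suggestions on how to turn this into one large bar chart, with each country shown in a different color?
This is my poor attempt at combining the datasets and plotting them: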
import pandas as pd
import matplotlib.pyplot as plt

df4 = pd.DataFrame.from_dict(datasetArgentina)
df5 = pd.DataFrame.from_dict(datasetColumbia)
df6 = pd.DataFrame.from_dict(datasetBrazil)
df7 = pd.merge(df4, df5, on='Year')
df8 = pd.merge(df6, df7, on='Year', how='left')
print(df7)
print(df8)
plt.bar(df8['Year'], df8['Mortality'])
plt.title('South America')
plt.xticks(df8['Year'], rotation=90)
plt.xlabel('Year')
plt.ylabel('Mortality')
plt.tight_layout()
plt.show()
Any help would be great.
Output:
df7:
Mortality_x Year Mortality_y
0 11000 2000 1500
1 10000 2001 1600
2 10000 2002 1500
3 10000 2003 1600
4 10000 2004 1500
5 9300 2005 1200
6 8900 2006 1300
7 8700 2007 1400
8 9000 2008 1400
9 8600 2009 1500
10 8300 2010 1500
11 8100 2011 1500
12 7800 2012 1600
13 8000 2013 1500
14 7500 2014 1500
15 7500 2015 1400
16 7300 2016 1400
df8:
Mortality Year Mortality_x Mortality_y
0 11000 2000 11000 1500
1 10000 2001 10000 1600
2 10000 2002 10000 1500
3 10000 2003 10000 1600
4 10000 2004 10000 1500
5 9300 2005 9300 1200
6 8900 2006 8900 1300
7 8700 2007 8700 1400
8 9000 2008 9000 1400
9 8600 2009 8600 1500
10 8300 2010 8300 1500
11 8100 2011 8100 1500
12 7800 2012 7800 1600
13 8000 2013 8000 1500
14 7500 2014 7500 1500
15 7500 2015 7500 1400
16 7300 2016 7300 1400
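The Mortality_x and Mortality_y names in the output above come from the default suffixes=('_x', '_y') that merge applies when both frames share a non-key column name. One way to get country-labeled columns instead, as a rough sketch: rename each Mortality column to its country before merging. The variable names below (frames, merged) are made up for illustration, and only the first three years of the data are used to keep it short.

```python
import pandas as pd

# First three years of the question's data, shortened for illustration.
years = ['2000', '2001', '2002']
frames = {
    'Argentina': pd.DataFrame({'Year': years, 'Mortality': ['11000', '10000', '10000']}),
    'Columbia': pd.DataFrame({'Year': years, 'Mortality': ['1500 ', '1600', '1500']}),
    'Brazil': pd.DataFrame({'Year': years, 'Mortality': ['11000', '10000', '10000']}),
}

# Rename each Mortality column to its country, then merge on Year.
# With distinct column names, merge no longer needs the _x/_y suffixes.
merged = None
for country, frame in frames.items():
    renamed = frame.rename(columns={'Mortality': country})
    merged = renamed if merged is None else merged.merge(renamed, on='Year')

# The values are strings (one has a trailing space), so strip and convert.
for country in frames:
    merged[country] = pd.to_numeric(merged[country].str.strip())

print(merged)
```

The result has one row per year and one numeric column per country, which is also the shape a grouped bar chart in pandas expects.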
Use concat to join the dataframes, then groupby + plot to group and plot the data by country:
df = pd.concat(
[df4, df5, df6], keys=['Argentina', 'Columbia', 'Brazil']
)
df.astype(int).groupby(level=0).plot.bar(x='Year', y='Mortality');
plt.show()
This produces a separate plot for each group.

You can use seaborn with factorplot (renamed catplot in seaborn 0.9), as follows:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# IPython/Jupyter magic; remove this line when running as a plain script
%matplotlib inline
datasetArgentina = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}
datasetColumbia = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['1500 ','1600', '1500' ,'1600' ,'1500', '1200' ,'1300', '1400' ,'1400', '1500' ,'1500' ,'1500' ,'1600' ,'1500', '1500', '1400', '1400']}
datasetBrazil = {'Year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007","2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015","2016"], 'Mortality': ['11000', '10000' ,'10000' ,'10000' ,'10000' ,'9300' ,'8900' ,'8700', '9000' , '8600' ,'8300' ,'8100','7800' ,'8000', '7500', '7500', '7300']}
df4 = pd.DataFrame(datasetArgentina)
df5 = pd.DataFrame(datasetColumbia)
df6 = pd.DataFrame(datasetBrazil)
Additional code:
# add country field for each dataframe
df4['country'] = 'Argentina'
df5['country'] = 'Columbia'
df6['country'] = 'Brazil'
# Combine all dataframes
df = pd.concat([df4,df5,df6])
# convert to float
df['Mortality'] = df['Mortality'].astype(float)
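# factorplot was renamed catplot in seaborn 0.9, and its size argument became height;
# errorbar=None (seaborn 0.12+) replaces the deprecated ci=None
sns.catplot(data=df, hue='country', x='Year', y='Mortality', kind='bar', errorbar=None, aspect=3, height=7);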
plt.xticks(rotation=45);
Result: [image not included] (for more information, see the seaborn documentation for factorplot, which newer releases call catplot).
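If you would rather stay with pandas and matplotlib only, a grouped bar chart can also be sketched by pivoting the combined data so that each country becomes its own column; DataFrame.plot.bar then draws one colored bar per column for every year. This is only one possible approach, not the method from the answer above. The names data, frames and wide are illustrative, and the data is cut to the first three years.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First three years of the question's data, shortened for illustration.
years = ['2000', '2001', '2002']
data = {
    'Argentina': ['11000', '10000', '10000'],
    'Columbia': ['1500 ', '1600', '1500'],
    'Brazil': ['11000', '10000', '10000'],
}

# Build one long dataframe with a country column, as in the seaborn answer.
frames = [
    pd.DataFrame({'Year': years, 'Mortality': values, 'country': name})
    for name, values in data.items()
]
df = pd.concat(frames, ignore_index=True)

# Values are strings (one with a trailing space), so strip and convert.
df['Mortality'] = pd.to_numeric(df['Mortality'].str.strip())

# Pivot to wide form: one row per year, one column per country.
wide = df.pivot(index='Year', columns='country', values='Mortality')

# Each column becomes a differently colored set of bars, with a legend.
ax = wide.plot.bar(figsize=(10, 5), rot=45)
ax.set_title('South America')
ax.set_xlabel('Year')
ax.set_ylabel('Mortality')
plt.tight_layout()
```

Call plt.show() afterwards when running as a script. Pass stacked=True to plot.bar if you want the countries stacked rather than side by side.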
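What are df4, df5, df6 and df7? What happened to dfs 1 through 3?
Hi Coldspeed, I removed some code because df1 to df3 were used to read the data from CSV files, and I wanted to state concisely what I need help with.
OK, can you explain the expected output for the given data?
OK, I edited the post to show the output of df7 and df8. If there is an easy way to combine them, I'd consider scrapping that code, because the output shows "Mortality, Mortality_x, Mortality_y", whereas I want these labeled by country.
When you say column chart, do you mean a bar chart stacked by country?
Hi mate, what you did is absolutely amazing. I'll definitely look into seaborn and factorplot, since my initial code only used pandas/matplotlib/numpy. Thanks again!
Great! Happy coding.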