How do I plot a histogram of one column colored by another column in Python?
I have a dataset with 3 columns titled Gender (either M or F), House (either A or B or C), and Indicator (either 0 or 1). I want to plot the histogram of House A, colored by gender. This is my code for doing so:
import pandas as pd
df = pd.read_csv('dataset.csv', usecols=['House','Gender','Indicator'])
A = df[df['House']=='A']
A = pd.DataFrame(A, columns=['Indicator', 'Gender'])
This correctly imports the values for House A for each gender, as its contents show:
print(A)
    Indicator  Gender
0           1    Male
1           1    Male
2           1    Male
4           1  Female
7           1    Male
8           1    Male
11          1    Male
14          1    Male
17          1    Male
18          1  Female
19          1  Female
20          1  Female
21          1    Male
24          1    Male
26          1  Female
27          1    Male
...       ...     ...
Now, when I try to plot the histogram colored by gender, as I would in MATLAB, it gives an error:
import matplotlib.pyplot as plt
plt.hist(A)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-130-81c3aef1748b> in <module>()
----> 1 plt.hist(A)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, hold, data, **kwargs)
3130 histtype=histtype, align=align, orientation=orientation,
3131 rwidth=rwidth, log=log, color=color, label=label,
-> 3132 stacked=stacked, normed=normed, data=data, **kwargs)
3133 finally:
3134 ax._hold = washold
~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
1853 "the Matplotlib list!)" % (label_namer, func.__name__),
1854 RuntimeWarning, stacklevel=2)
-> 1855 return func(ax, *args, **kwargs)
1856
1857 inner.__doc__ = _add_data_doc(inner.__doc__,
~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in hist(***failed resolving arguments***)
6512 for xi in x:
6513 if len(xi) > 0:
-> 6514 xmin = min(xmin, xi.min())
6515 xmax = max(xmax, xi.max())
6516 bin_range = (xmin, xmax)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py in _amin(a, axis, out, keepdims)
27
28 def _amin(a, axis=None, out=None, keepdims=False):
---> 29 return umr_minimum(a, axis, None, out, keepdims)
30
31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
TypeError: '<=' not supported between instances of 'int' and 'str'
So how can I make a stacked histogram, or a side-by-side histogram, colored by gender? Something like this, except with only 2 bars per indicator, at x=0 and x=1:
import numpy as np
x = np.random.randn(1000, 2)
colors = ['red', 'green']
plt.hist(x, color=colors)
plt.legend(['Male', 'Female'])
plt.title('Male and Female indicator by gender')
I tried to mimic the above by copying the 2-column data frame into the 2 columns of a list and then plotting the histogram:
y=[]
y[0] = A[A['Gender'=='M']].tolist()
y[1] = A[A['Gender'=='F']].tolist()
plt.hist(y)
But this gives the following error:
KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3062 try:
-> 3063 return self._engine.get_loc(key)
3064 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: False
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-152-138cb74b6e00> in <module>()
2 A= pd.DataFrame(A, columns=['Indicator', 'Gender'])
3 y=[]
----> 4 y[0] = A[A['Gender'=='M']].tolist()
5 y[1] = A[A['Gender'=='F']].tolist()
6 plt.hist(y)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2683 return self._getitem_multilevel(key)
2684 else:
-> 2685 return self._getitem_column(key)
2686
2687 def _getitem_column(self, key):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2690 # get column
2691 if self.columns.is_unique:
-> 2692 return self._get_item_cache(key)
2693
2694 # duplicate columns & possible reduce dimensionality
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
2484 res = cache.get(item)
2485 if res is None:
-> 2486 values = self._data.get(item)
2487 res = self._box_item_values(item, values)
2488 cache[item] = res
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
4113
4114 if not isna(item):
-> 4115 loc = self.items.get_loc(item)
4116 else:
4117 indexer = np.arange(len(self.items))[isna(self.items)]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3063 return self._engine.get_loc(key)
3064 except KeyError:
-> 3065 return self._engine.get_loc(self._maybe_cast_indexer(key))
3066
3067 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: False
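Your code fails at
A[A['Gender'=='M']]
because it should be A[A['Gender']=='M'] to get the male elements, and you also need to select the column you want. The following should work, though it has not been tested with your data:
genders = A.Gender.unique()
plt.hist([A.loc[A.Gender == x, 'Indicator'] for x in genders], label=genders)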
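As a self-contained illustration of the same idea, here is a minimal sketch that can be run without dataset.csv. The small data frame is a made-up stand-in for the House A rows, and the bin edges [-0.5, 0.5, 1.5] are an assumption added here so that the values 0 and 1 each get exactly one group of bars; neither comes from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the House A rows of dataset.csv
A = pd.DataFrame({
    "Indicator": [1, 1, 0, 1, 0, 1, 1, 0],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Male", "Female", "Female"],
})

# Take the labels from the data instead of hard-coding 'M'/'F' or 'Male'/'Female'
genders = A["Gender"].unique()

# One numeric array per gender; hist() draws each array in its own color
groups = [A.loc[A["Gender"] == g, "Indicator"].to_numpy() for g in genders]

fig, ax = plt.subplots()
# Bin edges chosen (as an assumption) so that 0 and 1 each fall in their own bin
counts, edges, _ = ax.hist(groups, bins=[-0.5, 0.5, 1.5], label=list(genders))
ax.set_xticks([0, 1])
ax.set_xlabel("Indicator")
ax.set_ylabel("Count")
ax.legend()
# Use fig.savefig(...) or plt.show() with an interactive backend to view the figure
```

Passing stacked=True to hist would stack the bars instead of placing them side by side.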
Yes, it works, although I had to remove labels=genders because it gave an attribute error, and put plt.legend(genders) on a separate line. But I don't understand why or how it works, or why the code I posted doesn't work.
I have updated my answer so that the label argument is correct, and I have also explained why your code doesn't work.
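To make the "why" more concrete, the following sketch reproduces the KeyError: False on a tiny made-up data frame (the values are hypothetical). Python evaluates 'Gender' == 'M' first, as a comparison of two string literals, so the expression is equivalent to A[False], a lookup of a column named False. Two further problems would likely surface once the indexing is fixed: assigning to y[0] on an empty list raises an IndexError, and a DataFrame has no tolist method (a Series does).

```python
import pandas as pd

# Hypothetical miniature of the House A frame
A = pd.DataFrame({"Indicator": [1, 0, 1], "Gender": ["M", "F", "M"]})

# Comparing two string literals gives a plain Python bool, not a mask
key = ("Gender" == "M")

# A['Gender' == 'M'] is therefore A[False]: a lookup of a column named False
try:
    A[key]
    raised = False
except KeyError:
    raised = True

# With the bracket closed before the comparison, the column is compared
# element-wise and the result is a boolean mask over the rows
mask = A["Gender"] == "M"

# Select the rows with the mask and the numeric column by name
males = A.loc[mask, "Indicator"].tolist()
```

Selecting only the Indicator column matters for the first error as well: plt.hist(A) was given the Gender strings together with the integers, which is what led to the comparison between 'int' and 'str'.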