How do I plot a histogram of one column colored by another column in Python?


I have a dataset with three columns: Gender (either M or F), House (either A, B, or C), and Indicator (either 0 or 1). I want to plot a histogram for house A, colored by gender. Here is my code for doing this:

import pandas as pd

df = pd.read_csv('dataset.csv', usecols=['House','Gender','Indicator'])

A = df[df['House']=='A']
A = pd.DataFrame(A, columns=['Indicator', 'Gender'])
This correctly imports the house A values for each gender, as its contents show:

print(A)
            Indicator    Gender
0                   1      Male
1                   1      Male
2                   1      Male
4                   1    Female
7                   1      Male
8                   1      Male
11                  1      Male
14                  1      Male
17                  1      Male
18                  1    Female
19                  1    Female
20                  1    Female
21                  1      Male
24                  1      Male
26                  1    Female
27                  1      Male
...               ...       ...
Now, when I try to plot a histogram colored by gender, the way I would in MATLAB, it raises an error:

import matplotlib.pyplot as plt
plt.hist(A)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-130-81c3aef1748b> in <module>()
----> 1 plt.hist(A)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, hold, data, **kwargs)
   3130                       histtype=histtype, align=align, orientation=orientation,
   3131                       rwidth=rwidth, log=log, color=color, label=label,
-> 3132                       stacked=stacked, normed=normed, data=data, **kwargs)
   3133     finally:
   3134         ax._hold = washold

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
   1853                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1854                         RuntimeWarning, stacklevel=2)
-> 1855             return func(ax, *args, **kwargs)
   1856 
   1857         inner.__doc__ = _add_data_doc(inner.__doc__,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in hist(***failed resolving arguments***)
   6512             for xi in x:
   6513                 if len(xi) > 0:
-> 6514                     xmin = min(xmin, xi.min())
   6515                     xmax = max(xmax, xi.max())
   6516             bin_range = (xmin, xmax)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py in _amin(a, axis, out, keepdims)
     27 
     28 def _amin(a, axis=None, out=None, keepdims=False):
---> 29     return umr_minimum(a, axis, None, out, keepdims)
     30 
     31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: '<=' not supported between instances of 'int' and 'str'

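So how can I make a stacked histogram, or a side-by-side histogram colored by gender? Something like this, except with only 2 bars per indicator value, at x=0 and x=1: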

import numpy as np

x = np.random.randn(1000, 2)

colors = ['red', 'green']
plt.hist(x, color=colors)
plt.legend(['Male', 'Female'])
plt.title('Male and Female indicator by gender')

I tried to mimic the example above by copying the two-column DataFrame into two elements of a list and then plotting the histogram:

y=[]
y[0] = A[A['Gender'=='M']].tolist()
y[1] = A[A['Gender'=='F']].tolist()
plt.hist(y)
But this produces the following error:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3062             try:
-> 3063                 return self._engine.get_loc(key)
   3064             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-152-138cb74b6e00> in <module>()
      2 A= pd.DataFrame(A, columns=['Indicator', 'Gender'])
      3 y=[]
----> 4 y[0] = A[A['Gender'=='M']].tolist()
      5 y[1] = A[A['Gender'=='F']].tolist()
      6 plt.hist(y)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2683             return self._getitem_multilevel(key)
   2684         else:
-> 2685             return self._getitem_column(key)
   2686 
   2687     def _getitem_column(self, key):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2690         # get column
   2691         if self.columns.is_unique:
-> 2692             return self._get_item_cache(key)
   2693 
   2694         # duplicate columns & possible reduce dimensionality

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   2484         res = cache.get(item)
   2485         if res is None:
-> 2486             values = self._data.get(item)
   2487             res = self._box_item_values(item, values)
   2488             cache[item] = res

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3063                 return self._engine.get_loc(key)
   3064             except KeyError:
-> 3065                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3066 
   3067         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

The following should work, though I have not tested it with your data:

genders = A.Gender.unique()
plt.hist([A.loc[A.Gender == x, 'Indicator'] for x in genders], label=genders)

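Your code fails on A[A['Gender'=='M']] because it should be A[A['Gender']=='M'] to get the male rows. The expression 'Gender'=='M' compares two string literals and evaluates to False, which is why the traceback ends in KeyError: False. You also need to select the column you want.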
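To make the misplaced bracket concrete, here is a minimal sketch. The small DataFrame is made-up stand-in data, not the asker's dataset, and it assumes the Gender column holds 'M'/'F' (the printed output in the question shows 'Male'/'Female', so compare against whatever values the column actually contains).

```python
import pandas as pd

# Hypothetical stand-in for the house-A rows; not the asker's real data.
A = pd.DataFrame({
    "Indicator": [1, 1, 0, 1, 0, 1],
    "Gender": ["M", "M", "F", "F", "M", "F"],
})

# Python evaluates the comparison of the two string literals first,
# so the "key" handed to the DataFrame is simply the boolean False.
key = ('Gender' == 'M')
print(key)

# A[False] is then a lookup for a column literally named False.
try:
    A[key]
except KeyError as exc:
    print("KeyError:", exc)

# Comparing the column itself yields a boolean mask, one entry per row.
mask = A['Gender'] == 'M'

# .loc with the mask and a column name selects the Indicator values of the male rows.
males = A.loc[mask, 'Indicator']
print(males.tolist())
```

Separately from the bracket, `y = []` followed by `y[0] = ...` would raise an IndexError, because an empty list has no position 0 to assign to; building the list with a comprehension or `append` avoids that.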

Yes, it works, although I had to remove labels=genders because it raised an AttributeError, and add a separate line, plt.legend(genders), instead. But I don't understand why or how it works, or why the code I posted doesn't.

I've updated my answer so that the label keyword is correct, and it now also explains why your code doesn't work.
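Putting the answer and the comments together, a runnable sketch might look like the following. It uses made-up data, and the explicit bin edges, the colors, and the output filename are illustrative choices rather than anything from the original posts. The point is that `plt.hist` accepts a list of 1-D arrays and draws one set of side-by-side bars per array, and that the keyword is `label` (singular), which `legend()` then picks up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the house-A rows; not the asker's real data.
A = pd.DataFrame({
    "Indicator": [1, 1, 0, 1, 0, 1],
    "Gender": ["M", "M", "F", "F", "M", "F"],
})

genders = A["Gender"].unique()

# One array of Indicator values per gender; hist() treats each array
# as a separate dataset and places their bars next to each other.
groups = [A.loc[A["Gender"] == g, "Indicator"].to_numpy() for g in genders]

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(
    groups,
    bins=[-0.5, 0.5, 1.5],      # two bins, centred on 0 and 1
    color=["red", "green"],
    label=list(genders),         # 'label', not 'labels'
)
ax.set_xticks([0, 1])
ax.set_xlabel("Indicator")
ax.set_ylabel("Count")
ax.legend()

# Assumed output filename; use plt.show() instead in an interactive session.
fig.savefig("indicator_by_gender.png")
```

Passing `stacked=True` to `ax.hist` would stack the genders in a single bar per bin instead of placing them side by side.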