How do I plot a histogram of one column colored by another in Python?


python pandas


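I have a dataset with three columns: Gender (either M or F), House (either A, B, or C), and Indicator (either 0 or 1). I want to plot a histogram for house A, colored by gender. This is my code for doing that: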

import pandas as pd

df = pd.read_csv('dataset.csv', usecols=['House', 'Gender', 'Indicator'])

A = df[df['House']=='A']
A = pd.DataFrame(A, columns=['Indicator', 'Gender'])

This correctly loads the values for house A for each gender, as its contents show:

print(A)
            Indicator    Gender
0                   1      Male
1                   1      Male
2                   1      Male
4                   1    Female
7                   1      Male
8                   1      Male
11                  1      Male
14                  1      Male
17                  1      Male
18                  1    Female
19                  1    Female
20                  1    Female
21                  1      Male
24                  1      Male
26                  1    Female
27                  1      Male
...               ...       ...

Now, when I try to plot a histogram colored by gender, the way I would in MATLAB, I get an error:

import matplotlib.pyplot as plt
plt.hist(A)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-130-81c3aef1748b> in <module>()
----> 1 plt.hist(A)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, hold, data, **kwargs)
   3130                       histtype=histtype, align=align, orientation=orientation,
   3131                       rwidth=rwidth, log=log, color=color, label=label,
-> 3132                       stacked=stacked, normed=normed, data=data, **kwargs)
   3133     finally:
   3134         ax._hold = washold

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\__init__.py in inner(ax, *args, **kwargs)
   1853                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1854                         RuntimeWarning, stacklevel=2)
-> 1855             return func(ax, *args, **kwargs)
   1856 
   1857         inner.__doc__ = _add_data_doc(inner.__doc__,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in hist(***failed resolving arguments***)
   6512             for xi in x:
   6513                 if len(xi) > 0:
-> 6514                     xmin = min(xmin, xi.min())
   6515                     xmax = max(xmax, xi.max())
   6516             bin_range = (xmin, xmax)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\core\_methods.py in _amin(a, axis, out, keepdims)
     27 
     28 def _amin(a, axis=None, out=None, keepdims=False):
---> 29     return umr_minimum(a, axis, None, out, keepdims)
     30 
     31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: '<=' not supported between instances of 'int' and 'str'

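So how can I make a stacked histogram, or a side-by-side histogram colored by gender? Something like this, except with only 2 bars for each indicator value, at x=0 and x=1: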

import numpy as np

x = np.random.randn(1000, 2)

colors = ['red', 'green']
plt.hist(x, color=colors)
plt.legend(['Male', 'Female'])
plt.title('Male and Female indicator by gender')

I tried to mimic the above by copying the two-column DataFrame into a two-element list and then plotting the histogram:

y=[]
y[0] = A[A['Gender'=='M']].tolist()
y[1] = A[A['Gender'=='F']].tolist()
plt.hist(y)

But this produces the following error:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3062             try:
-> 3063                 return self._engine.get_loc(key)
   3064             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-152-138cb74b6e00> in <module>()
      2 A= pd.DataFrame(A, columns=['Indicator', 'Gender'])
      3 y=[]
----> 4 y[0] = A[A['Gender'=='M']].tolist()
      5 y[1] = A[A['Gender'=='F']].tolist()
      6 plt.hist(y)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2683             return self._getitem_multilevel(key)
   2684         else:
-> 2685             return self._getitem_column(key)
   2686 
   2687     def _getitem_column(self, key):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2690         # get column
   2691         if self.columns.is_unique:
-> 2692             return self._get_item_cache(key)
   2693 
   2694         # duplicate columns & possible reduce dimensionality

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   2484         res = cache.get(item)
   2485         if res is None:
-> 2486             values = self._data.get(item)
   2487             res = self._box_item_values(item, values)
   2488             cache[item] = res

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3063                 return self._engine.get_loc(key)
   3064             except KeyError:
-> 3065                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3066 
   3067         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False


The following should work, but it has not been tested with your data:

genders = A.Gender.unique()
plt.hist([A.loc[A.Gender == x, 'Indicator'] for x in genders], label=genders)
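For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same idea. The small DataFrame is made-up stand-in data, and the explicit bin edges are an assumption added here so that the values 0 and 1 each get their own bin; neither comes from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the house-A subset (not the asker's data)
A = pd.DataFrame({
    "Indicator": [1, 1, 0, 1, 0, 1, 0, 1],
    "Gender": ["Male", "Male", "Male", "Female",
               "Female", "Male", "Female", "Female"],
})

# One numeric Series of Indicator values per gender
genders = A["Gender"].unique()
groups = [A.loc[A["Gender"] == g, "Indicator"] for g in genders]

fig, ax = plt.subplots()
# Bin edges chosen so that 0 and 1 each fall into their own bin;
# passing a list of datasets draws the bars side by side, one color per gender
counts, edges, _ = ax.hist(groups, bins=[-0.5, 0.5, 1.5], label=list(genders))
ax.set_xticks([0, 1])
ax.set_xlabel("Indicator")
ax.set_ylabel("Count")
ax.legend()
```

Passing `stacked=True` to `hist` should give the stacked variant instead of side-by-side bars. Call `fig.savefig(...)`, or drop the `Agg` line and call `plt.show()`, to see the result.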

Your code fails on

A[A['Gender'=='M']]

because it should be

A[A['Gender']=='M']

to get the male rows, but you also need to select the column you want.
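To make the misplaced bracket concrete, the sketch below walks through what each expression evaluates to. The four-row DataFrame is made-up illustration data, and the names `key`, `mask`, and the list-literal form of `y` are introduced here for the example. It also sidesteps a second problem in the posted snippet: assigning to `y[0]` on an empty list would raise an IndexError even once the brackets are fixed.

```python
import pandas as pd

# Made-up illustration data using the M/F codes from the question
A = pd.DataFrame({
    "Indicator": [1, 0, 1, 1],
    "Gender": ["M", "F", "M", "F"],
})

# The comparison inside the brackets runs first and yields a plain bool,
# so A['Gender' == 'M'] is really A[False], hence "KeyError: False".
key = 'Gender' == 'M'
print(key)  # False

# Comparing the column instead gives a boolean mask with one entry per row.
mask = A['Gender'] == 'M'
print(mask.tolist())  # [True, False, True, False]

# Pick the Indicator column for each gender and build the list in one step,
# rather than assigning to y[0] and y[1] on an empty list.
y = [
    A.loc[A['Gender'] == 'M', 'Indicator'].tolist(),
    A.loc[A['Gender'] == 'F', 'Indicator'].tolist(),
]
print(y)  # [[1, 1], [0, 1]]
```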

Yes, it works. Although I had to remove

labels=genders

since it raised an attribute error, and put

plt.legend(genders)

on a separate line. But I don't understand why or how it works, or why the code I posted doesn't work.

I've updated my answer so that the label argument is correct, and it now also explains why your code doesn't work.
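On the "why does it work" part, one way to see it is to inspect what actually gets passed to `hist`. This is only an illustrative sketch on made-up data: the Gender column holds strings while Indicator holds integers, which is consistent with the int/str comparison error from `plt.hist(A)`, whereas the list comprehension hands over purely numeric Series, one per gender.

```python
import pandas as pd

# Made-up illustration data (not the asker's dataset)
A = pd.DataFrame({
    "Indicator": [1, 1, 0, 1, 0, 1],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Male"],
})

# Indicator is an integer column, Gender is a string (object) column,
# so the whole frame cannot be binned as one numeric dataset.
print(A.dtypes)

# The comprehension builds one Series of Indicator values per gender;
# hist treats each element of the list as a separate dataset.
genders = A.Gender.unique()
groups = [A.loc[A.Gender == g, "Indicator"] for g in genders]
for g, s in zip(genders, groups):
    print(g, s.tolist())
# Male [1, 1, 0, 1]
# Female [1, 0]
```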