Python: KeyError when trying to plot or drop data in matplotlib
I'm having trouble generating a basic distribution histogram from an imported csv file. The code works for a data set from another csv, but not for the one I'm interested in, which is essentially the same. Here is the code I tried:
import pandas as pd
import numpy as np
import matplotlib as plt
data = pd.read_csv("idcases.csv")
data1 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Marin")]
data2 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Sonoma")]
fig = plt.pyplot.figure()
ax = fig.add_subplot(111)
ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
plt.pyplot.xlabel('Population')
plt.pyplot.ylabel('Count of Population')
plt.pyplot.show()
This produces:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-35-63303aa9d8a5> in <module>()
1 fig = plt.pyplot.figure()
2 ax = fig.add_subplot(111)
----> 3 ax.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
4 plt.pyplot.xlabel('Count')
5 plt.pyplot.ylabel('Count of Population')
C:\Program Files (x86)\Anaconda\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
5602 # Massage 'x' for processing.
5603 # NOTE: Be sure any changes here is also done below to 'weights'
-> 5604 if isinstance(x, np.ndarray) or not iterable(x[0]):
5605 # TODO: support masked arrays;
5606 x = np.asarray(x)
C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
549 def __getitem__(self, key):
550 try:
--> 551 result = self.index.get_value(self, key)
552
553 if not np.isscalar(result):
C:\Program Files (x86)\Anaconda\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
1721
1722 try:
-> 1723 return self._engine.get_value(s, k)
1724 except KeyError as e1:
1725 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()
KeyError: 0L
This works for me:
import io
import pandas as pd
import matplotlib.pyplot as plt
s = """ Disease County Year Sex Count Population Rate CI.lower
Amebiasis Marin 2001 Total 14 247731 5.651 3.090
Amebiasis Marin 2001 Female 0 125414 0.000 0.000
Amebiasis Marin 2001 Male 0 122317 0.000 0.000
Amebiasis Marin 2002 Total 7 247382 2.830 1.138
Amebiasis Marin 2002 Female 0 125308 0.000 0.000
Amebiasis Marin 2002 Male 0 122074 0.000 0.000
Amebiasis Marin 2003 Total 9 247280 3.640 1.664
Amebiasis Marin 2003 Female 0 125259 0.000 0.000
Amebiasis Marin 2003 Male 0 122021 0.000 0.000 """
fobj = io.StringIO(s)
data1 = pd.read_csv(fobj, sep=r"\s+")  # split columns on runs of whitespace
plt.hist(data1['Population'], bins =10, range = (data1['Population'].min(), data1['Population'].max()))
plt.xlabel('Population')
plt.ylabel('Count of Population')
plt.show()
When upgrading from matplotlib v1.4.3 to matplotlib v1.5.0, I noticed that plotting a pandas.Series stopped working, for example:
ax.plot_date(df['date'], df['raw'], '.-', label='raw')
This results in a KeyError: 0 exception.
Quick fix: pass a numpy.ndarray instead of a pandas.Series to the plot_date function:
ax.plot_date(df['date'].values, df['raw'].values, '.-', label='raw')
More details: let's look at the full traceback of the exception:
# ... PREVIOUS TRACEBACK MESSAGES OMITTED FOR BREVITY ...
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\matplotlib\dates.py in default_units(x, axis)
1562
1563 try:
-> 1564 x = x[0]
1565 except (TypeError, IndexError):
1566 pass
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
555 def __getitem__(self, key):
556 try:
--> 557 result = self.index.get_value(self, key)
558
559 if not np.isscalar(result):
C:\Users\pedromdu\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\index.py in get_value(self, series, key)
1788
1789 try:
-> 1790 return self._engine.get_value(s, k)
1791 except KeyError as e1:
1792 if len(self) > 0 and self.inferred_type in ['integer','boolean']:
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:3204)()
pandas\index.pyx in pandas.index.IndexEngine.get_value (pandas\index.c:2903)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)()
pandas\hashtable.pyx in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)()
KeyError: 0
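Note that the error occurs when matplotlib tries to execute x = x[0]. If the pandas Series is not indexed with integers starting from zero, this fails, because it looks up the item whose index label is 0 rather than the 0th element of the pandas.Series.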
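The difference between label-based and positional lookup can be reproduced without matplotlib. The following is a minimal sketch, assuming a small made-up Series whose integer index starts at 3 (the values and the helper name lookup_label_zero are illustrative only, not taken from the original post):

```python
import pandas as pd

# Hypothetical Series whose integer index labels start at 3 instead of 0,
# as happens after rows are filtered out of a larger DataFrame.
s = pd.Series([247731, 125414, 122317], index=[3, 4, 5])


def lookup_label_zero(series):
    """Mimic matplotlib's x[0]; return the KeyError instead of raising it."""
    try:
        return series[0]  # with an integer index this is a label lookup
    except KeyError as exc:
        return exc


label_result = lookup_label_zero(s)  # no row is labelled 0, so this is a KeyError
first_by_position = s.iloc[0]        # positional access ignores the labels
first_from_array = s.values[0]       # a plain ndarray is always positional
```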
To solve this, we need to get a numpy.ndarray from the data in the pandas Series and then use that for plotting:
ax.plot_date(df['date'].values, df['raw'].values, '.-', label='raw')
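The same workaround should apply to the histogram in the question, since boolean filtering keeps the original row labels. A minimal sketch, assuming a made-up four-row table in place of idcases.csv (the rows are illustrative; reset_index(drop=True) is shown as an alternative that the answer above does not mention):

```python
import pandas as pd

# Hypothetical stand-in for idcases.csv; only the column names come from the question.
data = pd.DataFrame({
    "Disease": ["Amebiasis", "Amebiasis", "Amebiasis", "Amebiasis"],
    "County": ["Sonoma", "Sonoma", "Marin", "Marin"],
    "Population": [458614, 462290, 247731, 247382],
})

# Boolean filtering keeps the original row labels, so the index starts at 2 here.
data1 = data[(data["Disease"] == "Amebiasis") & (data["County"] == "Marin")]
original_labels = list(data1.index)

# Option 1: hand matplotlib a plain ndarray, e.g. ax.hist(population_array, bins=10)
population_array = data1["Population"].values

# Option 2: renumber the rows from zero so that label 0 exists again
data1_renumbered = data1.reset_index(drop=True)
renumbered_labels = list(data1_renumbered.index)
```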
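This is a pandas issue. Please show the contents of data1.
The data you pasted seems to be tab-separated (or was formatted after pasting). Make sure all csv files use the same delimiter and pass it as an argument to the read_csv function.
@MikeMüller, the contents of data1 are at the end.
@hitzg, I formatted it after pasting to make it easier to read. How can I find out which separator was used? Aren't csv files all comma-separated by nature?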
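On the separator question: csv files are not always comma-separated in practice; tabs and semicolons are common too. One way to check is the csv.Sniffer class from Python's standard library. A minimal sketch, assuming the candidate delimiters are comma, semicolon, tab and pipe (the helper name detect_delimiter and the sample strings are illustrative; for a real file you would pass the first few kilobytes read from it):

```python
import csv


def detect_delimiter(sample, candidates=",;\t|"):
    """Guess the delimiter of a text sample, restricted to the given candidates."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Hypothetical samples modelled on the columns shown in the question.
comma_sample = "Disease,County,Year\nAmebiasis,Marin,2001\nAmebiasis,Marin,2002\n"
tab_sample = "Disease\tCounty\tYear\nAmebiasis\tMarin\t2001\nAmebiasis\tMarin\t2002\n"

comma_guess = detect_delimiter(comma_sample)
tab_guess = detect_delimiter(tab_sample)
```

The detected character can then be passed to read_csv through its sep argument. Sniffing is a heuristic and can fail on unusual files, in which case it raises csv.Error.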