KeyError in "plotnine" (a ggplot wrapper for Python)


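I am trying to build a plot with plotnine, and I keep running into the same KeyError whenever I map only the x axis. See the traceback below.
My sample data looks like this: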
       WORD  TAG  TOPIC Value      
0       hey  aa      1  234 
1   working  bb      1  123 
2   lullaby  cc      2  32
3     Doggy  cc      2  63
4  document  aa      3  84

My code sample:
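from plotnine import ggplot, aes, geom_histogram
import pandas as pd

inFile = 'infile.csv'
# Tab-separated file; header=0 replaces the file's own header row with these names
df = pd.read_csv(inFile, names=['WORD', 'TAG', 'TOPIC', 'VALUE'], header=0, sep='\t')
# sort_values returns a new frame, so keep the result; the column is 'VALUE', not 'value'
df = df.sort_values('VALUE', ascending=False)
sortedDf = df[:5]

# This is the expression that raises the KeyError shown in the traceback
plot1 = ggplot(sortedDf) + aes(x='TOPIC') + geom_histogram(binwidth=3)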

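The end goal is to plot the count of each topic in a histogram.
I am not sure what missing data causes the KeyError below. A weight should not be needed, because I am only interested in plotting the counts of one variable, i.e. topic 1 = 2, topic 2 = 2, topic 3 = 1.
Can anyone link to more detailed plotnine documentation, or does anyone with experience using the library have an idea of what I am missing?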
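For reference, the expected counts can be reproduced with pandas alone. This is a minimal sketch that assumes the five sample rows above are the whole data set; the inline DataFrame stands in for infile.csv and is not part of my real script.

```python
import pandas as pd

# Assumption: these five rows (copied from the sample above) are the full data
df = pd.DataFrame({
    'WORD': ['hey', 'working', 'lullaby', 'Doggy', 'document'],
    'TAG': ['aa', 'bb', 'cc', 'cc', 'aa'],
    'TOPIC': [1, 1, 2, 2, 3],
    'VALUE': [234, 123, 32, 63, 84],
})

# Count how many rows fall under each topic, ordered by topic number
topic_counts = df['TOPIC'].value_counts().sort_index()
print(topic_counts)
```

These are the bar heights the plot is supposed to show, and no weight column is involved in computing them.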
Traceback Error:


    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-112-71707b4cf21a> in <module>()
          1 plot2 = ggplot(sortedDf) + aes(x='TOPIC') + geom_histogram(binwidth=3)
    ----> 2 print plot2

    /Users/anaconda/lib/python2.7/site-packages/plotnine/ggplot.pyc in __repr__(self)
         82         Print/show the plot
         83         """
    ---> 84         self.draw()
         85         plt.show()
         86         return '<ggplot: (%d)>' % self.__hash__()

    /Users/anaconda/lib/python2.7/site-packages/plotnine/ggplot.pyc in draw(self)
        139         # assign a default theme
        140         self = deepcopy(self)
    --> 141         self._build()
        142 
        143         # If no theme we use the default

    /Users/anaconda/lib/python2.7/site-packages/plotnine/ggplot.pyc in _build(self)
        235 
        236         # Apply and map statistics
    --> 237         layers.compute_statistic(layout)
        238         layers.map_statistic(self)
        239 

    /Users/anaconda/lib/python2.7/site-packages/plotnine/layer.pyc in compute_statistic(self, layout)
         92     def compute_statistic(self, layout):
         93         for l in self:
    ---> 94             l.compute_statistic(layout)
         95 
         96     def map_statistic(self, plot):

    /Users/anaconda/lib/python2.7/site-packages/plotnine/layer.pyc in compute_statistic(self, layout)
        369         data = self.stat.use_defaults(data)
        370         data = self.stat.setup_data(data)
    --> 371         data = self.stat.compute_layer(data, params, layout)
        372         self.data = data
        373 

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/stat.pyc in compute_layer(cls, data, params, layout)
        194             return cls.compute_panel(pdata, pscales, **params)
        195 
    --> 196         return groupby_apply(data, 'PANEL', fn)
        197 
        198     @classmethod

    /Users/anaconda/lib/python2.7/site-packages/plotnine/utils.pyc in groupby_apply(df, cols, func, *args, **kwargs)
        615         # do not mark d as a slice of df i.e no SettingWithCopyWarning
        616         d.is_copy = None
    --> 617         lst.append(func(d, *args, **kwargs))
        618     return pd.concat(lst, axis=axis, ignore_index=True)
        619 

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/stat.pyc in fn(pdata)
        192                 return pdata
        193             pscales = layout.get_scales(pdata['PANEL'].iat[0])
    --> 194             return cls.compute_panel(pdata, pscales, **params)
        195 
        196         return groupby_apply(data, 'PANEL', fn)

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/stat.pyc in compute_panel(cls, data, scales, **params)
        221         for _, old in data.groupby('group'):
        222             old.is_copy = None
    --> 223             new = cls.compute_group(old, scales, **params)
        224             unique = uniquecols(old)
        225             missing = unique.columns.difference(new.columns)

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/stat_bin.pyc in compute_group(cls, data, scales, **params)
        107         new_data = assign_bins(
        108             data['x'], breaks, data.get('weight'),
    --> 109             params['pad'], params['closed'])
        110         return new_data

    /Users/anaconda/lib/python2.7/site-packages/plotnine/stats/binning.pyc in assign_bins(x, breaks, weight, pad, closed)
        163     df = pd.DataFrame({'bin_idx': bin_idx, 'weight': weight})
        164     wftable = df.pivot_table(
    --> 165         'weight', index=['bin_idx'], aggfunc=np.sum)['weight']
        166 
        167     # Empty bins get no value in the computed frequency table.

    /Users/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
        601             result = self.index.get_value(self, key)
        602 
    --> 603             if not is_scalar(result):
        604                 if is_list_like(result) and not isinstance(result, Series):
        605 

    /Users/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_value(self, series, key)

    pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3557)()

    pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3240)()

    pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4363)()

    KeyError: 'weight'

Nesting aes inside ggplot, as you would in R, can solve your problem:
plot1 = ggplot(sortedDf, aes(x='TOPIC')) + geom_histogram(binwidth=3)


This answer has no explanation, so it may or may not solve the problem. Adding one would help the OP understand how this fixes the error.