Python: How do I convert the data structure obtained after a groupby operation on a DataFrame into a DataFrame?
Tags: python, pandas, dataframe, indexing, group-by


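Suppose I have the dataset from the example:

I want to make a box plot of regiment versus preTestScore. To do that, I need to find the relative distribution of these two variables. So I grouped regiment by preTestScore: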

df1 = df['regiment'].groupby(df['preTestScore']).count()
df1

preTestScore
2     3
3     3
4     2
24    2
31    2
Name: regiment, dtype: int64
If I now try to draw a box plot, it raises an error:

import seaborn as sns
sns.boxplot(data=df1)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-131-8296ca940a25> in <module>()
      1 df1 = df['regiment'].groupby(df['preTestScore']).count()
      2 df1
----> 3 sns.boxplot(data=df1)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in boxplot(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth, whis, notch, ax, **kwargs)
   2209     plotter = _BoxPlotter(x, y, hue, data, order, hue_order,
   2210                           orient, color, palette, saturation,
-> 2211                           width, dodge, fliersize, linewidth)
   2212 
   2213     if ax is None:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in __init__(self, x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth)
    439                  width, dodge, fliersize, linewidth):
    440 
--> 441         self.establish_variables(x, y, hue, data, orient, order, hue_order)
    442         self.establish_colors(color, palette, saturation)
    443 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
     94                 if hasattr(data, "shape"):
     95                     if len(data.shape) == 1:
---> 96                         if np.isscalar(data[0]):
     97                             plot_data = [data]
     98                         else:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    765         key = com._apply_if_callable(key, self)
    766         try:
--> 767             result = self.index.get_value(self, key)
    768 
    769             if not is_scalar(result):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   3116         try:
   3117             return self._engine.get_value(s, k,
-> 3118                                           tz=getattr(series.dtype, 'tz', None))
   3119         except KeyError as e1:
   3120             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

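This produced a box plot, but it does not show the distribution of regiment against preTestScore (in fact, the box plot makes no sense to me; I don't know what its y-axis values represent). To get that, we need to pass the x and y parameters to the box plot call. However, because the groupby result (a Series) is not a DataFrame, this raises the following error: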

sns.boxplot(x='regiment', y='preTestScore', data=df1)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-132-fc8036eb7d0b> in <module>()
----> 1 sns.boxplot(x='regiment', y='preTestScore', data=df1)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in boxplot(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth, whis, notch, ax, **kwargs)
   2209     plotter = _BoxPlotter(x, y, hue, data, order, hue_order,
   2210                           orient, color, palette, saturation,
-> 2211                           width, dodge, fliersize, linewidth)
   2212 
   2213     if ax is None:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in __init__(self, x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth)
    439                  width, dodge, fliersize, linewidth):
    440 
--> 441         self.establish_variables(x, y, hue, data, orient, order, hue_order)
    442         self.establish_colors(color, palette, saturation)
    443 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
    149                 if isinstance(input, string_types):
    150                     err = "Could not interpret input '{}'".format(input)
--> 151                     raise ValueError(err)
    152 
    153             # Figure out the plotting orientation

ValueError: Could not interpret input 'regiment'
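
Both errors are consistent with df1 being a Series indexed by preTestScore rather than a DataFrame. A minimal sketch for checking this, using a small hypothetical stand-in for the dataset (the rows below are invented for illustration, not the question's data):

```python
import pandas as pd

# Hypothetical stand-in for the question's dataset
df = pd.DataFrame({
    "regiment": ["Nighthawks", "Nighthawks", "Dragoons",
                 "Dragoons", "Scouts", "Scouts"],
    "preTestScore": [4, 24, 31, 2, 3, 4],
})

counts = df["regiment"].groupby(df["preTestScore"]).count()

# The result is a Series: preTestScore is its index, 'regiment' is only its name
print(type(counts).__name__)   # Series
print(counts.index.name)       # preTestScore
print(counts.name)             # regiment

# The integer index holds score values, so the label 0 does not exist;
# a label-based lookup such as counts[0] therefore raises KeyError: 0
print(0 in counts.index)       # False
```

Since a Series has no columns, seaborn cannot resolve the strings 'regiment' and 'preTestScore' against it.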
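When I take the values in df1, put them into a new DataFrame df2, and then try the box plot again, it works:

df2 = pd.DataFrame({'preTestScore': [2,3,4,24,31], 'regiment': [3,3,2,2,2]})
df2
sns.boxplot(x='regiment', y='preTestScore', data=df2)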


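So, instead of copying the contents of the groupby result and pasting them into a new DataFrame, how can I directly obtain a DataFrame that stores the relative distribution of the two variables?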

Use to_frame to convert the Series to a DataFrame, then reset the index before plotting:

df1 = df['regiment'].groupby(df['preTestScore']).count().to_frame().reset_index()
sns.boxplot(x='regiment', y='preTestScore', data=df1)
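
To see what the conversion produces without plotting, here is a minimal sketch on hypothetical data (the rows and the column name n_rows are assumptions for illustration). It also shows Series.reset_index, which should give the same two-column result in one step:

```python
import pandas as pd

# Hypothetical stand-in for the question's dataset
df = pd.DataFrame({
    "regiment": ["Nighthawks", "Nighthawks", "Dragoons",
                 "Dragoons", "Scouts", "Scouts"],
    "preTestScore": [4, 24, 31, 2, 3, 4],
})

counts = df["regiment"].groupby(df["preTestScore"]).count()

# to_frame() yields a one-column DataFrame still indexed by preTestScore;
# reset_index() then moves preTestScore into an ordinary column
df1 = counts.to_frame().reset_index()
print(list(df1.columns))   # ['preTestScore', 'regiment']

# Alternative: reset_index on the Series, optionally renaming the count column
df1_alt = counts.reset_index(name="n_rows")
print(list(df1_alt.columns))   # ['preTestScore', 'n_rows']
```

Either frame has named columns, so it can be passed as data= with string x and y arguments.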

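Unrelated, but could you tell me how to use groupby with a fixed number of bins? This would be helpful when, for example, community has more than 10 values and we want to group the DataFrame into 3 bins of community values.

@Kristada673: Just create a new column containing the binned values and group by that column. See here for an example of how to do the binning: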
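
A minimal sketch of that suggestion, assuming pandas' pd.cut for equal-width bins. The data, the score column, and the bin labels are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical data: 'community' takes more than 10 distinct numeric values
df = pd.DataFrame({
    "community": list(range(1, 13)),
    "score": [5, 7, 6, 8, 9, 4, 3, 10, 2, 6, 7, 8],
})

# New column holding the binned values: three equal-width bins
df["community_bin"] = pd.cut(df["community"], bins=3,
                             labels=["low", "mid", "high"])

# Group by the binned column instead of the raw values
bin_sizes = df.groupby("community_bin", observed=True).size()
bin_means = df.groupby("community_bin", observed=True)["score"].mean()

print(bin_sizes)
print(bin_means)
```

If bins with roughly equal numbers of rows are wanted instead of equal widths, pd.qcut could be swapped in for pd.cut.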