Python: How to convert a groupby result to a DataFrame

python pandas

I have the following DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({
               'category': ['ctr','ctr','ctr','ctr','ctr','ctr'],
               'expected_count': [100,100,112,1.3,14,125],
               'sample_id': ['S1','S1','S1','S2','S2','S2'],
               'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
               })

This produces:

In [2]: df
Out[2]:
  category  expected_count gene_symbol sample_id
0      ctr           100.0           a        S1
1      ctr           100.0           b        S1
2      ctr           112.0           c        S1
3      ctr             1.3           a        S2
4      ctr            14.0           b        S2
5      ctr           125.0           c        S2

I can group it by gene_symbol:

In [4]: gdf = df.groupby(by = 'gene_symbol')['expected_count'].mean()
   ...: gdf
   ...:
Out[4]:
gene_symbol
a     50.65
b     57.00
c    118.50
Name: expected_count, dtype: float64

In [5]: str(gdf)
Out[5]: 'gene_symbol\na     50.65\nb     57.00\nc    118.50\nName: expected_count, dtype: float64'

Note that gdf is a string. How can I convert it to a DataFrame?

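You need as_index=False in groupby. It keeps gene_symbol as a regular column, so the aggregated result is a DataFrame instead of a Series.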
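A minimal sketch of the as_index=False approach, rebuilding the question's df (numpy is dropped because nothing here uses it). The variable names mirror the question; the checks at the end are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['ctr', 'ctr', 'ctr', 'ctr', 'ctr', 'ctr'],
    'expected_count': [100, 100, 112, 1.3, 14, 125],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# as_index=False keeps the group key as an ordinary column instead of the index,
# so selecting one column and aggregating yields a DataFrame, not a Series.
gdf = df.groupby('gene_symbol', as_index=False)['expected_count'].mean()

print(isinstance(gdf, pd.DataFrame))  # True
print(gdf.columns.tolist())           # ['gene_symbol', 'expected_count']
```

The resulting frame has a default RangeIndex (0, 1, 2) with gene_symbol as a normal column.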

Alternatively, note that the output is not a string but a Series:

print (type(df.groupby('gene_symbol')['expected_count'].mean()))
<class 'pandas.core.series.Series'>
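A short sketch of why the result looked like a string in the question: str() only renders the Series as text, while gdf itself remains a Series whose index holds the group keys. This rebuilds the question's df and is meant as an illustration, not a required step.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['ctr', 'ctr', 'ctr', 'ctr', 'ctr', 'ctr'],
    'expected_count': [100, 100, 112, 1.3, 14, 125],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
})

gdf = df.groupby('gene_symbol')['expected_count'].mean()

# str() returns a text rendering; it does not change gdf itself.
text = str(gdf)
print(type(text))                # <class 'str'>
print(isinstance(gdf, pd.Series))  # True

# The group keys live in the index; the Series is named after the aggregated column.
print(gdf.index.name)  # gene_symbol
print(gdf.name)        # expected_count
```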

So you can convert it with Series.to_frame():

gdf = df.groupby(by = 'gene_symbol')['expected_count'].mean().to_frame()

gdf
Out[149]: 
             expected_count
gene_symbol                
a                     50.65
b                     57.00
c                    118.50
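As a hedged sketch of two related variants (neither comes from the original answer): to_frame(name=...) renames the single column while keeping gene_symbol as the index, and Series.reset_index() also returns a DataFrame but moves gene_symbol into a regular column. The column name mean_expected_count is hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['ctr', 'ctr', 'ctr', 'ctr', 'ctr', 'ctr'],
    'expected_count': [100, 100, 112, 1.3, 14, 125],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
})

means = df.groupby('gene_symbol')['expected_count'].mean()

# to_frame(): gene_symbol stays the index; name= sets the (hypothetical) column name.
framed = means.to_frame(name='mean_expected_count')

# reset_index(): the index becomes a column next to the values,
# giving the same shape as groupby(..., as_index=False).
flat = means.reset_index()

print(framed.columns.tolist())  # ['mean_expected_count']
print(flat.columns.tolist())    # ['gene_symbol', 'expected_count']
```

Which one to pick depends on whether you want gene_symbol as the index (for lookups or joins on the index) or as a plain column.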
