Python pandas groupby with a dict

Is it possible to group the elements of a column using a dict?

For example:

In [3]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three','two', 'two', 'one', 'three'],
   ...:          'B' : np.random.randn(8)})
In [4]: df
Out[4]: 
       A         B
0    one  0.751612
1    one  0.333008
2    two  0.395667
3  three  1.636125
4    two  0.916435
5    two  1.076679
6    one -0.992324
7  three -0.593476

In [5]: d = {'one':'Start', 'two':'Start', 'three':'End'}
In [6]: grouped = df[['A','B']].groupby(d)

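This (and other variations) returns an empty groupby object. My attempts with .apply failed as well.

I would like to match the values of column 'A' against the keys of the dictionary and put the rows into the groups defined by the dictionary's values. The output would look like this: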

Start:
           A         B
    0    one  0.751612
    1    one  0.333008
    2    two  0.395667
    4    two  0.916435
    5    two  1.076679
    6    one -0.992324
End:
           A         B
    3  three  1.636125
    7  three -0.593476

A dict passed to groupby has to map from labels to group names, so this works if you put 'A' into the index:

# The dict keys are matched against the index labels, so 'A' must be the index.
grouped2 = df.set_index('A').groupby(d)
for group_name, data in grouped2:
    print(group_name)
    print('---------')
    print(data)

# Output:
End
---------
              B
A              
three -1.234795
three  0.239209

Start
---------
            B
A            
one -1.924156
one  0.506046
two -1.681980
two  0.605248
two -0.861364
one  0.800431

Column names and row index entries are both labels, whereas the elements of 'A' are just values until 'A' is put into the index.
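The label-versus-value distinction can be checked directly. The sketch below only mimics the dict lookup by mapping the index through the dict; it is not pandas' internal code path, and the names default_lookup and indexed_lookup are made up for the illustration. Column 'B' uses fixed integers instead of random numbers so the frame is reproducible.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'B': range(8)})
d = {'one': 'Start', 'two': 'Start', 'three': 'End'}

# With the default index the labels are the integers 0..7,
# so none of the dict keys ('one', 'two', 'three') can match.
default_lookup = df.index.map(d)
print(default_lookup.isna().all())

# After set_index('A') the labels are the strings from column A,
# so every label finds a group name in the dict.
indexed_lookup = df.set_index('A').index.map(d)
print(list(indexed_lookup))
```

This is why the grouping in the question comes back empty: the dict is consulted for the row labels, not for the contents of 'A'.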

If the index holds other information that makes calling set_index() tricky, you can use map() to create a grouping column.
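A minimal sketch of the grouping-column approach, assuming the df and d from the question. The column name 'group' is a hypothetical choice; any unused name would do.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'B': np.random.randn(8)})
d = {'one': 'Start', 'two': 'Start', 'three': 'End'}

# Translate each value of column A through the dict.
# 'group' is a hypothetical column name chosen for this example.
df['group'] = df['A'].map(d)

# Group on the new column; the original index (0..7) is left untouched.
for group_name, data in df.groupby('group'):
    print(group_name)
    print('---------')
    print(data[['A', 'B']])

# The mapped Series can also be passed straight to groupby,
# which avoids adding a column to the frame.
grouped = df.groupby(df['A'].map(d))
```

Because the index is never changed, this also suits frames whose index has to be preserved, such as a time series.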

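You can group with a dictionary, but because the dict keys are matched against the index labels, you need to set 'A' as the index first.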

grouped = df.set_index("A").groupby(d)

list(grouped)
# [('End',               B
# A              
# three -1.550727
# three  1.048730
# 
# [2 rows x 1 columns]), ('Start',             B
# A            
# one -1.552152
# one -2.018647
# two -0.968068
# two  0.449016
# two -0.374453
# one  0.116770
# 
# [6 rows x 1 columns])]

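Thanks (and to Marius too). So only df columns can be grouped by themselves?

(Hit enter by accident.) The index needs to be preserved: it is a time series at 5-minute intervals (over 13 years). So if I want the (1, 2) and (3) grouping as above, the right option is to create a new column with that mapping and then group on that new column?

@ChristopherShort: If you can't really change the index, then creating a grouping column is best, see my edited answer.

Thanks, and I also appreciate the trick for quickly creating the new column.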
