Pandas: group DataFrame rows by index

pandas python-2.7

I have a dataframe that looks like this:

index    col1    col2
1        'A'     'B'
300      'A'     'B'
301      'A'     'B'
400      'A'     'B'
510      'A'     'B'
511      'C'     'D'
512      'E'     'F'
1000     'Q'     'P'
1001     'Q'     'R'

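This is a slice of another dataframe. I need to group all rows that have consecutive indices, for example 300 and 301. If the values within a group differ, I need to collect them into a list, like this: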

index            col1               col2
1                'A'                'B'
300, 301         'A'                'B'
400              'A'                'B'
510, 511, 512    ['A', 'C', 'E']    ['B', 'D', 'F']
1000, 1001       'Q'                ['P', 'R']

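So for 300 and 301 the values are the same, so I just keep them. For 510, 511 and 512 the values differ, so I have to list them. For 1000 and 1001 the col1 value is the same, so I keep it, but the col2 values differ, so I list them.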
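The per-column rule described above can be sketched as a small helper. This is only an illustration: the name `collapse` is hypothetical, and it assumes Python 3.7+ so that `dict.fromkeys` keeps the values in first-seen order.

```python
import pandas as pd


def collapse(values):
    """Return the single value if all entries agree, otherwise the distinct values."""
    # dict.fromkeys drops duplicates while keeping first-seen order (Python 3.7+)
    distinct = list(dict.fromkeys(values))
    return distinct[0] if len(distinct) == 1 else distinct


same = collapse(pd.Series(["A", "A"]))        # all equal: keep the scalar
mixed = collapse(pd.Series(["A", "C", "E"]))  # different: list them
print(same)
print(mixed)
```

Keeping the first-seen order means the listed values line up with the order of the index labels in the group.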

Any help is greatly appreciated. Thanks.

Use:

# convert the index to a column if necessary
df = df.reset_index()

# remove duplicates with a set; if only one distinct value remains, return it as a scalar
f = lambda x: list(set(x)) if len(set(x)) > 1 else x.iat[0]
# for the index column, cast to strings and join
d = {'index': lambda x: ', '.join(x.astype(str)), 'col1': f, 'col2': f}
# label runs of adjacent rows whose index starts with the same digit
g = df['index'].astype(str).str[0]
s = g.ne(g.shift()).cumsum()
# aggregate each run with the dictionary
df = df.groupby(s).agg(d).reset_index(drop=True)
print(df)
           index             col1             col2
0              1              'A'              'B'
1       300, 301              'A'              'B'
2            400              'A'              'B'
3  510, 511, 512  ['C', 'A', 'E']  ['D', 'B', 'F']
4     1000, 1001              'Q'       ['R', 'P']
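Note that the snippet above builds its groups from the first digit of each index label, which happens to match the sample data. As a possible variation (not the answerer's code), the sketch below starts a new group whenever an index value is not exactly one more than the previous one. It assumes an integer index sorted in ascending order and Python 3.7+. The names `collapse`, `flat`, `run_id` and `result` are made up for this illustration.

```python
import pandas as pd

# Sample data from the question; object dtype so a cell can hold a scalar or a list.
df = pd.DataFrame(
    {
        "col1": ["A", "A", "A", "A", "A", "C", "E", "Q", "Q"],
        "col2": ["B", "B", "B", "B", "B", "D", "F", "P", "R"],
    },
    index=[1, 300, 301, 400, 510, 511, 512, 1000, 1001],
    dtype=object,
)
df.index.name = "index"


def collapse(values):
    """Single value if all entries agree, otherwise the distinct values in order."""
    distinct = list(dict.fromkeys(values))
    return distinct[0] if len(distinct) == 1 else distinct


flat = df.reset_index()

# A new run starts wherever the index is not exactly previous + 1.
# The first diff is NaN, which also counts as "not equal to 1".
run_id = flat["index"].diff().ne(1).cumsum().rename("run")

result = (
    flat.groupby(run_id)
    .agg(
        {
            "index": lambda x: ", ".join(x.astype(str)),
            "col1": collapse,
            "col2": collapse,
        }
    )
    .reset_index(drop=True)
)
print(result)
```

With this key, labels such as 300 and 399 would end up in separate groups, whereas a first-digit key would merge them.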

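Thanks @jezrael, I am trying your solution, but my question

@RodwanBakkar - How many columns does the dataframe have?

After modifying the answer, I get

1110001001[Q, R, B][A, Q]

In the real case, I mean, the number of columns is 2.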