Python 2.7: How to split or reorder a DataFrame by rows

I want to clean up and analyze a DataFrame, but I ran into trouble. Here is a simple DataFrame to illustrate the problem:

import pandas as pd
d = {'Resutls': ['IIL', 'pass','pass','IIH','pass','IIL','pass'], 'part':['None',1,2,'None',5,'None',4] }
df = pd.DataFrame(d)
Output:

    Resutls  part
0     IIL    None
1    pass      1
2    pass      2
3     IIH    None
4    pass      5
5     IIL    None
6    pass      4
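The DataFrame contains repeating blocks, each starting with a header row (IIL or IIH) followed by pass rows. I want to reorder the rows so that blocks with the same header are merged, and drop the repeated header row, like this: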

    Resutls  part
0     IIL    None
1    pass      1
2    pass      2
6    pass      4 
3     IIH    None
4    pass      5
Or split the DataFrame into several sub-DataFrames:

    Resutls  part
0     IIL    None
1    pass      1
2    pass      2
3    pass      4 

    Resutls  part
0     IIH    None
1    pass      5
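This is just a simple example of what I want to do. My real DataFrame has 4,000,000 rows. I tried to do this with reindex or df.iloc, but it seems rather complicated to me. Is there a good way to do it? Please advise.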

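I think you need to replace the pass values with NaNs using mask, forward fill them with ffill, then sort with argsort and reorder the rows with iloc: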

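# Replace 'pass' with NaN, forward-fill so each row carries its block header,
# then sort stably (mergesort) so rows keep their order within each block
df = df.iloc[df['Resutls'].mask(df['Resutls'].eq('pass')).ffill().argsort(kind='mergesort')]
print (df)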
  Resutls  part
3     IIH  None
4    pass     5
0     IIL  None
1    pass     1
2    pass     2
5     IIL  None
6    pass     4
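To see why this works, here is a minimal sketch that prints the intermediate grouping key on the question's sample data; the names masked, key, order and df_sorted are illustrative only. It passes kind='mergesort' because NumPy's default quicksort is not a stable sort, so on a large frame (such as the 4,000,000 rows mentioned in the question) rows sharing the same header could otherwise lose their original order. Mergesort is stable.

```python
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

# Step 1: hide the 'pass' rows so only the block headers (IIL/IIH) remain
masked = df['Resutls'].mask(df['Resutls'].eq('pass'))

# Step 2: forward-fill so every 'pass' row inherits the header above it
key = masked.ffill()
print(key.tolist())    # ['IIL', 'IIL', 'IIL', 'IIH', 'IIH', 'IIL', 'IIL']

# Step 3: stable argsort of the key; equal keys keep their original order
order = key.argsort(kind='mergesort')
print(order.tolist())  # [3, 4, 0, 1, 2, 5, 6]

# Reorder the rows by position
df_sorted = df.iloc[order.values]
print(df_sorted)
```

The headers are sorted alphabetically, which is why the IIH block ends up before the IIL block.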
Finally, remove the duplicated rows with:
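A minimal sketch of this step, assuming it uses the same filter as the next snippet (keep the first occurrence of each header value plus every pass row), applied to the sorted frame:

```python
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

# Sort rows by their block header (stable, so order within a block is kept)
key = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()
df = df.iloc[key.argsort(kind='mergesort').values]

# Keep the first row of each header value and every 'pass' row;
# this drops the repeated IIL header (original index 5)
df = df[~df['Resutls'].duplicated() | df['Resutls'].eq('pass')]
print(df)
```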

If you want to work with each DataFrame separately:

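# Starting again from the original (unsorted) df:
# label each row with its block header in helper column g
df['g'] = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()
# keep the first occurrence of each header and all 'pass' rows
df = df[~df['Resutls'].duplicated() | (df['Resutls'] == 'pass')]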
print (df)
  Resutls  part    g
0     IIL  None  IIL
1    pass     1  IIL
2    pass     2  IIL
3     IIH  None  IIH
4    pass     5  IIH
6    pass     4  IIL

# one DataFrame per header value, without the helper column g
dfs = {k:v.drop('g', axis=1) for k, v in df.groupby('g')}
#print (dfs)

print (dfs['IIH'])
  Resutls  part
3     IIH  None
4    pass     5

print (dfs['IIL'])
  Resutls  part
0     IIL  None
1    pass     1
2    pass     2
6    pass     4
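The sort above puts the IIH block first because it orders the headers alphabetically, and the split keeps the original index. If you need exactly the layouts shown in the question (blocks in order of their first appearance, and each sub-DataFrame renumbered from 0), a sketch using pd.factorize and groupby(..., sort=False) could look like this. The names key, codes, merged and parts are illustrative. All steps are vectorized pandas/NumPy operations with no Python-level loop over rows.

```python
import numpy as np
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

# Label every row with the header of the block it belongs to
key = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()

# factorize numbers the labels in order of first appearance (IIL -> 0, IIH -> 1)
codes, uniques = pd.factorize(key)

# Stable sort on those codes keeps blocks in first-appearance order,
# then drop the repeated header rows
merged = df.iloc[np.argsort(codes, kind='mergesort')]
merged = merged[~merged['Resutls'].duplicated() | merged['Resutls'].eq('pass')]
print(merged)

# Split into one sub-DataFrame per header, renumbering each from 0
merged_key = key.loc[merged.index]
parts = {name: block.reset_index(drop=True)
         for name, block in merged.groupby(merged_key, sort=False)}
print(parts['IIL'])
print(parts['IIH'])
```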

Thank you very much for the quick reply. I'll try it; I learned something from this.