Python 2.7: how to split or reorder a DataFrame by rows
I just want to clean up a DataFrame and then analyze it, but I've run into trouble. I created a simple DataFrame to illustrate:
import pandas as pd
d = {'Resutls': ['IIL', 'pass','pass','IIH','pass','IIL','pass'], 'part':['None',1,2,'None',5,'None',4] }
df = pd.DataFrame(d)
The result looks like this:
Resutls part
0 IIL None
1 pass 1
2 pass 2
3 IIH None
4 pass 5
5 IIL None
6 pass 4
The DataFrame contains some repeated blocks. I want to reorder it by rows and remove the duplicates, like this:
Resutls part
0 IIL None
1 pass 1
2 pass 2
6 pass 4
3 IIH None
4 pass 5
Or split the DataFrame into several sub-DataFrames:
Resutls part
0 IIL None
1 pass 1
2 pass 2
3 pass 4
Resutls part
0 IIH None
1 pass 5
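This is just a simple example of what I want to do. In reality I have a DataFrame with 4,000,000 rows. I tried to do this with reindex or df.iloc, which is the intuitive approach, but it seems a bit complicated to me. Is there a better way? Please advise.

I think you need mask to turn the 'pass' values into NaNs, then forward fill, then argsort and reorder with iloc: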
df = df.iloc[df['Resutls'].mask(df['Resutls'].eq('pass')).ffill().argsort()]
print (df)
Resutls part
3 IIH None
4 pass 5
0 IIL None
1 pass 1
2 pass 2
5 IIL None
6 pass 4
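To make the one-liner above easier to follow, here is a minimal sketch that performs the same steps one at a time. The names markers, labels, order and sorted_df are introduced only for illustration. It also swaps argsort for sort_values(kind='mergesort'), on the assumption that a stable sort is wanted so that rows keep their original relative order inside each block on a large frame; that choice is not part of the answer above.

```python
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

# Step 1: blank out every 'pass' row so only the block markers remain.
markers = df['Resutls'].mask(df['Resutls'].eq('pass'))

# Step 2: forward fill so each row carries the marker of its block.
labels = markers.ffill()

# Step 3: sort the labels with a stable algorithm and reuse the
# resulting index order to reorder the original rows.
order = labels.sort_values(kind='mergesort').index
sorted_df = df.loc[order]

print(sorted_df)
```

On the sample data this gives the same row order as the printed result above (3, 4, 0, 1, 2, 5, 6).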
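Finally, remove the duplicated marker rows with boolean indexing and duplicated, keeping every 'pass' row. If you also want to work with each DataFrame separately, first store the block label in a helper column g: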
df['g'] = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()
df = df[~df['Resutls'].duplicated() | (df['Resutls'] == 'pass')]
print (df)
Resutls part g
0 IIL None IIL
1 pass 1 IIL
2 pass 2 IIL
3 IIH None IIH
4 pass 5 IIH
6 pass 4 IIL
dfs = {k:v.drop('g', axis=1) for k, v in df.groupby('g')}
#print (dfs)
print (dfs['IIH'])
Resutls part
3 IIH None
4 pass 5
print (dfs['IIL'])
Resutls part
0 IIL None
1 pass 1
2 pass 2
6 pass 4
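The question asks for the blocks in order of first appearance (IIL before IIH) and for sub-DataFrames with a fresh index, which the alphabetical sort does not give. The following is a rough sketch of one way to get both, assuming pd.factorize is used to number the labels in order of appearance. The names is_pass, keep, kept, result and parts are hypothetical helpers, not part of the answer above.

```python
import numpy as np
import pandas as pd

d = {'Resutls': ['IIL', 'pass', 'pass', 'IIH', 'pass', 'IIL', 'pass'],
     'part': ['None', 1, 2, 'None', 5, 'None', 4]}
df = pd.DataFrame(d)

is_pass = df['Resutls'].eq('pass')
# Block label for every row, as in the answer above.
labels = df['Resutls'].mask(is_pass).ffill().rename('g')

# Keep every 'pass' row plus the first occurrence of each marker row.
keep = is_pass | ~df['Resutls'].duplicated()
kept = df[keep]

# factorize numbers the labels in order of first appearance
# (IIL -> 0, IIH -> 1); a stable argsort then groups the rows
# without changing their order inside a block.
codes, uniques = pd.factorize(labels[keep])
result = kept.iloc[np.argsort(codes, kind='mergesort')]
print(result)

# One sub-DataFrame per block, each with a fresh 0..n-1 index.
parts = {name: grp.reset_index(drop=True)
         for name, grp in kept.groupby(labels[keep])}
print(parts['IIL'])
print(parts['IIH'])
```

On the sample data, result keeps the original index labels in the order 0, 1, 2, 6, 3, 4, which matches the reordered frame shown in the question.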
Thank you very much for your quick reply. I'll try it; I learned something from this.