Python DataFrame frequency
I have this DataFrame:
source target
0 ape dog
1 ape hous
2 dog hous
3 hors dog
4 hors ape
5 dog ape
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 bird ape
11 fist ape
I am trying to generate frequency counts with the following code:
df_count = df.groupby(['source', 'target']).size().reset_index().sort_values(0, ascending=False)
df_count.columns = ['source', 'target', 'weight']
I get the following result:
source target weight
2 ape hous 2
0 ape bird 1
1 ape dog 1
3 bird ape 1
4 bird fist 1
5 bird hous 1
6 dog ape 1
7 dog hous 1
8 fist ape 1
9 hors ape 1
10 hors dog 1
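How can I modify the code so that direction does not matter? That is, instead of "ape bird 1" and "bird ape 1", I want a single row "ape bird 2".
You can first sort each row with apply, then pass the name parameter to reset_index.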
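The code for this answer did not survive on this page, so here is a minimal sketch of what the description suggests. The lambda and the intermediate name df_sorted are my own choices for illustration, not necessarily what the answerer wrote.

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})

# Sort the two labels inside each row so that every pair has one canonical
# order; returning a Series with the row's index keeps the column names.
df_sorted = df.apply(lambda row: pd.Series(sorted(row), index=row.index), axis=1)

# Count the pairs; name='weight' labels the count column directly,
# so no separate column renaming is needed.
df_count = (
    df_sorted.groupby(["source", "target"])
    .size()
    .reset_index(name="weight")
    .sort_values("weight", ascending=False)
)
print(df_count)
```

A row-wise apply calls a Python function per row, so it is convenient for small frames but can be slow on large ones.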
First, sort the values within each row:
In [31]: df
Out[31]:
source target
0 ape dog
1 ape hous
2 dog hous
3 hors dog
4 hors ape
5 dog ape
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 bird ape
11 fist ape
In [32]: df.values.sort()
In [33]: df
Out[33]:
source target
0 ape dog
1 ape hous
2 dog hous
3 dog hors
4 ape hors
5 ape dog
6 ape bird
7 ape hous
8 bird hous
9 bird fist
10 ape bird
11 ape fist
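Note that df.values.sort() only changes df when df.values happens to be a writable view of the frame's data, which pandas does not guarantee (for example with mixed dtypes, or with Copy-on-Write enabled). A sketch of a variant that does not rely on in-place mutation, assuming NumPy is available; df_sorted is a name I introduce here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})

# np.sort returns a sorted copy; axis=1 sorts the two labels within each row.
sorted_values = np.sort(df[["source", "target"]].to_numpy(), axis=1)

# Build a new frame from the sorted array and leave the original df untouched.
df_sorted = pd.DataFrame(sorted_values, columns=["source", "target"], index=df.index)
print(df_sorted)
```

The groupby step that follows can then be run on df_sorted instead of df.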
Then group by source and target, aggregate with size, and sort the result:
In [34]: df.groupby(['source', 'target']).size().sort_values(ascending=False)
...: .reset_index(name='weight')
Out[34]:
source target weight
0 ape hous 2
1 ape dog 2
2 ape bird 2
3 dog hous 1
4 dog hors 1
5 bird hous 1
6 bird fist 1
7 ape hors 1
8 ape fist 1
The simplest approach is to sort each row so that each pair appears in only one order.
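One way to sketch this idea without touching the original frame is to normalise each pair in plain Python and count with collections.Counter. This is my own illustration under that assumption, not code taken from the answer.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})

# tuple(sorted(pair)) maps ('bird', 'ape') and ('ape', 'bird') to the same key.
counts = Counter(tuple(sorted(pair)) for pair in zip(df["source"], df["target"]))

# most_common() yields the pairs from the highest count to the lowest.
df_count = pd.DataFrame(
    [(a, b, n) for (a, b), n in counts.most_common()],
    columns=["source", "target", "weight"],
)
print(df_count)
```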