Python pandas: how to efficiently filter data based on frequency
I have a dataframe with three columns: follower, user, and ratio. For each unique element in user, I want to know how many times it occurs, and remove the rows that correspond to elements occurring fewer than 5 times. Here is my code, which is very inefficient. I would like to know the proper way to write it.
known_follower_id= np.unique(following_df.follower.values) # IDs of members of the list in the saved database
userid, counts = np.unique(following_df.user.values, return_counts= True) # ID of people they followed in the saved database
count_idx=np.argsort(-counts) # number of times a user was followed
trimmed_following_df= following_df.copy(deep= True)
th = 5
idx_th = counts< th
userid_removed = userid[idx_th]
idx_userid_rem= [i for i,v in enumerate(trimmed_following_df.user.values) if v in userid_removed]
trimmed_following_df=trimmed_following_df.drop(idx_userid_rem)
Here is how I would do it, using substitute data. See if this gets you what you want.
import pandas as pd
df = pd.DataFrame({
'follower': ['John', 'Jane', 'Jack', 'Suzy', 'Kate', 'Mark', 'Alex', 'Boby', 'Cris', 'Duke'],
'user': ['milk', 'milk', 'milk', 'milk', 'milk', 'milk', 'pear', 'pear', 'wire', 'silk'],
'ratio': [.4, .4, .5, .2, .6, .3, .5, .8, .9, .2]})
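print(df)
# Attach to every row the number of times its user appears
df['usercount'] = df['user'].map(df['user'].value_counts())
# Keep users that appear at least 5 times (drop those with fewer than 5)
df = df[df['usercount'] >= 5]
# Remove the helper column; reassigning avoids modifying a filtered slice in place
df = df.drop(columns=['usercount'])
print(df)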
The output will be:
Original dataframe:
  follower  user  ratio
0     John  milk    0.4
1     Jane  milk    0.4
2     Jack  milk    0.5
3     Suzy  milk    0.2
4     Kate  milk    0.6
5     Mark  milk    0.3
6     Alex  pear    0.5
7     Boby  pear    0.8
8     Cris  wire    0.9
9     Duke  silk    0.2
Updated dataframe:
  follower  user  ratio
0     John  milk    0.4
1     Jane  milk    0.4
2     Jack  milk    0.5
3     Suzy  milk    0.2
4     Kate  milk    0.6
5     Mark  milk    0.3
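The same filter can also be written without a temporary column. The sketch below is one possible variant, not the only way to do it: it assumes the column names from the question, and the helper name filter_by_frequency and its parameters are made up for illustration. It relies on groupby(...).transform('size'), which returns for each row the number of rows sharing its user value.

```python
import pandas as pd


def filter_by_frequency(df, column, min_count):
    """Keep only rows whose value in `column` occurs at least `min_count` times.

    Hypothetical helper for illustration; not part of pandas.
    """
    # For every row, the size of the group that row's value belongs to
    counts = df.groupby(column)[column].transform('size')
    # Boolean mask selects rows whose value is frequent enough
    return df[counts >= min_count]


# Same substitute data as in the answer above
following_df = pd.DataFrame({
    'follower': ['John', 'Jane', 'Jack', 'Suzy', 'Kate',
                 'Mark', 'Alex', 'Boby', 'Cris', 'Duke'],
    'user': ['milk', 'milk', 'milk', 'milk', 'milk',
             'milk', 'pear', 'pear', 'wire', 'silk'],
    'ratio': [.4, .4, .5, .2, .6, .3, .5, .8, .9, .2]})

trimmed_following_df = filter_by_frequency(following_df, 'user', 5)
print(trimmed_following_df)
```

Because the mask is computed in a single vectorized pass, this avoids the Python-level loop over every row that makes the code in the question slow. The original index labels are kept; call reset_index(drop=True) on the result if a fresh 0..n-1 index is wanted.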
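Could you please post a sample dataset and the desired output? I think I understand the question, but I want to make sure the output matches what you need.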