Python: How to remove rows from a DataFrame when a value's count is small


python pandas dataframe


I want to remove some data from a pandas DataFrame. I have a DataFrame like this:

sex     age         race                c_charge_desc
Male    0.204082    Hispanic            Felony Battery (Dom Strang)
Male    0.122449    African-American    Felony Driving While Lic Suspend
Female  0.163265    African-American    Neglect Child / No Bodily Harm
Male    0.081633    African-American    arrest case no charge
Male    0.530612    African-American    Felony Driving While Lic Suspend

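There is a column named c_charge_desc that contains many different charge descriptions. I want to remove the charge descriptions whose total count is below a threshold.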

Battery                            924
arrest case no charge              904
Possession of Cocaine              378
Grand Theft in the 3rd Degree      352
Driving While License Revoked      158
                                  ... 
Compulsory Attendance Violation      1
Possession Of Clonazepam             1
Possession Of Anabolic Steroid       1
Attempt Burglary (Struct)            1
Fail To Redeliver Hire Prop          1
Name: c_charge_desc, Length: 387, dtype: int64

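This is a summary of the column. As you can see, many descriptions occur only once. I want to remove the descriptions that occur fewer than 10 times in total.

I tried: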

df[df['c_charge_desc'].value_counts() < 10]

This would be my expected output:

Battery                            924
arrest case no charge              904
Possession of Cocaine              378
Grand Theft in the 3rd Degree      352
Driving While License Revoked      158
...
Some charges                        10
Some charges                        10
Some charges                        10
Some charges                        10
Name: c_charge_desc, Length: 200, dtype: int64

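groupby followed by filter is probably the most concise. For example, to keep only the charges that occur more than once:

df.groupby('c_charge_desc').filter(lambda group: len(group) > 1)
#     sex       age              race                     c_charge_desc
# 1  Male  0.122449  African-American  Felony Driving While Lic Suspend
# 4  Male  0.530612  African-American  Felony Driving While Lic Suspend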
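As a runnable illustration of the groupby and filter approach, here is a minimal sketch that rebuilds the five sample rows from the question. The helper name keep_frequent and its min_count parameter are made up for this example; for the real data you would pass min_count=10.

```python
import pandas as pd

# The five sample rows shown in the question.
df = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Male", "Male"],
    "age": [0.204082, 0.122449, 0.163265, 0.081633, 0.530612],
    "race": ["Hispanic", "African-American", "African-American",
             "African-American", "African-American"],
    "c_charge_desc": [
        "Felony Battery (Dom Strang)",
        "Felony Driving While Lic Suspend",
        "Neglect Child / No Bodily Harm",
        "arrest case no charge",
        "Felony Driving While Lic Suspend",
    ],
})


def keep_frequent(frame, column, min_count):
    """Hypothetical helper: keep rows whose value in `column` occurs at least `min_count` times."""
    return frame.groupby(column).filter(lambda group: len(group) >= min_count)


# With only five sample rows, a threshold of 2 is used here instead of 10.
filtered = keep_frequent(df, "c_charge_desc", min_count=2)
print(filtered)
```

Note that filter calls the lambda once per group, so on a column with many distinct values it may be slower than a vectorized mask.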

Alternatively, create a temporary counter column to use as a filter:

df['counter'] = df.groupby('c_charge_desc').c_charge_desc.transform('count')
df[df.counter > 1].drop('counter', axis=1)
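The attempt in the question, df[df['c_charge_desc'].value_counts() < 10], does not work because the boolean Series returned by value_counts is indexed by charge description, not by the row labels of df, so it cannot be aligned as a row mask. One possible variant, not taken from the answer above, maps the counts back onto the rows so the mask has one entry per row. The toy data and the min_count name below are assumptions for illustration only.

```python
import pandas as pd

# Toy data (an assumption for this sketch): three charges with counts 3, 2 and 1.
df = pd.DataFrame({
    "c_charge_desc": ["Battery"] * 3
    + ["arrest case no charge"] * 2
    + ["Possession Of Clonazepam"],
})

min_count = 2  # use 10 for the threshold described in the question

# Series indexed by charge description, holding the number of occurrences.
counts = df["c_charge_desc"].value_counts()

# Look up each row's description in `counts`, giving one count per row,
# then keep only the rows whose description is frequent enough.
mask = df["c_charge_desc"].map(counts) >= min_count
result = df[mask]

print(result["c_charge_desc"].value_counts())
```

This avoids the temporary column, since the mask is built directly and never stored on the DataFrame.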

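Comments:

Show your expected output.

^ The comment above is right, but I think you would be better off splitting the charge description column into two separate columns and masking your DataFrame with the numeric column.

@luthervespers Yes, I would do that, but I need to eliminate the items whose counts are too low.

@luthervespers! Thank you very much.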
