Python: rename categories whose count is less than 0.5% of the mode count, value_counts()

python pandas dataframe

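I have a very large df with many rows and columns. If a category of a categorical variable has a count below 0.5% of the mode count, I want to rename it to "other".

I know that

df[colname].value_counts(normalize=True)

gives the distribution of all categories. How do I extract the categories that fall below 0.5% of the mode, and how do I rename them to "other"?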

  apple
large 100
medium 50
small  3

desired output

  apple
large 100
medium 50
other  3

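Use value_counts and compare the counts with less-than. Mapped back onto the column, the result is a mask the same size as the original column, so it can be used to set the new values.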
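The approach described above can be sketched roughly as follows. This is a minimal illustration, not the answer's original code: the sample frame is hypothetical and sized so that "small" really falls below the threshold, and the threshold is taken as 0.5% of the mode count (the largest count), as the question asks.

```python
import pandas as pd

# Hypothetical sample data: "small" occurs 3 times, the mode "large" 1000 times.
df = pd.DataFrame({"apple": ["large"] * 1000 + ["medium"] * 500 + ["small"] * 3})

counts = df["apple"].value_counts()      # absolute count per category
threshold = 0.005 * counts.max()         # 0.5% of the mode count (assumed reading of the question)

# Map every row to the count of its category, then compare with less-than.
# The mask has one boolean per row, so it lines up with the original column.
mask = df["apple"].map(counts).lt(threshold)

# Overwrite only the rows that belong to rare categories.
df.loc[mask, "apple"] = "other"

print(df["apple"].value_counts())
```

If the column has a pandas Categorical dtype, "other" would probably need to be added as a category first (for example with `cat.add_categories`) before the assignment.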

Counts:

s = df['apple'].value_counts()
print(s)
large     100
medium     50
other       3
Name: apple, dtype: int64

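First, use value_counts and its index to find the values whose frequency is below 0.5%.
Second, build a dictionary whose keys are those index values and whose values are "others".
Third, use replace with the dictionary to change those values to "others".

Here is an example: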

import pandas as pd
df = pd.DataFrame({"apple": ["large"] * 1000 + ["medium"] * 500 + ["small"] * 1})

# True for categories that make up less than 0.5% of all rows
cond = df['apple'].value_counts(normalize=True) < 0.005
# Labels of the rare categories
others = cond[cond].index
# Map every rare label to "others"
others_dict = {k: "others" for k in others}

df['apple'] = df['apple'].replace(others_dict)
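Note that the example above thresholds on each category's share of all rows, while the question asks for a threshold relative to the mode count. A possible variant for that reading, applied to several columns at once, might look like the sketch below. The helper name `lump_rare`, its parameters, and the second column `color` are hypothetical and only serve the illustration.

```python
import pandas as pd


def lump_rare(col: pd.Series, ratio: float = 0.005, label: str = "other") -> pd.Series:
    """Return a copy of col where categories rarer than ratio * mode count become label."""
    counts = col.value_counts()
    # Categories whose count is below the given fraction of the largest count
    rare = counts.index[counts < ratio * counts.max()]
    # Keep values that are not rare; replace the rest with the label
    return col.where(~col.isin(rare), label)


# Hypothetical sample data with two categorical columns
df = pd.DataFrame({
    "apple": ["large"] * 1000 + ["medium"] * 500 + ["small"] * 1,
    "color": ["red"] * 1498 + ["green"] * 3,
})

# Column names are listed explicitly here; a real frame might select them by dtype.
for name in ["apple", "color"]:
    df[name] = lump_rare(df[name])

print(df["apple"].value_counts())
print(df["color"].value_counts())
```

Using `where` with `isin` avoids building a replacement dictionary, and missing values are left untouched because they never match a rare label.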
