Pandas: filling missing values according to probabilities


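Suppose I have a column Fruit that contains 57 bananas, 54 apples, and null values in the remaining rows. I want to use fillna to fill the nulls so that each one becomes banana with probability 57/(57+54) and apple with probability 54/(57+54). How should I do this?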

Fruit
------
None
Banana
Fruit
Banana
....(with 57 banana, 54 apple, 10 None)
Setup

import numpy as np
import pandas as pd

fruit = pd.Series(['banana'] * 57 + ['apple'] * 54 + [None] * 10, name='fruit')

Using pd.Series.sample

# Boolean mask of the missing entries
nullfruit = fruit.isnull()
# Draw one observed value per missing entry; replace=True keeps the draws
# independent and also works when nulls outnumber the observed values
fruit.loc[nullfruit] = fruit.dropna().sample(nullfruit.sum(), replace=True).values

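Sampling from the non-null values gives the requested probabilities because each observed row is equally likely to be drawn, so banana comes up with probability 57/111 and apple with 54/111. The sketch below is only an illustrative check of that reasoning, not part of the original answer: it draws many values with replacement and compares the empirical frequencies with the observed ones. The sample size of 100,000 and the seed passed as random_state are arbitrary assumptions chosen for the demonstration.

```python
import pandas as pd

fruit = pd.Series(['banana'] * 57 + ['apple'] * 54 + [None] * 10, name='fruit')

# Observed (non-null) values: 57 bananas and 54 apples
observed = fruit.dropna()

# Target probabilities implied by the observed counts
target = observed.value_counts(normalize=True)

# Draw many values with replacement; each observed row is equally likely
draws = observed.sample(n=100_000, replace=True, random_state=0)
empirical = draws.value_counts(normalize=True)

print(target)
print(empirical)
```

The two printed frequency tables should agree closely, which is what makes sampling from the observed values a valid way to fill the gaps.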
Using np.random.choice and np.unique

# Boolean mask of the missing entries, as a NumPy array
nullfruit = fruit.isnull().values
# Distinct observed values and their counts
u, c = np.unique(fruit.values[~nullfruit], return_counts=True)
# Draw one value per missing entry with probability proportional to its count
fruit.loc[nullfruit] = np.random.choice(u, nullfruit.sum(), p=c / c.sum())
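
Either approach can be wrapped in a small reusable function. The following is a minimal sketch under stated assumptions, not the original answer's code: the name fill_na_by_frequency and its seed parameter are hypothetical, and it uses value_counts(normalize=True) together with a NumPy Generator in place of np.unique and the global random state. It returns a filled copy and leaves the input untouched.

```python
import numpy as np
import pandas as pd


def fill_na_by_frequency(s: pd.Series, seed=None) -> pd.Series:
    """Return a copy of `s` whose nulls are replaced by random draws,
    weighted by the relative frequency of the observed values.

    Hypothetical helper for illustration; assumes `s` has at least one
    non-null value.
    """
    rng = np.random.default_rng(seed)
    # Relative frequencies of the non-null values (nulls are dropped by default)
    probs = s.value_counts(normalize=True)
    mask = s.isna()
    filled = s.copy()
    # One weighted draw per missing entry
    filled.loc[mask] = rng.choice(
        probs.index.to_numpy(), size=int(mask.sum()), p=probs.to_numpy()
    )
    return filled


fruit = pd.Series(['banana'] * 57 + ['apple'] * 54 + [None] * 10, name='fruit')
filled = fill_na_by_frequency(fruit, seed=0)

print(fruit.isna().sum())     # nulls in the original series
print(filled.isna().sum())    # nulls after filling
print(filled.value_counts())  # the 10 filled rows are split between the two fruits
```

Passing a seed makes the fill reproducible; leaving it as None gives a different random fill on each call.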

Can you add a sample and the expected output?