Python: set a percentage of a DataFrame's values to NaN


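I want to randomly set values of a DataFrame to NaN so that a given percentage of its values are NaN. So, starting from the following DataFrame: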

     name                       IS_030_EBITDA  IS_09_PostTaxResult
0    EISMA MEDIA GROEP B.V.     NaN            1292.0
1    EISMA MEDIA GROEP B.V.     2280.0         1324.0
2    DUNLOP B.V.                43433.0        1243392.0
3    DUNLOP B.V.                2243480.0      1324.0
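I would like 25% of the values in my DataFrame to be NaN (the NaNs below are only an example; the positions must be chosen at random):

To be clear, I do not want 25% of the rows or 25% of the columns set to NaN. I want 25% of the individual values in the final DataFrame to be NaN.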


Thanks for your help.

Is this what you are trying to do?

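import numpy as np
import pandas as pd

# modified the data to make it read_clipboard friendly
'''
    name    IS_030_EBITDA   IS_09_PostTaxResult
0    EISMA_MEDIA_GROEP_B.V. NaN 1292.0
1    EISMA_MEDIA_GROEP_B.V. 2280.0  1324.0
2    DUNLOP_B.V.    43433.0 1243392.0
3    DUNLOP_B.V.    2243480.0   1324.0
'''

df = pd.read_clipboard()  # copy the table above to the clipboard first

print(df)

df_sample=df.sample(2) # refer to the 'Note' section below
# assign real NaN by row label; the string 'NaN' would turn the columns into object dtype,
# and df.update() skips NaN values, so it cannot be used to write them back
df.loc[df_sample.index, ['IS_09_PostTaxResult', 'IS_030_EBITDA']] = np.nan

print(df)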

Original df:

                     name  IS_030_EBITDA  IS_09_PostTaxResult
0  EISMA_MEDIA_GROEP_B.V.            NaN               1292.0
1  EISMA_MEDIA_GROEP_B.V.         2280.0               1324.0
2             DUNLOP_B.V.        43433.0            1243392.0
3             DUNLOP_B.V.      2243480.0               1324.0
Modified df:

                     name IS_030_EBITDA IS_09_PostTaxResult
0  EISMA_MEDIA_GROEP_B.V.           NaN                 NaN
1  EISMA_MEDIA_GROEP_B.V.          2280                1324
2             DUNLOP_B.V.         43433         1.24339e+06
3             DUNLOP_B.V.           NaN                 NaN
Note:

"df_sample=df.sample(2)" -> instead of the fixed value 2, you can add logic that selects 25% of the total number of records. For example:

# sample 25% of the rows instead of a fixed 2
x=25.0
factor = int((len(df)*x)/100) # factor=1 in the example above

df_sample=df.sample(factor)
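
As a possible shorthand for the note above, `DataFrame.sample` also accepts a `frac` argument, so the row count does not have to be computed by hand. The sketch below rebuilds the example frame inline rather than reading it from the clipboard; `value_cols` and the fixed `random_state` are illustrative choices, not part of the original answer. Note that this still blanks whole rows, not individual cells.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt inline (same values as in the question)
df = pd.DataFrame({
    "name": ["EISMA_MEDIA_GROEP_B.V.", "EISMA_MEDIA_GROEP_B.V.",
             "DUNLOP_B.V.", "DUNLOP_B.V."],
    "IS_030_EBITDA": [np.nan, 2280.0, 43433.0, 2243480.0],
    "IS_09_PostTaxResult": [1292.0, 1324.0, 1243392.0, 1324.0],
})

# frac=0.25 draws 25% of the rows without replacement (1 of the 4 rows here);
# random_state is fixed only to make the sketch repeatable
df_sample = df.sample(frac=0.25, random_state=0)

# Hypothetical helper list: the numeric columns to blank out
value_cols = ["IS_030_EBITDA", "IS_09_PostTaxResult"]
df.loc[df_sample.index, value_cols] = np.nan

print(df)
```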

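If I understand correctly, you want to select 25% of the cells uniformly. That means you cannot select records first (doing so would skew the distribution). The following solution sets 25% of the cells to NaN: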

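import numpy as np
import pandas as pd

# float dtype so that every cell can hold NaN
df = pd.DataFrame({"a": range(10), "b": range(10, 20)}, dtype=float)
total_cells = df.shape[0] * df.shape[1]

# long format: one row per cell, so cells can be sampled uniformly
df = df.reset_index().melt(id_vars = "index")
# replace=False never draws the same cell twice (np.random.randint could)
picked = np.random.choice(total_cells, int(total_cells * .25), replace=False)
df.loc[picked, "value"] = np.nan
df.pivot(index = "index", columns = "variable")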
Result:

         value      
variable     a     b
index               
0          0.0  10.0
1          1.0  11.0
2          2.0   NaN
3          NaN   NaN
4          4.0  14.0
5          5.0  15.0
6          6.0  16.0
7          7.0   NaN
8          8.0   NaN
9          9.0  19.0
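
A possible alternative that avoids the melt/pivot round trip is to build a boolean mask with exactly the required number of `True` cells and apply it with `DataFrame.mask`. This is only a sketch: the function name `set_fraction_nan` and its `seed` parameter are hypothetical, and it assumes the frame has no NaN to begin with (cells that are already NaN can be picked again, so the final share would then differ from `frac`).

```python
import numpy as np
import pandas as pd


def set_fraction_nan(df, frac, seed=None):
    """Return a copy of df with round(frac * number_of_cells) random cells set to NaN."""
    rng = np.random.default_rng(seed)
    n_cells = df.size
    n_nan = int(round(n_cells * frac))

    # Flat boolean vector with exactly n_nan True entries, shuffled so that
    # every cell is equally likely to be selected
    flat = np.zeros(n_cells, dtype=bool)
    flat[:n_nan] = True
    rng.shuffle(flat)

    # DataFrame.mask replaces the cells where the condition is True with NaN
    return df.mask(flat.reshape(df.shape))


df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
out = set_fraction_nan(df, 0.25, seed=0)

print(out)
print(out.isna().to_numpy().mean())  # share of NaN cells
```

Because the mask is drawn over all cells at once, rows and columns are not treated differently, which matches the requirement that 25% of the values (not of the rows or columns) become NaN.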