Python random_state's contribution to accuracy


python machine-learning scikit-learn


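Okay, this is interesting. I ran the same code several times and got a different accuracy_score each time. I realized that I wasn't passing any random_state value to train_test_split. So I used random_state=0 and got a consistent accuracy_score of 82%. But then I wanted to try a different random_state number, so I set random_state=128, and the accuracy score became 84%. Now I need to understand why this happens and how random_state affects the model's accuracy.
The output is below:
1> Without random_state: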
runfile('C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic/Colab File.py', wdir='C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic')

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

[[90 22]
 [21 46]]
0.7597765363128491

runfile('C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic/Colab File.py', wdir='C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic')

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

[[104  16]
 [ 14  45]]
0.8324022346368715

runfile('C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic/Colab File.py', wdir='C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic')

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

[[90 18]
 [12 59]]
0.8324022346368715

runfile('C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic/Colab File.py', wdir='C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic')

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

[[99  9]
 [19 52]]
0.8435754189944135

2> random_state=128 (accuracy score = 84%)
3> random_state=0 (accuracy score = 82%)
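Basically, random_state makes your code produce the same results every time because it performs exactly the same data split every time. This is mainly useful for your initial train/test split and for writing code that others can reproduce exactly.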
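Splitting the data the same way vs. differently
The first thing to understand is that if you don't set random_state, the data will be split differently every time, so your training set and test set will be different on each run. This might not make a big difference, but it will cause slight variations in your model's parameters, accuracy, and so on. If you set random_state to the same value each time, such as random_state=0, the data will be split the same way each time.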
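Each random_state results in a different split
The second thing to understand is that each random_state value produces a different split and therefore different behavior. So if you want to be able to reproduce your results, you need to keep random_state at the same value.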
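The sketch below shows the effect on accuracy by fitting the same deterministic model on splits made with several seeds. It assumes scikit-learn and uses a synthetic dataset from `make_classification` as a stand-in for the Titanic data. `LogisticRegression` is a placeholder for the question's (unknown) model. The seed list is arbitrary except that it includes 0 and 128.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic data; its own random_state keeps the dataset fixed
X, y = make_classification(n_samples=300, n_features=8, random_state=0)


def split_accuracy(seed):
    """Test accuracy of a LogisticRegression trained on the split produced by `seed`."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


accuracies = {seed: split_accuracy(seed) for seed in [0, 1, 2, 42, 128]}
for seed, acc in accuracies.items():
    print(f"random_state={seed:>3}: accuracy={acc:.3f}")

# Re-running with a seed that was already used reproduces its score exactly
print(split_accuracy(128) == accuracies[128])  # True
```

Each seed yields one sample of the score. The spread across seeds roughly indicates how much the accuracy depends on which rows happen to land in the test set.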
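Your model can have multiple random_state settings
The third thing to understand is that several parts of your model may involve randomness. For example, your train_test_split accepts a random_state, but so can a randomized classifier such as a random forest. So to get exactly the same results every time, you need to set random_state for every part of your model that involves randomness.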
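The sketch below fixes the split's seed and then compares forests trained with and without a fixed `random_state`. It assumes scikit-learn and uses `RandomForestClassifier` as an example of a model with its own randomness (bootstrap sampling and feature subsampling). The data is synthetic and hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Source of randomness 1: the train/test split (fixed here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


def forest_probabilities(random_state):
    """Predicted class-1 probabilities from a forest built with the given random_state."""
    forest = RandomForestClassifier(n_estimators=50, random_state=random_state)
    forest.fit(X_train, y_train)
    return forest.predict_proba(X_test)[:, 1]


# Source of randomness 2: the forest itself. Left unseeded, two fits can disagree
# even though the split is identical.
print(np.array_equal(forest_probabilities(None), forest_probabilities(None)))  # usually False

# Seeding both the split and the model makes the whole run reproducible
print(np.array_equal(forest_probabilities(0), forest_probabilities(0)))  # True
```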

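Conclusion
If you use random_state for your initial train/test split, set it once and keep using that split, so that you don't end up overfitting to your test set.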
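The sketch below is one way to follow that advice, assuming scikit-learn and synthetic data. The candidate `max_depth` values and seed 0 are arbitrary choices for illustration. It splits once with a fixed seed, makes every modelling decision with cross-validation on the training portion only, and scores the test set a single time at the end.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Split ONCE with a fixed seed; the test set is then set aside until the very end
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Choose a hyperparameter using cross-validation on the training data only
candidates = [2, 4, None]
cv_means = {}
for depth in candidates:
    model = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    cv_means[depth] = cross_val_score(model, X_train, y_train, cv=5).mean()
best_depth = max(cv_means, key=cv_means.get)

# Evaluate the chosen model on the held-out test set exactly once
final = RandomForestClassifier(n_estimators=50, max_depth=best_depth, random_state=0)
final.fit(X_train, y_train)
test_accuracy = final.score(X_test, y_test)
print(best_depth, round(test_accuracy, 3))
```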
In general, you can use cross-validation to assess your model's accuracy without worrying too much about random_state.
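A minimal cross-validation sketch follows, again assuming scikit-learn and synthetic data with `LogisticRegression` as a placeholder model. `RepeatedStratifiedKFold` reshuffles the data for each repeat, so the reported mean averages over many different splits rather than depending on one particular seed. Its own `random_state` only makes that whole procedure reproducible.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5 folds x 10 repeats = 50 train/test evaluations on differently shuffled data
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} splits")
```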

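One very important note: you should not use random_state to improve your model's accuracy. Picking the seed that happens to give the best score only fits your model to that particular split of the data; it does nothing to help it generalize to unseen data.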
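Here is a rough way to see why the 82% vs 84% difference is not worth chasing. Take the confusion matrices from the question's output (each sums to 179 test rows), compute both accuracies, and compare their gap with the binomial standard error of an accuracy measured on 179 rows. This rests on two assumptions. First, the 84% matrix is taken to be the random_state=128 run and the 82% one the random_state=0 run, matched by their accuracy values. Second, the binomial formula treats test rows as independent, so it is only an approximation.

```python
import math

# Confusion matrices copied from the question's output
# (assumed mapping to seeds, matched by their printed accuracies)
cm_seed_128 = [[106, 13], [15, 45]]  # printed accuracy 0.8435754189944135
cm_seed_0 = [[93, 17], [15, 54]]     # printed accuracy 0.8212290502793296


def accuracy(cm):
    """Return (accuracy, number of test rows) for a 2x2 confusion matrix."""
    correct = cm[0][0] + cm[1][1]  # correct predictions sit on the diagonal
    total = sum(sum(row) for row in cm)
    return correct / total, total


acc_128, n = accuracy(cm_seed_128)
acc_0, _ = accuracy(cm_seed_0)

# Binomial approximation of the standard error of an accuracy estimated on n test rows
p = (acc_128 + acc_0) / 2
std_err = math.sqrt(p * (1 - p) / n)

print(n)                          # 179
print(round(acc_128 - acc_0, 4))  # 0.0223
print(round(std_err, 4))          # 0.0279
```

The gap is 4 rows out of 179, which is smaller than one standard error, so the choice of split alone can plausibly account for it.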
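Please don't post images of code, data, or tracebacks. Copy and paste them as text, then format them as code (select the text and press Ctrl-K).
Thanks, I've edited it.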
runfile('C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic/Colab File.py', wdir='C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic')

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

[[106  13]
 [ 15  45]]
0.8435754189944135

runfile('C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic/Colab File.py', wdir='C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic')

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

[[106  13]
 [ 15  45]]
0.8435754189944135

runfile('C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic/Colab File.py', wdir='C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic')

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

[[93 17]
 [15 54]]
0.8212290502793296

runfile('C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic/Colab File.py', wdir='C:/Users/spark/OneDrive/Documents/Machine Learing/Datasets/Titanic')

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.

[[93 17]
 [15 54]]
0.8212290502793296