Python pandas DataFrame: data padding with multi-dimensional statistics

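I have a DataFrame in which the columns describe different properties of stars and the rows hold the measurements for different stars. (Something like this:)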

\property ...
star1
star2
star3

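In some measurements, the error of a particular property is -1.00, which means the measurement is faulty.

In that case, I want to discard the measurement.

One option is to delete the entire row (along with its other properties whose error is not -1.00).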
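As a minimal sketch of that first option, the snippet below marks every value whose error equals -1.00 as missing and then drops incomplete rows. The column names (`teff`, `logg`), the `_err` suffix convention, and the helper `mask_bad_measurements` are assumptions made for illustration; the real table layout is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: each property has a value column and an "<name>_err" column.
df = pd.DataFrame(
    {
        "teff": [5800.0, 4500.0, 6100.0],
        "teff_err": [50.0, -1.00, 80.0],
        "logg": [4.4, 4.6, 4.1],
        "logg_err": [0.1, 0.2, -1.00],
    },
    index=["star1", "star2", "star3"],
)


def mask_bad_measurements(frame, sentinel=-1.0, err_suffix="_err"):
    """Return a copy in which values whose error equals `sentinel` are NaN."""
    out = frame.copy()
    for err_col in [c for c in out.columns if c.endswith(err_suffix)]:
        value_col = err_col[: -len(err_suffix)]
        bad = out[err_col] == sentinel
        # Blank out both the faulty value and its sentinel error.
        out.loc[bad, [value_col, err_col]] = np.nan
    return out


masked = mask_bad_measurements(df)

# The "delete the whole row" option: keep only stars with no faulty property.
complete_rows = masked.dropna()
```

Keeping `masked` around, rather than only `complete_rows`, preserves the good properties of partially faulty stars, which is what any later imputation step would need.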

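I think it should be possible to fill in a faulty measurement with a value generated from the distribution of all the other measurements. In other words, given the star's other, good properties, the faulty property should take the value that minimizes the error over the whole dataset.

Is there a proper name for the idea I am describing? How would you apply such an algorithm?

I am a student working on a solo project, so I would greatly appreciate detailed, theoretical answers (: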

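Edit

After further reading, I think what I am describing is regression imputation.

So I guess my question is: how do I implement multi-dimensional linear regression on a DataFrame in the most efficient way?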
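For reference, one way regression imputation could look in pandas is sketched below. It assumes faulty values have already been replaced by NaN, fits an ordinary least-squares model with `numpy.linalg.lstsq` on the rows where the target is known, and predicts the target where it is missing. The function name `regression_impute` and the toy columns `a`, `b`, `c` are hypothetical.

```python
import numpy as np
import pandas as pd


def regression_impute(frame, target, predictors):
    """Fill NaNs in `target` with least-squares predictions from `predictors`."""
    out = frame.copy()
    has_predictors = out[predictors].notna().all(axis=1)
    train = has_predictors & out[target].notna()
    missing = has_predictors & out[target].isna()

    # Design matrix with a leading column of ones for the intercept.
    X_train = np.column_stack(
        [np.ones(train.sum()), out.loc[train, predictors].to_numpy()]
    )
    y_train = out.loc[train, target].to_numpy()
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    # Predict the target only for rows where it is missing.
    X_missing = np.column_stack(
        [np.ones(missing.sum()), out.loc[missing, predictors].to_numpy()]
    )
    out.loc[missing, target] = X_missing @ coef
    return out


# Toy data: c = 1 + 2*a + 3*b exactly, with one value of c missing.
df = pd.DataFrame(
    {
        "a": [0.0, 1.0, 2.0, 3.0, 4.0],
        "b": [1.0, 0.0, 2.0, 1.0, 3.0],
    }
)
df["c"] = 1 + 2 * df["a"] + 3 * df["b"]
df.loc[4, "c"] = np.nan

filled = regression_impute(df, target="c", predictors=["a", "b"])
```

Rows that are also missing a predictor are left untouched by this sketch. Note that filling with the fitted value alone places every imputed point exactly on the regression surface, so the spread of the imputed column is understated compared with real measurements.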


Thanks!

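This technique is called multiple imputation. The question is better suited to Cross Validated. And if you have to ask this question, you should not be rolling your own implementation: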