Fast implementation of stepwise regression in Python

python performance algorithm

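From Wikipedia:

Forward selection, which involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion, adding the variable (if any) that improves the model the most, and repeating this process until none improves the model.

I think the implementation of this algorithm is very interesting, because it can be seen as a combinatorial version of the hill-climbing algorithm in which the neighbor function is equivalent to adding one variable to the current model.

I do not have enough experience to write this algorithm in an optimized way. This is my current implementation: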
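To make the hill-climbing analogy concrete, here is a minimal sketch of greedy forward selection over subsets with a pluggable cost function. The names `greedy_forward`, `toy_cost`, and `weights` are hypothetical and used only for illustration; the toy cost is not a regression, it just measures how far a subset's sum is from a target.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_forward(items: Sequence[int],
                   cost: Callable[[List[int]], float],
                   n_steps: int) -> Tuple[List[int], List[float]]:
    """Hill climbing over subsets: a neighbor is the current subset plus one item."""
    selected: List[int] = []
    remaining = list(items)
    history: List[float] = []
    current = float("inf")
    for _ in range(n_steps):
        if not remaining:
            break
        # Evaluate every neighbor of the current subset.
        scores = [cost(selected + [j]) for j in remaining]
        best_pos = min(range(len(remaining)), key=scores.__getitem__)
        if scores[best_pos] >= current:
            break  # no neighbor improves the cost, so stop climbing
        current = scores[best_pos]
        selected.append(remaining.pop(best_pos))
        history.append(current)
    return selected, history


# Toy usage (illustrative assumption): pick weights whose sum is close to 11.
weights = [5, 3, 8, 1]
toy_cost = lambda subset: abs(11 - sum(weights[i] for i in subset))
toy_selected, toy_history = greedy_forward(range(len(weights)), toy_cost, n_steps=4)
print(toy_selected, toy_history)
```

Swapping `toy_cost` for a regression error would give forward selection; the search loop itself does not depend on the cost.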

import numpy as np
from collections import deque
from sklearn.linear_model import LinearRegression


class FSR():

    def __init__(self, n_components):
        self.n_components = n_components


    def cost(self, index):
        # Residual norm of a least-squares fit on the columns listed in `index`.
        lr = LinearRegression().fit(self.x[:, index], self.y)
        hat_y = lr.predict(self.x[:, index])
        e = np.linalg.norm(hat_y - self.y)
        return e

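    def next_step_fsr(self, comp, cand):
        """Given the current components and candidates, return the new
        components, the new candidates and the new error."""

        if comp == []:
            er = np.inf
        else:
            er = self.cost(comp)

        # Fall back to the current state if no candidate lowers the error.
        new_comp, new_cand = comp.copy(), deque(cand)

        for i in range(len(cand)):
            e = cand.popleft()          # take the next candidate out of the pool
            comp.append(e)              # tentatively add it to the model
            new_er = self.cost(comp)
            if new_er < er:
                new_comp = comp.copy()
                new_cand = deque(cand)  # remaining candidates, without e
                er = new_er
            comp.pop()                  # undo the tentative addition
            cand.append(e)
        # Return the best error found, not the error of the last candidate tried.
        return new_comp, new_cand, er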

    def fsr(self):
        n, p = self.x.shape
        er = []
        comp = []
        cand = deque(range(p))
        for i in range(self.n_components):
            comp, cand, new_er = self.next_step_fsr(comp, cand)
            er.append(new_er)
        return comp, er

    def fit(self, x, y):
        self.x = x
        self.y = y
        self.comp_, self.er_ = self.fsr()

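I would like the final code not to look too different from the one posted, because I want to generalize this to other combinatorial problems with different cost functions.

The function I think should be changed is next_step_fsr, which, given the currently selected variables, tries to find which variable is best to add to the model. I am particularly interested in the case where x has many columns (say 10000). I think the current bottleneck is the line that copies the candidate list.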
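One possible way to avoid copying the candidate list on every improvement is to remember only the best candidate during the loop and build the new candidate list once at the end. The sketch below is an assumption about how that could look, not a measured fix: `next_step` is a hypothetical free function that takes the cost as a callable, and `demo_weights` and `demo_cost` are toy stand-ins. It may also be worth profiling first, since each cost call in the original refits a regression.

```python
from collections import deque


def next_step(cost, comp, cand):
    """Return (new_comp, new_cand, best_error) without copying inside the loop."""
    best_e = None
    best_er = cost(comp) if comp else float("inf")
    for e in cand:                  # iterate without mutating cand
        er = cost(comp + [e])       # short list of length len(comp) + 1
        if er < best_er:
            best_e, best_er = e, er
    if best_e is None:              # nothing improved: return the state unchanged
        return comp, cand, best_er
    # Build the new candidate list once per step instead of once per improvement.
    new_cand = deque(c for c in cand if c != best_e)
    return comp + [best_e], new_cand, best_er


# Toy usage (illustrative assumption): cost is the distance of a subset sum from 11.
demo_weights = [5, 3, 8, 1]
demo_cost = lambda subset: abs(11 - sum(demo_weights[i] for i in subset))
step1 = next_step(demo_cost, [], deque(range(4)))
step2 = next_step(demo_cost, step1[0], step1[1])
step3 = next_step(demo_cost, step2[0], step2[1])
print(step1, step2, step3)
```

Because the cost is passed in as an argument, the same loop could be reused for other combinatorial problems with different cost functions.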

x = np.random.normal(0,1, (100,20))
y = x[:,1] + x[:,2] + np.random.normal(0,.1, 100)
fsr = FSR(n_components=2)
fsr.fit(x,y)
print('selected component = ', fsr.comp_)
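For many columns, another direction that might help is to score all candidates at once instead of refitting one regression per candidate. The sketch below swaps in a different technique from the code above: it assumes the cost is the least-squares residual norm with an intercept, and it keeps every column orthogonalized against the variables already selected (sometimes called orthogonal least squares). `forward_select_ols` is a hypothetical name, the `1e-12` threshold is an arbitrary choice, and `n_components` is assumed not to exceed the rank of x. It does not carry over to arbitrary cost functions.

```python
import numpy as np


def forward_select_ols(x, y, n_components):
    """Greedy forward selection for least squares with an intercept."""
    # Centering x and y plays the role of fitting an intercept.
    r = y - y.mean()                # residual of the current model
    R = x - x.mean(axis=0)          # columns orthogonalized against the model
    selected, errors = [], []
    for _ in range(n_components):
        norms = np.einsum("ij,ij->j", R, R)   # squared norm of each column
        proj = R.T @ r
        with np.errstate(divide="ignore", invalid="ignore"):
            # Drop in squared residual norm if column j were added.
            gain = np.where(norms > 1e-12, proj ** 2 / norms, -np.inf)
        if selected:
            gain[selected] = -np.inf          # never pick a column twice
        j = int(np.argmax(gain))
        if not np.isfinite(gain[j]):
            break                             # no usable column is left
        q = R[:, j] / np.sqrt(norms[j])       # new orthonormal direction
        r = r - q * (q @ r)                   # update the residual
        R = R - np.outer(q, q @ R)            # remove q from every column
        selected.append(j)
        errors.append(float(np.linalg.norm(r)))
    return selected, errors


# Same kind of data as the example above, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
x_demo = rng.normal(0, 1, (100, 20))
y_demo = x_demo[:, 1] + x_demo[:, 2] + rng.normal(0, 0.1, 100)
sel_demo, err_demo = forward_select_ols(x_demo, y_demo, n_components=2)
print("selected component = ", sel_demo)
```

Each step here costs one pass over x (roughly n times p operations) rather than p separate fits, at the price of tying the search to the least-squares cost.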