Least-mean-squares (LMS) algorithm function in Python


I am implementing the LMS machine learning algorithm for a project. I know it is not the best algorithm for this dataset, but it is the one I am stuck with.

I am having trouble applying the algorithm itself; as is obvious from the code below, I am doing it poorly. The code gives me a MemoryError, because the array I am creating needs 46.6 GiB of space. Any help would be greatly appreciated.

Here is the error I am currently getting:
MemoryError: Unable to allocate 46.6 GiB for an array with shape (250100, 25010) and data type float64
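For context on where the 46.6 GiB figure comes from: judging from the error shape, `len(cards_train)` is 25010, so `total_training_variables = 25010 * 10 = 250100`, and a float64 array of shape (250100, 25010) really does need about that much memory:

```python
# Size of the array LMS() tries to allocate: np.zeros((250100, 25010))
# with dtype float64, i.e. 8 bytes per element.
rows, cols = 250100, 25010
gib = rows * cols * 8 / 2**30  # bytes -> GiB
print(f"{gib:.1f} GiB")        # -> 46.6 GiB
```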
Here is my code:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def LMS(classes, epochs, rate, cards_train, class_train, cards_test, class_test):
    total_training_entry = len(cards_train)
    total_training_variables = total_training_entry * classes

    # This allocation raises the MemoryError: (250100, 25010) float64
    weight = np.zeros((total_training_variables, total_training_entry))
    weight = cards_train.reshape(total_training_variables, total_training_entry)
    w = np.zeros((classes, total_training_entry))

    for x in range(epochs):
        for j in range(classes):
            for data, targets in zip(weight, class_train[:, j]):
                sum = np.dot(data, w[j, :].T)
                w[j, :] = w[j, :]+rate*(targets-sum)*data

    for i in range(total_training_variables):
        for j in range(classes):
            np.dot(w[j, :], weight[i, :total_training_entry])

    #    testing
    total_testing_entry = len(cards_test)
    total_testing_variables = total_testing_entry*classes

    weight = np.zeros((total_testing_variables, total_testing_entry))
    
    weight_reshaped = cards_test.reshape(total_testing_entry, total_testing_variables)
    predicted = []

    for i in range(total_testing_entry):
        for j in range(classes):
            result = np.dot(w[j, :], weight_reshaped[i,
                            :total_testing_variables])
            predicted.append(result)
    accuracy_metric(class_test, predicted)

def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

def colored(r, g, b, text):
    return "\033[38;2;{};{};{}m{} \033[38;2;255;255;255m".format(r, g, b, text)

def main():
    csv_train = pd.read_csv('train.csv',names=['S1','C1','S2','C2','S3','C3','S4','C4','S5','C5','class'])
    csv_test = pd.read_csv('test.csv',names=['S1','C1','S2','C2','S3','C3','S4','C4','S5','C5','class'])

    train_array = csv_train.values
    cards_train = train_array[:,:-1]
    class_train = train_array[:,-1]

    test_array = csv_test.values
    cards_test = test_array[:,:-1]
    class_test = test_array[:,-1]

    class_train_list = list(class_train)
    class_test_list = list(class_test)
    classes = 10 # 0 -> 9
    
    # print(colored(255, 0, 0, "Training Data:"))
    # plt.bar(list(range(classes)), [class_train_list.count(int(x)) for x in range(classes)])
    # plt.show()

    # print(colored(255, 0, 0, "Testing Data:"))
    # plt.bar(list(range(classes)), [class_test_list.count(int(x)) for x in range(classes)])
    # plt.show()
    
    
    # print(colored(255, 0, 0, "Combined Data:"))
    # plt.bar(list(range(classes)), [class_test_list.count(int(x)) for x in range(classes)],color=("red"))
    # plt.bar(list(range(classes)), [class_train_list.count(int(x)) for x in range(classes)])
    # plt.show()

    rate = 0.02
    epochs = 500

    LMS(classes,epochs, rate,cards_train,class_train,cards_test,class_test)

main()
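For comparison, the classic Widrow-Hoff/LMS update only needs one weight vector per class over the raw input features, so the weight matrix stays at shape (classes, n_features) rather than (n_samples * classes, n_samples). This is a rough sketch of that formulation, not a drop-in fix for the code above; `lms_fit` and `lms_predict` are illustrative names, and it assumes one-vs-rest 0/1 targets over the ten card columns:

```python
import numpy as np

def lms_fit(X, y, classes, epochs, rate):
    """Per-class LMS (Widrow-Hoff) updates: one weight vector per class
    over the raw features, so memory is O(classes * n_features)."""
    n_samples, n_features = X.shape
    w = np.zeros((classes, n_features))
    for _ in range(epochs):
        for x, label in zip(X, y):
            for j in range(classes):
                target = 1.0 if label == j else 0.0   # one-vs-rest target
                w[j] += rate * (target - w[j] @ x) * x
    return w

def lms_predict(X, w):
    # Pick the class whose linear output is largest for each sample.
    return np.argmax(X @ w.T, axis=1)
```

With ten features and ten classes, the weight matrix here is 10x10 floats instead of tens of GiB, regardless of how many training rows there are.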
Edit:


Can you provide the full stack trace, along with the relevant part of the code?

@N.Wouda Could you check the edit, please?

Training data of ~250K rows by ~25K columns is huge without better hardware.

To reduce data usage I used `csv_train = pd.read_csv('train.csv', names=['S1','C1','S2','C2','S3','C3','S4','C4','S5','C5','class'], nrows=250)` and `csv_test = pd.read_csv('test.csv', names=['S1','C1','S2','C2','S3','C3','S4','C4','S5','C5','class'], nrows=10000)`, but I still get more errors in the LMS function, so I have some mismatched data or an incorrect implementation somewhere.
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-26-381dfe91a612> in <module>
     83     LMS(classes,epochs, rate,cards_train,class_train,cards_test,class_test)
     84 
---> 85 main()

<ipython-input-26-381dfe91a612> in main()
     81     epochs = 500
     82 
---> 83     LMS(classes,epochs, rate,cards_train,class_train,cards_test,class_test)
     84 
     85 main()

<ipython-input-26-381dfe91a612> in LMS(classes, epochs, rate, cards_train, class_train, cards_test, class_test)
      8     total_training_variables = total_training_entry*classes
      9 
---> 10     weight=np.zeros((total_training_variables,total_training_entry))
     11     weight=cards_train.reshape(total_training_variables,total_training_entry)
     12     w = np.zeros((classes, total_training_entry))

MemoryError: Unable to allocate 46.6 GiB for an array with shape (250100, 25010) and data type float64