My Neural Network isn't learning the right answers

Tags: c++, neural-network, minimax, temporal-difference

First off, I'm a complete amateur, so I may get some of the terminology wrong.

I've been working on a neural network to play Connect 4 / Four in a Row.

The network model as currently designed has 170 input values, 417 hidden neurons and 1 output neuron. The network is fully connected, i.e. every input is connected to every hidden neuron, and every hidden neuron is connected to the output node.

Every connection has its own independent weight, and every hidden node, plus the single output node, has an additional bias node with a weight.
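
For scale, that is 170 × 417 input-to-hidden weights plus 417 hidden biases, and 417 × 1 hidden-to-output weights plus 1 output bias:

    170 * 417 + 417 + 417 * 1 + 1 = 70,890 + 417 + 417 + 1 = 71,725 weights

which matches the NumWeights formula in the code further down.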

The input representation of the 170 values for a Connect 4 game state is (a sketch of a matching encoder follows this list):

  • 42 pairs of values (84 input variables) representing whether a space is occupied by player 1, by player 2, or is free.
    • 0,0 means it is free
    • 1,0 means it is player 1's position
    • 0,1 means it is player 2's position
    • 1,1 is not possible
  • Another 42 pairs of values (84 input variables) representing whether adding a piece here would give player 1 or player 2 a "Connect 4"/"four in a row". The combinations of values mean the same as above.
  • 2 final input variables to represent whose turn it is:
    • 1,0 player 1's turn
    • 0,1 player 2's turn
    • 1,1 and 0,0 are not possible
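
The full code below calls a create_board_state helper that isn't shown. As a rough sketch of what an encoder for this representation could look like (PositionPlayer and WouldWin are hypothetical accessors, not the question's actual Board API):

    #include <array>
    #include <cstddef>

    using StateType = std::array<double, 7*6*4 + 2>;   // the 170 inputs described above

    template<typename BoardT>
    void create_board_state(StateType& state, const BoardT& board,
                            typename BoardT::Player whoseTurn)
    {
        std::size_t i = 0;
        // 42 occupancy pairs
        for(std::size_t c = 0; c < 7; ++c)
            for(std::size_t r = 0; r < 6; ++r)
            {
                auto owner = board.PositionPlayer(c, r);            // hypothetical accessor
                state[i++] = owner == BoardT::Player1 ? 1.0 : 0.0;
                state[i++] = owner == BoardT::Player2 ? 1.0 : 0.0;
            }
        // 42 threat pairs: would a checker at (c,r) complete four in a row?
        for(std::size_t c = 0; c < 7; ++c)
            for(std::size_t r = 0; r < 6; ++r)
            {
                state[i++] = board.WouldWin(c, r, BoardT::Player1) ? 1.0 : 0.0;  // hypothetical
                state[i++] = board.WouldWin(c, r, BoardT::Player2) ? 1.0 : 0.0;  // hypothetical
            }
        // final pair: whose turn it is
        state[i++] = whoseTurn == BoardT::Player1 ? 1.0 : 0.0;
        state[i++] = whoseTurn == BoardT::Player2 ? 1.0 : 0.0;
    }
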
I measured the mean squared error averaged over every 100 games, across 10,000 games, for a variety of configurations, and arrived at:

  • 417 hidden neurons
  • Alpha and Beta learning rates of 0.1 at the start, dropping linearly to 0.01 over the epochs
  • A lambda value of 0.5
  • 90 out of every 100 moves random at the start, dropping to 10 out of every 100 over the first 50% of the epochs; so at the midway point 10 out of 100 moves are random (the per-game decrements are worked out after this list)
  • The first 50% of epochs starting with a random move
  • A sigmoid activation function used in every node
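
Concretely, with the constants from main() further down, those schedules work out to:

    decayRateAlpha    = (0.1 - 0.01) / 3,000,000       = 3e-8 per game
    randomChangeDecay = (90 - 10) / (3,000,000 * 0.5) ≈ 5.33e-5 per game

so alpha only reaches 0.01 on the final game, while the random-move chance reaches its floor of 10 at the halfway point.
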
This graph shows the results of the various configurations plotted on a logarithmic scale. This is how I decided which configuration to use.

I calculate that mean squared error by comparing the output for a board in a win state against -1 for a player 2 win and 1 for a player 1 win. I add these up every 100 games and divide the total by 100, giving the 1,000 values plotted in the graph above. I.e. the code snippet is:

if(board.InARowConnected(4) == Board<7,6,4>::Player1)
{
    totalLoss += NN->BackPropagateFinal({1},previousNN,alpha,beta,lambda);
    winState = true;
}
else if(board.InARowConnected(4) == Board<7,6,4>::Player2)
{
    totalLoss += NN->BackPropagateFinal({-1},previousNN,alpha,beta,lambda);
    winState = true;
}
else if(!board.IsThereAvailableMove())
{
    totalLoss += NN->BackPropagateFinal({0},previousNN,alpha,beta,lambda);
    winState = true;
}

...

if(gameNumber % 100 == 0 && gameNumber != 0)
{
    totalLoss = totalLoss / gamesToOutput;
    matchFile << std::fixed << std::setprecision(51) << totalLoss << std::endl;
    totalLoss = 0.0;
}
My Neuron class:

template<std::size_t NumInputs>
class Neuron
{
public:
    Neuron()
    {
        // 9 used as an "uninitialised" sentinel value
        for(auto& i : m_inputValues)
            i = 9;
        for(auto& e : m_eligibilityTraces)
            e = 9;
        for(auto& w : m_weights)
            w = 9;
        m_biasWeight = 9;
        m_biasEligibilityTrace = 9;
        m_outputValue = 9;
    }

    void SetInputValue(const std::size_t index, const double value)
    {
        m_inputValues[index] = value;
    }

    void SetWeight(const std::size_t index, const double weight)
    {
        if(std::isnan(weight))
            throw std::runtime_error("NaN weight set");
        m_weights[index] = weight;
    }

    void SetBiasWeight(const double weight)
    {
        m_biasWeight = weight;
    }

    double GetInputValue(const std::size_t index) const
    {
        return m_inputValues[index];
    }

    double GetWeight(const std::size_t index) const
    {
        return m_weights[index];
    }

    double GetBiasWeight() const
    {
        return m_biasWeight;
    }

    double CalculateOutput()
    {
        m_outputValue = 0;
        for(std::size_t i = 0; i < NumInputs; ++i)
        {
            m_outputValue += m_inputValues[i] * m_weights[i];
        }
        m_outputValue += 1.0 * m_biasWeight;
        m_outputValue = sigmoid(m_outputValue);
        return m_outputValue;
    }

    double GetOutput() const
    {
        return m_outputValue;
    }

    double GetEligibilityTrace(const std::size_t index) const
    {
        return m_eligibilityTraces[index];
    }

    void SetEligibilityTrace(const std::size_t index, const double eligibility)
    {
        m_eligibilityTraces[index] = eligibility;
    }

    void SetBiasEligibility(const double eligibility)
    {
        m_biasEligibilityTrace = eligibility;
    }

    double GetBiasEligibility() const
    {
        return m_biasEligibilityTrace;
    }

    void ResetEligibilityTraces()
    {
        for(auto& e : m_eligibilityTraces)
            e = 0;
        m_biasEligibilityTrace = 0;
    }

private:
    std::array<double,NumInputs> m_inputValues;
    std::array<double,NumInputs> m_weights;
    std::array<double,NumInputs> m_eligibilityTraces;
    double m_biasWeight;
    double m_biasEligibilityTrace;
    double m_outputValue;
};
I think one place where I may have a problem is the minimax that chooses the best move to make; a stripped-down sketch of what it's meant to do follows.
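
For reference, here is a minimal 2-ply maximin sketch (not the question's code), with the board and network abstracted away into a precomputed payoff matrix: assume the opponent will pick the reply that is worst for me, then take the move whose worst case is best.

    #include <algorithm>
    #include <array>
    #include <cstddef>
    #include <limits>

    // values[m][m2]: the network's evaluation after I play column m and the
    // opponent replies in column m2, from the maximising player's point of view
    std::size_t PickMoveMaximin(const std::array<std::array<double,7>,7>& values)
    {
        std::size_t bestMove = 0;
        double bestWorst = std::numeric_limits<double>::lowest();
        for(std::size_t m = 0; m < 7; ++m)
        {
            // the lowest value the opponent can force after move m
            double worst = *std::min_element(values[m].begin(), values[m].end());
            if(worst > bestWorst)   // maximise that worst case
            {
                bestWorst = worst;
                bestMove = m;
            }
        }
        return bestMove;
    }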

There are also some additional bits which I don't think are particularly relevant to the problem I'm facing.

The problems:

  • Whether I train 1,000 games or 3 million, either player 1 or player 2 wins the vast majority of the games. One player will win something like 90 out of every 100 games. If I output the actual individual game moves and outputs, I can see that the games won by the other player are almost always the result of a lucky random move.

    At the same time, I notice that the predictions "favour" one player to a degree. That is, the outputs seem to sit on the negative side of 0, so player 1 may be making the best moves it can, for example, yet the predictions all seem to say that player 2 is going to win.

    Sometimes it is player 1 that wins the majority, sometimes player 2. I assume this is due to the random weight initialisation favouring one player.

    The first game doesn't favour one player over the other, but it quickly starts to "lean" one way.

  • I've now tried training over 3 million games, which took 3 days, but the network still doesn't seem able to make good decisions. I've tested it by having it play the other "bots" on the riddles.io Connect 4 comp:

    • It doesn't recognise that it needs to block the opponent's four in a row
    • Even after 3 million games, it doesn't take the centre column as the first move, which is known to be the only first move that guarantees a win

  • Any help and guidance would be appreciated. Specifically, is my TD(λ) back-propagation implemented correctly? (The update rule I'm aiming for is written out below.)

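For reference, the standard TD(λ) update this kind of code usually targets is, per weight $w$, with $V$ the network's value estimate, $e_t$ the weight's eligibility trace and $\alpha$ the learning rate:

    e_t = \lambda \, e_{t-1} + \nabla_w V(s_t)
    w \leftarrow w + \alpha \, ( V(s_{t+1}) - V(s_t) ) \, e_t

with the game result $z \in \{1, -1, 0\}$ replacing $V(s_{t+1})$ on the terminal step, which is what BackPropagateFinal receives.
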
Comments:

  • frank: Before digging into the code, I'd suggest debugging your neural network code on smaller problems first, before scaling up to an example this large. Dividing a 2D space into N (start with 2) parts based on labelled points is a good first problem.
  • OP: @frank what kind of problem would be good to try at a smaller scale? Are there good examples of partitioning a 2D space?
  • chipster: I found this years later, but it might help to know this: it means that no matter how your AI works, wins may favour player 1 over player 2.
  • OP: Haha, thanks @chipster
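
One way to set up the smaller sanity check frank suggests, reusing the question's own NeuralNetwork and RandomGenerator types (a sketch under that assumption, treating BackPropagateFinal with lambda = 0 as a plain supervised step):

    #include <array>
    #include <iostream>
    // plus the question's NeuralNetwork / RandomGenerator definitions

    // Learn which side of the line y = x a point falls on. If the network
    // can't fit this, the bug is in the network code, not in Connect 4.
    int main()
    {
        NeuralNetwork<2,8,1> net, snapshot;
        net.RandomiseWeights();
        RandomGenerator coord(-1.0, 1.0);

        for(int step = 0; step < 100000; ++step)
        {
            std::array<double,2> point{coord(), coord()};
            double target = point[1] > point[0] ? 1.0 : -1.0;   // label by side of line

            net.FeedForward(point);
            snapshot = net;   // BackPropagateFinal reads activations from this copy
            net.BackPropagateFinal({target}, &snapshot, 0.1, 0.1, 0.0);
        }

        std::array<double,2> test{0.25, 0.75};
        std::cout << net.FeedForward(test)[0] << std::endl;   // should head towards 1
        return 0;
    }
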
The full relevant code (Board, RandomGenerator and create_board_state are among the additional bits left out):

    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <cstddef>
    #include <fstream>
    #include <iomanip>
    #include <iostream>
    #include <limits>
    #include <stdexcept>
    #include <vector>

    inline double sigmoid(const double x)
    {
        //  return 1.0 / (1.0 + std::exp(-x));
        // "fast" sigmoid: range (-1,1); the y*(1-y) factor in the backprop code
        // below is the derivative of the logistic version commented out above
        return x / (1.0 + std::abs(x));
    }
    
My Neural Network class (the Neuron class is as shown above):

    template<std::size_t NumInputs, std::size_t NumHidden, std::size_t NumOutputs>
    class NeuralNetwork
    {
    public:
    
    void RandomiseWeights()
    {
        double inputToHiddenRange = 4.0 * std::sqrt(6.0 / (NumInputs+1+NumOutputs));
        RandomGenerator inputToHidden(-inputToHiddenRange,inputToHiddenRange);
    
        double hiddenToOutputRange = 4.0 * std::sqrt(6.0 / (NumHidden+1+1));
        RandomGenerator hiddenToOutput(-hiddenToOutputRange,hiddenToOutputRange);
    
        for(auto& hiddenNeuron : m_hiddenNeurons)
        {
            for(std::size_t i = 0; i < NumInputs; ++i)
                hiddenNeuron.SetWeight(i, inputToHidden());
            hiddenNeuron.SetBiasWeight(inputToHidden());
        }
    
        for(auto& outputNeuron : m_outputNeurons)
        {
            for(std::size_t h = 0; h < NumHidden; ++h)
                outputNeuron.SetWeight(h, hiddenToOutput());
            outputNeuron.SetBiasWeight(hiddenToOutput());
        }
    }
    
    double GetOutput(const std::size_t index) const
    {
        return m_outputNeurons[index].GetOutput();
    }
    
    std::array<double,NumOutputs> GetOutputs()
    {
        std::array<double, NumOutputs> returnValue;
        for(std::size_t o = 0; o < NumOutputs; ++o)
            returnValue[o] = m_outputNeurons[o].GetOutput();
        return returnValue;
    }
    
    void SetInputValue(const std::size_t index, const double value)
    {
        for(auto& hiddenNeuron : m_hiddenNeurons)
            hiddenNeuron.SetInputValue(index, value);
    }
    
    std::array<double,NumOutputs> Calculate()
    {
        for(auto& h : m_hiddenNeurons)
            h.CalculateOutput();
        for(auto& o : m_outputNeurons)
            o.CalculateOutput();
    
        return GetOutputs();
    }
    
    std::array<double,NumOutputs> FeedForward(const std::array<double,NumInputs>& inputValues)
    {
        for(std::size_t h = 0; h < NumHidden; ++h)
        {
            for(std::size_t i = 0; i < NumInputs; ++i)
                m_hiddenNeurons[h].SetInputValue(i,inputValues[i]);
    
            m_hiddenNeurons[h].CalculateOutput();
        }
    
        std::array<double, NumOutputs> returnValue;
    
        for(std::size_t h = 0; h < NumHidden; ++h)
        {
            auto hiddenOutput = m_hiddenNeurons[h].GetOutput();
            for(std::size_t o = 0; o < NumOutputs; ++o)
                m_outputNeurons[o].SetInputValue(h, hiddenOutput);
        }
    
        for(std::size_t o = 0; o < NumOutputs; ++o)
        {
            returnValue[o] = m_outputNeurons[o].CalculateOutput();
        }
    
        return returnValue;
    }
    
    double BackPropagateFinal(const std::array<double,NumOutputs>& actualValues, const NeuralNetwork<NumInputs,NumHidden,NumOutputs>* NN, const double alpha, const double beta, const double lambda)
    {
        for(std::size_t iO = 0; iO < NumOutputs; ++iO)
        {
            auto y = NN->m_outputNeurons[iO].GetOutput();
            auto y1 = actualValues[iO];
    
            for(std::size_t iH = 0; iH < NumHidden; ++iH)
            {
                auto e = NN->m_outputNeurons[iO].GetEligibilityTrace(iH);
                auto h = NN->m_hiddenNeurons[iH].GetOutput();
                auto w = NN->m_outputNeurons[iO].GetWeight(iH);
    
                double e1 = lambda * e + (y * (1.0 - y) * h);
    
                double w1 = w + beta * (y1 - y) * e1;
    
                m_outputNeurons[iO].SetEligibilityTrace(iH,e1);
                m_outputNeurons[iO].SetWeight(iH,w1);
            }
    
            auto e = NN->m_outputNeurons[iO].GetBiasEligibility();
            auto h = 1.0;
            auto w = NN->m_outputNeurons[iO].GetBiasWeight();
    
            double e1 = lambda * e + (y * (1.0 - y) * h);
    
            double w1 = w + beta * (y1 - y) * e1;
    
            m_outputNeurons[iO].SetBiasEligibility(e1);
            m_outputNeurons[iO].SetBiasWeight(w1);
        }
    
        for(std::size_t iH = 0; iH < NumHidden; ++iH)
        {
            auto h = NN->m_hiddenNeurons[iH].GetOutput();
    
            for(std::size_t iI = 0; iI < NumInputs; ++iI)
            {
                auto e = NN->m_hiddenNeurons[iH].GetEligibilityTrace(iI);
                auto x = NN->m_hiddenNeurons[iH].GetInputValue(iI);
                auto u = NN->m_hiddenNeurons[iH].GetWeight(iI);
    
                double sumError = 0;
    
                for(std::size_t iO = 0; iO < NumOutputs; ++iO)
                {
                    auto w = NN->m_outputNeurons[iO].GetWeight(iH);
                    auto y = NN->m_outputNeurons[iO].GetOutput();
                    auto y1 = actualValues[iO];
    
                    auto grad = y1 - y;
    
                    double e1 = lambda * e + (y * (1.0 - y) * w * h * (1.0 - h) * x);
    
                    sumError += grad * e1;
                }
    
                double u1 = u + alpha * sumError;
    
                m_hiddenNeurons[iH].SetEligibilityTrace(iI,sumError);
                m_hiddenNeurons[iH].SetWeight(iI,u1);
            }
    
            auto e = NN->m_hiddenNeurons[iH].GetBiasEligibility();
            auto x = 1.0;
            auto u = NN->m_hiddenNeurons[iH].GetBiasWeight();
    
            double sumError = 0;
    
            for(std::size_t iO = 0; iO < NumOutputs; ++iO)
            {
                auto w = NN->m_outputNeurons[iO].GetWeight(iH);
                auto y = NN->m_outputNeurons[iO].GetOutput();
                auto y1 = actualValues[iO];
    
                auto grad = y1 - y;
    
                double e1 = lambda * e + (y * (1.0 - y) * w * h * (1.0 - h) * x);
    
                sumError += grad * e1;
            }
    
            double u1 = u + alpha * sumError;
    
            m_hiddenNeurons[iH].SetBiasEligibility(sumError);
            m_hiddenNeurons[iH].SetBiasWeight(u1);
        }
    
        double retVal = 0;
        for(std::size_t o = 0; o < NumOutputs; ++o)
        {
            // mean of 0.5 * alpha * squared per-output difference
            retVal += 0.5 * alpha * std::pow((NN->GetOutput(o) - GetOutput(o)),2);
        }
        return retVal / NumOutputs;
    }
    
    double BackPropagate(const NeuralNetwork<NumInputs,NumHidden,NumOutputs>* NN, const double alpha, const double beta, const double lambda)
    {
        for(std::size_t iO = 0; iO < NumOutputs; ++iO)
        {
            auto y = NN->m_outputNeurons[iO].GetOutput();
            auto y1 = m_outputNeurons[iO].GetOutput();
    
            for(std::size_t iH = 0; iH < NumHidden; ++iH)
            {
                auto e = NN->m_outputNeurons[iO].GetEligibilityTrace(iH);
                auto h = NN->m_hiddenNeurons[iH].GetOutput();
                auto w = NN->m_outputNeurons[iO].GetWeight(iH);
    
                double e1 = lambda * e + (y * (1.0 - y) * h);
    
                double w1 = w + beta * (y1 - y) * e1;
    
                m_outputNeurons[iO].SetEligibilityTrace(iH,e1);
    
                m_outputNeurons[iO].SetWeight(iH,w1);
            }
    
            auto e = NN->m_outputNeurons[iO].GetBiasEligibility();
            auto h = 1.0;
            auto w = NN->m_outputNeurons[iO].GetBiasWeight();
    
            double e1 = lambda * e + (y * (1.0 - y) * h);
    
            double w1 = w + beta * (y1 - y) * e1;
    
            m_outputNeurons[iO].SetBiasEligibility(e1);
            m_outputNeurons[iO].SetBiasWeight(w1);
        }
    
        for(std::size_t iH = 0; iH < NumHidden; ++iH)
        {
            auto h = NN->m_hiddenNeurons[iH].GetOutput();
    
            for(std::size_t iI = 0; iI < NumInputs; ++iI)
            {
                auto e = NN->m_hiddenNeurons[iH].GetEligibilityTrace(iI);
                auto x = NN->m_hiddenNeurons[iH].GetInputValue(iI);
                auto u = NN->m_hiddenNeurons[iH].GetWeight(iI);
    
                double sumError = 0;
    
                for(std::size_t iO = 0; iO < NumOutputs; ++iO)
                {
                    auto w = NN->m_outputNeurons[iO].GetWeight(iH);
                    auto y = NN->m_outputNeurons[iO].GetOutput();
                    auto y1 = m_outputNeurons[iO].GetOutput();
    
                    auto grad = y1 - y;
    
                    double e1 = lambda * e + (y * (1.0 - y) * w * h * (1.0 - h) * x);
    
                    sumError += grad * e1;
                }
    
                double u1 = u + alpha * sumError;
    
                m_hiddenNeurons[iH].SetEligibilityTrace(iI,sumError);
    
                m_hiddenNeurons[iH].SetWeight(iI,u1);
            }
    
            auto e = NN->m_hiddenNeurons[iH].GetBiasEligibility();
            auto x = 1.0;
            auto u = NN->m_hiddenNeurons[iH].GetBiasWeight();
    
            double sumError = 0;
    
            for(std::size_t iO = 0; iO < NumOutputs; ++iO)
            {
                auto w = NN->m_outputNeurons[iO].GetWeight(iH);
                auto y = NN->m_outputNeurons[iO].GetOutput();
                auto y1 = m_outputNeurons[iO].GetOutput();
    
                auto grad = y1 - y;
    
                double e1 = lambda * e + (y * (1.0 - y) * w * h * (1.0 - h) * x);
    
                sumError += grad * e1;
            }
    
            double u1 = u + alpha * sumError;
    
            m_hiddenNeurons[iH].SetBiasEligibility(sumError);
            m_hiddenNeurons[iH].SetBiasWeight(u1);
        }
    
        double retVal = 0;
        for(std::size_t o = 0; o < NumOutputs; ++o)
        {
            // mean of 0.5 * alpha * squared per-output difference
            retVal += 0.5 * alpha * std::pow((NN->GetOutput(o) - GetOutput(o)),2);
        }
        return retVal / NumOutputs;
    }
    
    std::array<double,NumInputs*NumHidden+NumHidden+NumHidden*NumOutputs+NumOutputs> GetNetworkWeights() const
    {
        std::array<double,NumInputs*NumHidden+NumHidden+NumHidden*NumOutputs+NumOutputs> returnVal;
    
        std::size_t weightPos = 0;
    
        for(std::size_t h = 0; h < NumHidden; ++h)
        {
            for(std::size_t i = 0; i < NumInputs; ++i)
                returnVal[weightPos++] = m_hiddenNeurons[h].GetWeight(i);
            returnVal[weightPos++] = m_hiddenNeurons[h].GetBiasWeight();
        }
        for(std::size_t o = 0; o < NumOutputs; ++o)
        {
            for(std::size_t h = 0; h < NumHidden; ++h)
                returnVal[weightPos++] = m_outputNeurons[o].GetWeight(h);
            returnVal[weightPos++] = m_outputNeurons[o].GetBiasWeight();
        }
    
        return returnVal;
    }
    
    static constexpr std::size_t NumWeights = NumInputs*NumHidden+NumHidden+NumHidden*NumOutputs+NumOutputs;
    
    
    void SetNetworkWeights(const std::array<double,NumInputs*NumHidden+NumHidden+NumHidden*NumOutputs+NumOutputs>& weights)
    {
        std::size_t weightPos = 0;
        for(std::size_t h = 0; h < NumHidden; ++h)
        {
            for(std::size_t i = 0; i < NumInputs; ++i)
                m_hiddenNeurons[h].SetWeight(i, weights[weightPos++]);
            m_hiddenNeurons[h].SetBiasWeight(weights[weightPos++]);
        }
        for(std::size_t o = 0; o < NumOutputs; ++o)
        {
            for(std::size_t h = 0; h < NumHidden; ++h)
                m_outputNeurons[o].SetWeight(h, weights[weightPos++]);
            m_outputNeurons[o].SetBiasWeight(weights[weightPos++]);
        }
    }
    
    void ResetEligibilityTraces()
    {
        for(auto& h : m_hiddenNeurons)
            h.ResetEligibilityTraces();
        for(auto& o : m_outputNeurons)
            o.ResetEligibilityTraces();
    }
    
    private:
    
    std::array<Neuron<NumInputs>,NumHidden> m_hiddenNeurons;
    std::array<Neuron<NumHidden>,NumOutputs> m_outputNeurons;
    };
    
    int main()
    {
        std::ofstream matchFile("match.txt");
    
        RandomGenerator randomPlayerStart(0,1);
        RandomGenerator randomMove(0,100);
    
        Board<7,6,4> board;
    
        auto NN = new NeuralNetwork<7*6*4+2,417,1>();
        auto previousNN = new NeuralNetwork<7*6*4+2,417,1>();
        NN->RandomiseWeights();
    
        const int numGames = 3000000;
        double alpha = 0.1;
        double beta = 0.1;
        double lambda = 0.5;
        double learningRateFloor = 0.01;
        double decayRateAlpha = (alpha - learningRateFloor) / numGames;
        double decayRateBeta = (beta - learningRateFloor) / numGames;
        double randomChance = 90; // out of 100
        double randomChangeFloor = 10;
        double percentToReduceRandomOver = 0.5;
        double randomChangeDecay = (randomChance-randomChangeFloor) / (numGames*percentToReduceRandomOver);
        double percentOfGamesToRandomiseStart = 0.5;
    
        int numGamesWonP1 = 0;
        int numGamesWonP2 = 0;
    
        int gamesToOutput = 100;
    
        matchFile << "Num Games: " << numGames << "\t\ta,b,l: " << alpha << ", " << beta << ", " << lambda << std::endl;
    
        Board<7,6,4>::Player playerStart = randomPlayerStart() > 0.5 ? Board<7,6,4>::Player1 : Board<7,6,4>::Player2;
    
        double totalLoss = 0.0;
    
        for(int gameNumber = 0; gameNumber < numGames; ++gameNumber)
        {
            bool winState = false;
            Board<7,6,4>::Player playerWhoTurnItIs = playerStart;
            playerStart = playerStart == Board<7,6,4>::Player1 ? Board<7,6,4>::Player2 : Board<7,6,4>::Player1;
            board.ClearBoard();
    
            int turnNumber = 0;
    
            while(!winState)
            {
                Board<7,6,4>::Player playerWhoTurnItIsNot = playerWhoTurnItIs == Board<7,6,4>::Player1 ? Board<7,6,4>::Player2 : Board<7,6,4>::Player1;
    
                bool wasRandomMove = false;
    
                std::size_t selectedMove;
                bool moveFound = false;
    
                if(board.IsThereAvailableMove())
                {
                    std::vector<std::size_t> availableMoves;
                    if((gameNumber <= numGames * percentOfGamesToRandomiseStart && turnNumber == 0) || randomMove() > 100.0-randomChance)
                        wasRandomMove = true;
    
                    std::size_t bestMove = 8;
                    // lowest() (most negative double), not min() (smallest positive double)
                    double bestWorstResponse = playerWhoTurnItIs == Board<7,6,4>::Player1 ? std::numeric_limits<double>::lowest() : std::numeric_limits<double>::max();
    
                    for(std::size_t m = 0; m < 7; ++m)
                    {
                        Board<7,6,4> testBoard = board;    // make a copy of the current board to run our tests
                        if(testBoard.AvailableMoveInColumn(m))
                        {
                            if(wasRandomMove)
                            {
                                availableMoves.push_back(m);
                            }
                            testBoard.AddChecker(m, playerWhoTurnItIs);
    
                            // lowest() (most negative double), not min() (smallest positive double)
                            double worstResponse = playerWhoTurnItIs == Board<7,6,4>::Player1 ? std::numeric_limits<double>::max() : std::numeric_limits<double>::lowest();
                            std::size_t worstMove = 8;
    
                            for(std::size_t m2 = 0; m2 < 7; ++m2)
                            {
                                Board<7,6,4> testBoard2 = testBoard;
                                if(testBoard2.AvailableMoveInColumn(m2))
                                {
                                    testBoard2.AddChecker(m2,playerWhoTurnItIsNot);    // the opponent's reply goes in column m2
    
                                    StateType state;
                                    create_board_state(state, testBoard2, playerWhoTurnItIs);
                                    auto outputs = NN->FeedForward(state);
    
                                    if(playerWhoTurnItIs == Board<7,6,4>::Player1 && (outputs[0] < worstResponse || worstMove == 8))
                                    {
                                        worstResponse = outputs[0];
                                        worstMove = m2;
                                    }
                                    else if(playerWhoTurnItIs == Board<7,6,4>::Player2 && (outputs[0] > worstResponse || worstMove == 8))
                                    {
                                        worstResponse = outputs[0];
                                        worstMove = m2;
                                    }
                                }
                            }
    
                            if(playerWhoTurnItIs == Board<7,6,4>::Player1 && (worstResponse > bestWorstResponse || bestMove == 8))
                            {
                                bestWorstResponse = worstResponse;
                                bestMove = m;
                            }
                            else if(playerWhoTurnItIs == Board<7,6,4>::Player2 && (worstResponse < bestWorstResponse || bestMove == 8))
                            {
                                bestWorstResponse = worstResponse;
                                bestMove = m;
                            }
                        }
                    }
                    if(bestMove == 8)
                    {
                        std::cerr << "wasn't able to determine the best move to make" << std::endl;
                        return 0;
                    }
                    if(gameNumber <= numGames * percentOfGamesToRandomiseStart && turnNumber == 0)
                    {
                        std::size_t rSelection = int(randomMove()) % (availableMoves.size());
    
                        selectedMove = availableMoves[rSelection];
                        moveFound = true;
                    }
                    else if(wasRandomMove)
                    {
                        // erase-remove idiom: std::remove alone doesn't shrink the vector
                        availableMoves.erase(std::remove(availableMoves.begin(),availableMoves.end(),bestMove),availableMoves.end());
                        if(availableMoves.empty())
                        {
                            selectedMove = bestMove;    // the best move was the only legal move
                        }
                        else
                        {
                            std::size_t rSelection = int(randomMove()) % (availableMoves.size());
                            selectedMove = availableMoves[rSelection];
                        }
                        moveFound = true;
                    }
                    else
                    {
                        selectedMove = bestMove;
                        moveFound = true;
                    }
                }
    
                StateType prevState;
                create_board_state(prevState,board,playerWhoTurnItIs);
                NN->FeedForward(prevState);
                *previousNN = *NN;
    
                // now that we have the move, add it to the board
                StateType state;
                board.AddChecker(selectedMove,playerWhoTurnItIs);
                create_board_state(state,board,playerWhoTurnItIsNot);
    
                auto outputs = NN->FeedForward(state);
    
                if(board.InARowConnected(4) == Board<7,6,4>::Player1)
                {
                    totalLoss += NN->BackPropagateFinal({1},previousNN,alpha,beta,lambda);
                    winState = true;
                    ++numGamesWonP1;
                }
                else if(board.InARowConnected(4) == Board<7,6,4>::Player2)
                {
                    totalLoss += NN->BackPropagateFinal({-1},previousNN,alpha,beta,lambda);
                    winState = true;
                    ++numGamesWonP2;
                }
                else if(!board.IsThereAvailableMove())
                {
                    totalLoss += NN->BackPropagateFinal({0},previousNN,alpha,beta,lambda);
                    winState = true;
                }
                else if(turnNumber > 0 && !wasRandomMove)
                {
                    NN->BackPropagate(previousNN,alpha,beta,lambda);
                }
    
                if(!wasRandomMove)
                {
                    outputs = NN->FeedForward(state);
                }
    
                ++turnNumber;
                playerWhoTurnItIs = playerWhoTurnItIsNot;
            }
    
            alpha -= decayRateAlpha;
            beta -= decayRateBeta;
    
            NN->ResetEligibilityTraces();
    
            if(gameNumber > 0 && randomChance > randomChangeFloor && gameNumber <= numGames * percentToReduceRandomOver)
            {
                randomChance -= randomChangeDecay;
                if(randomChance < randomChangeFloor)
                    randomChance = randomChangeFloor;
            }
    
            if(gameNumber % gamesToOutput == 0 && gameNumber != 0)
            {
                totalLoss = totalLoss / gamesToOutput;
                matchFile << std::fixed << std::setprecision(51) << totalLoss << std::endl;
                totalLoss = 0.0;
            }
        }
    
        matchFile << std::endl << "Games won: " << numGamesWonP1 << " . " << numGamesWonP2 << std::endl;
    
        auto weights = NN->GetNetworkWeights();
        matchFile << std::endl;
        matchFile << std::endl;
        for(const auto& w : weights)
            matchFile << std::fixed << std::setprecision(51) << w << ", \n";
        matchFile << std::endl;
    
        return 0;
    }