Python: Why isn't my Minimax expanding and making moves correctly?


python python-2.7 recursion artificial-intelligence


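I am implementing minimax in Python 2.7.11 for a basic game of Pacman. Pacman is the maximizing agent, and one or more ghosts (depending on the test layout) are the minimizing agents.

I must implement minimax so that there can be more than one minimizing agent and so that it can build a tree of n plies (depth). Ply 1, for example, consists of each ghost taking a turn minimizing the terminal-state utilities of its possible moves, followed by Pacman taking his turn maximizing what the ghosts have already minimized. Graphically, ply 1 looks like this:

[Figure: ply-1 minimax tree, terminal states shown in green]

If the following arbitrary utilities were assigned to the green terminal states (left to right):

-10, 5, 8, 4, -4, 20, -7, 17

Pacman should return -4 and then move in that direction, creating an entirely new minimax tree based on that decision. First, here is a list of the variables and functions my implementation relies on: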

    # Stores everything about the current state of the game
    gameState

    # A globally defined depth that varies depending on the test cases.
    # It could be as little as 1 or arbitrarily large
    self.depth

    # A locally defined depth that keeps track of how many plies deep I've gone in the tree
    self.myDepth

    # A function that assigns a numeric value as a utility for the current state
    # How this is calculated is moot
    self.evaluationFunction(gameState)

    # Returns a list of legal actions for an agent
    # agentIndex = 0 means Pacman, ghosts are >= 1
    gameState.getLegalActions(agentIndex)

    # Returns the successor game state after an agent takes an action
    gameState.generateSuccessor(agentIndex, action)

    # Returns the total number of agents in the game
    gameState.getNumAgents()

    # Returns whether or not the game state is a winning (terminal) state
    gameState.isWin()

    # Returns whether or not the game state is a losing (terminal) state
    gameState.isLose()

Here is my implementation:

    """
    getAction takes a gameState and returns the optimal move for pacman,
    assuming that the ghosts are optimal at minimizing his possibilities
    """
    def getAction(self, gameState):
        self.myDepth = 0

        def miniMax(gameState):
            if gameState.isWin() or gameState.isLose() or self.myDepth == self.depth:
                return self.evaluationFunction(gameState)

            numAgents = gameState.getNumAgents()
            for i in range(0, numAgents, 1):
                legalMoves = gameState.getLegalActions(i)
                successors = [gameState.generateSuccessor(j, legalMoves[j]) for j, move in enumerate(legalMoves)]
                for successor in successors:
                    if i == 0:
                        return maxValue(successor, i)
                    else:
                        return minValue(successor, i)

        def minValue(gameState, agentIndex):
            minUtility = float('inf')
            legalMoves = gameState.getLegalActions(agentIndex)
            successors = [gameState.generateSuccessor(i, legalMoves[i]) for i, move in enumerate(legalMoves)]
            for successor in successors:
                minUtility = min(minUtility, miniMax(successor))
            return minUtility

        def maxValue(gameState, agentIndex):
            self.myDepth += 1
            maxUtility = float('-inf')
            legalMoves = gameState.getLegalActions(agentIndex)
            successors = [gameState.generateSuccessor(i, legalMoves[i]) for i, move in enumerate(legalMoves)]
            for successor in successors:
                maxUtility = max(maxUtility, miniMax(successor))
            return maxUtility

        return miniMax(gameState)

Does anyone know why my code is doing this? I'm hoping some Minimax/artificial intelligence experts can identify my problem. Thanks in advance.

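Update: By initializing my self.myDepth value to 0 instead of 1, I have eliminated the exception-throwing issue. However, my implementation as a whole is still incorrect.
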
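A counter kept on the instance, such as self.myDepth, is shared by every branch of the recursion and is never decremented when a call returns, so branches explored later see a larger depth than their siblings. The following is a minimal sketch of that effect on a nested-list tree rather than on the Pacman gameState; the function names and the tree are hypothetical and chosen only for illustration.

```python
def depths_with_shared_counter(tree):
    """Record the depth seen at each leaf when depth is one shared, mutable counter."""
    state = {"depth": 0}  # plays the role of self.myDepth
    seen = []

    def walk(node):
        if not isinstance(node, list):  # a leaf
            seen.append((node, state["depth"]))
            return
        state["depth"] += 1  # never undone when this call returns
        for child in node:
            walk(child)

    walk(tree)
    return seen


def depths_with_parameter(tree):
    """Record the depth seen at each leaf when depth is passed down as an argument."""
    seen = []

    def walk(node, depth):
        if not isinstance(node, list):  # a leaf
            seen.append((node, depth))
            return
        for child in node:
            walk(child, depth + 1)  # each branch gets its own copy of depth

    walk(tree, 0)
    return seen


TREE = [["a", "b"], ["c", "d"]]  # every leaf really sits at depth 2

print(depths_with_shared_counter(TREE))  # leaves "c" and "d" are reported at depth 3
print(depths_with_parameter(TREE))       # all four leaves are reported at depth 2
```

With the shared counter, a check such as `self.myDepth == self.depth` can fire too early in later branches, which is consistent with the evaluation function being called before the intended depth is reached.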
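I have finally figured out a solution to my problem. The main problem was that I was not referencing depth correctly to keep track of plies. Instead of incrementing depth inside the maxValue method, it should be passed as a parameter to each function and incremented only when it is passed into maxValue. There were several other logical errors as well, such as not referencing numAgents correctly, and the fact that my miniMax method was not returning an action. Here is my solution, which turned out to work: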

    def getAction(self, gameState):
        self.numAgents = gameState.getNumAgents()
        self.myDepth = 0
        self.action = Direction.STOP  # Imported from a class that defines 5 directions

        def miniMax(gameState, index, depth, action):
            # Root call for Pacman: return the move whose minimized value is largest
            maxU = float('-inf')
            legalMoves = gameState.getLegalActions(index)
            for move in legalMoves:
                successor = gameState.generateSuccessor(index, move)
                tempU = minValue(successor, index + 1, depth)
                # Compare against the best value so far, not the previous move's value
                if tempU > maxU:
                    maxU = tempU
                    action = move
            return action

        def maxValue(gameState, index, depth):
            if gameState.isWin() or gameState.isLose() or depth == self.depth:
                return self.evaluationFunction(gameState)

            # The last ghost passes its own index; wrap it back around to Pacman (0)
            index %= (self.numAgents - 1)
            maxU = float('-inf')
            legalMoves = gameState.getLegalActions(index)
            for move in legalMoves:
                successor = gameState.generateSuccessor(index, move)
                maxU = max(maxU, minValue(successor, index + 1, depth))
            return maxU

        def minValue(gameState, index, depth):
            if gameState.isWin() or gameState.isLose() or depth == self.depth:
                return self.evaluationFunction(gameState)

            minU = float('inf')
            legalMoves = gameState.getLegalActions(index)
            if index + 1 == self.numAgents:
                # Last ghost: Pacman moves next, so a new ply begins
                for move in legalMoves:
                    successor = gameState.generateSuccessor(index, move)
                    # Where depth is increased
                    minU = min(minU, maxValue(successor, index, depth + 1))
            else:
                # More ghosts still have to move within the current ply
                for move in legalMoves:
                    successor = gameState.generateSuccessor(index, move)
                    minU = min(minU, minValue(successor, index + 1, depth))
            return minU

        return miniMax(gameState, self.index, self.myDepth, self.action)

And presto! We have a final, working multi-agent minimax implementation.
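The idea of passing depth as a parameter and increasing it only when control returns to Pacman can be tried out without the Pacman framework. The sketch below is not the author's code: ToyState is a hypothetical stand-in that mimics only the gameState methods listed in the question, the tree and its leaf utilities are made up, and the separate maxValue/minValue functions are folded into a single value function for brevity.

```python
class ToyState(object):
    """Hypothetical stand-in for gameState: a fixed tree of moves.

    `tree` is either a number (leaf utility) or a dict mapping action -> subtree.
    """

    def __init__(self, tree, num_agents):
        self.tree = tree
        self.num_agents = num_agents

    def getNumAgents(self):
        return self.num_agents

    def isTerminal(self):
        return not isinstance(self.tree, dict)

    def getLegalActions(self, agentIndex):
        return [] if self.isTerminal() else sorted(self.tree)

    def generateSuccessor(self, agentIndex, action):
        return ToyState(self.tree[action], self.num_agents)


def evaluate(state):
    # Toy evaluation: a leaf is worth its utility, a cut-off interior node is worth 0
    return state.tree if state.isTerminal() else 0


def next_turn(agent, depth, num_agents):
    """Return (agent, depth) for the following move; depth grows once per full round."""
    following = (agent + 1) % num_agents
    return following, (depth + 1 if following == 0 else depth)


def value(state, agent, depth, max_depth):
    """Minimax value of `state` with `agent` to move (agent 0 maximizes, others minimize)."""
    if state.isTerminal() or depth == max_depth:
        return evaluate(state)
    following, next_depth = next_turn(agent, depth, state.getNumAgents())
    results = [value(state.generateSuccessor(agent, action), following, next_depth, max_depth)
               for action in state.getLegalActions(agent)]
    return max(results) if agent == 0 else min(results)


def best_action(state, max_depth):
    """Return the root action for agent 0 with the highest minimax value."""
    following, next_depth = next_turn(0, 0, state.getNumAgents())
    best_move, best_value = None, float('-inf')
    for action in state.getLegalActions(0):
        v = value(state.generateSuccessor(0, action), following, next_depth, max_depth)
        if v > best_value:  # compare against the best value found so far
            best_move, best_value = action, v
    return best_move


# Pacman moves first, then ghost 1, then ghost 2; leaves hold made-up utilities
TREE = {
    "left": {"left": {"left": 3, "right": 9}, "right": {"left": 6, "right": 7}},
    "right": {"left": {"left": 1, "right": 8}, "right": {"left": 5, "right": 2}},
}

print(best_action(ToyState(TREE, 3), 1))
```

Here the ghosts reduce the left subtree to 3 and the right subtree to 1, so the maximizer picks "left". Because depth travels with each call, sibling branches cannot affect one another's depth check.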
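
I don't see the function getScore() defined anywhere in your code. Where is it supposed to be?

It's in a completely separate testCases class. I would add it, but out of context it might look a bit odd. The more important fact is that this happens because my self.evaluationFunction(gameState) is being called before a terminal state or the maximum depth is reached.

The if i == 0 line in miniMax looks very sketchy to me. On the first pass through the outer loop that condition is always True, so the next line will return without the other agents ever being considered. Is the indentation in the code correct?

I've made sure the code posted here matches my code exactly, but I see what you mean. The pseudocode I'm using for def miniMax(gameState) says: "1) If the state is win/lose/terminal, return the utility. 2) If the next agent is pacman, return maxValue(gameState). 3) If the next agent is a ghost, return minValue(gameState)". I don't see another way to keep track of agents other than using for i in range(0, numAgents, 1):

Well, you can only return once each time you run a function, so using an unconditional return inside a loop means the later iterations never happen. Maybe the next agent is supposed to be part of the game state?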
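
The point about returning inside a loop can be checked in isolation. The sketch below uses hypothetical function names and plain integers in place of agents: the first function returns on its first iteration, and the last one shows one way to choose between maximizing and minimizing from an agent index passed in as an argument instead of looping over all agents.

```python
def visit_with_early_return(agent_indices):
    """Every path through the loop body returns, so only the first agent is reached."""
    visited = []
    for agent in agent_indices:
        visited.append(agent)
        if agent == 0:
            return visited  # taken on the first pass when the first agent is 0
        else:
            return visited  # any other agent returns here instead
    return visited


def visit_all(agent_indices):
    """Returning after the loop lets every agent be reached."""
    visited = []
    for agent in agent_indices:
        visited.append(agent)
    return visited


def role_for(agent_index):
    """Pick the search role from an index supplied by the caller (0 is Pacman)."""
    return "max" if agent_index == 0 else "min"


print(visit_with_early_return([0, 1, 2]))
print(visit_all([0, 1, 2]))
print([role_for(i) for i in [0, 1, 2]])
```

Passing the index of the agent whose turn it is, as the accepted solution does with its index parameter, removes the need for the loop over agents altogether.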