算法学习：Pac-Man的简单对抗

Pacman项目是加州大学伯克利分校提供的一个可视化的AI学习平台。其主体利用python完成。该项目提供了丰富的说明文档，以及预先实现了一些简单的算法供参考各接口的使用。

http://ai.berkeley.edu/project_overview.html

本文利用Pac-Man平台实现简单的对抗搜索。

Part1 : Reflex Agent

提供的ReflexAgent有以下接口：

class ReflexAgent(Agent):

    def getAction(self,gameState)

    def evaluationFunction(self, currentGameState, action)

根据对getAction的分析：

def getAction(self, gameState) #根据当前的评估函数确定下一步的动作

        #获取当前可能的下一步的方向 有stop east ...

        legalMoves = gameState.getLegalActions()

        #根据评估函数获取所有的下一步的权值

        scores = [self.evaluationFunction(gameState, action) for action in legalMoves]

        #获取最好的分数

        bestScore = max(scores)

        #根据最好的分数获取最好的行动选择

        bestIndices = [index for index in range(len(scores)) if scores[index] == bestScore]

        #从最好的行动集合中随机选择一个最为下一步的动作

        chosenIndex = random.choice(bestIndices)

        return legalMoves[chosenIndex]

可知evaluationFunction是这个Agent的灵魂，评估每一个输入的分数。根据提供的下面函数的接口，设计了一个简单的Pacman的AI。

初始版本的AI如该类所述，只提供了基本的反射。该agent所处环境包括以下内容

食物到Agent的距离
怪兽到Agent的距离
超级豆子到Agent的距离
下一个状态能否吃到豆子或者被Ghost吃掉

计算出这些参数，给这些参数以固定的权值，就写出了最基础的AI

    def evaluationFunction(self, currentGameState, action):

        # 获取当前游戏状态 其中G表示为Ghost %表示为墙 角标表示pacman 角标方向代表上一次选择的方向

        successorGameState = currentGameState.generatePacmanSuccessor(action)

        # print 'successorGameState\n',successorGameState

        # 获取这样移动后新的位置

        newPos = successorGameState.getPacmanPosition()

        # print 'newPos',newPos

        # 获取食物在图中的分布（二维数组，有失误为T没食物为F)

        newFood = successorGameState.getFood()

        curFood = currentGameState.getFood()

        # print 'newFood',newFood

        # 获取Ghost的位置

        newGhostStates = successorGameState.getGhostStates()

        # print 'ghostState',newGhostStates[0].getPosition()

        # 获取吃超级豆子之后 Ghost害怕还剩余的时间

        newScaredTimes = [ghostState.scaredTimer for ghostState in newGhostStates]

        # 对这个选择评估的分数

        currscore = 0

        if action == "Stop":

            return -100

        # 如果当前状态能够使ghost害怕，将所有的时间加入进来

        for st in newScaredTimes:

            currscore += st

        # 根据Ghost所在的位置，获取与当前位置的距离

        ghost_distances = []

        for gs in newGhostStates:

            ghost_distances += [manhattanDistance(gs.getPosition(),newPos)]

        # 获取food所在的所有pos

        foodList = newFood.asList()

        curfoodList = curFood.asList()

        # 获取food所在的所有wall

        wallList = currentGameState.getWalls().asList()

        # 保存food的距离

        food_distences = []

        # 获取所有食物到达当前位置的距离

        for foodpos in foodList:

            food_distences += [manhattanDistance(newPos,foodpos)]

        # 对食物的距离取反

        inverse_food_distences=0;

        if len(food_distences)>0 and min(food_distences) > 0:

            inverse_food_distences = 1.0 / min(food_distences)

        # 考虑了ghost与当前的距离，其权值更大

        currscore += min(ghost_distances)*(inverse_food_distences**4)

        # 获取当前系统判定的分数 又可能当前吃到了豆子 分数更高些

        currscore+=successorGameState.getScore()

        if newPos in curfoodList:

            currscore = currscore * 1.1

        return currscore

运行测试

python autograder.py -q q1

10次测试如下

Pacman emerges victorious! Score: 1228

Pacman emerges victorious! Score: 1253

Pacman emerges victorious! Score: 1246

Pacman emerges victorious! Score: 1255

Pacman emerges victorious! Score: 1247

Pacman emerges victorious! Score: 1257

Pacman emerges victorious! Score: 1244

Pacman emerges victorious! Score: 1260

Pacman emerges victorious! Score: 1261

Pacman emerges victorious! Score: 1258

Average Score: 1250.9

Scores:        1228.0, 1253.0, 1246.0, 1255.0, 1247.0, 1257.0, 1244.0, 1260.0, 1261.0, 1258.0

Win Rate:      10/10 (1.00)

Record:        Win, Win, Win, Win, Win, Win, Win, Win, Win, Win

测试GUI

直接运行游戏GUI

python pacman.py -p ReflexAgent -k 2

Part 2 : MinMax

利用MinMax博弈树，这里模拟的Ghost可能不止一个，在计算Min节点的时候增加了对多Ghost的支持。

实际运行游戏时候，可以用-k参数(<3)来选择Ghost个数。

与gameState.getNumAgents()对接，可以模拟多个Ghost，从而选择威胁最大的那个作为最终的min节点。

class MinimaxAgent(MultiAgentSearchAgent):

    def getAction(self, gameState):

        def max_value(state, currentDepth):

            # 当前深度加一

            currentDepth=currentDepth+1

            # 若当前状态已经赢了或输了 或者 已经到达了规定的深度

            if state.isWin() or state.isLose() or currentDepth == self.depth:

                return self.evaluationFunction(state)

            # 初始化v

            v= float('-Inf')

            # 对每个min分支求max

            for pAction in state.getLegalActions(0):

                v=max(v, min_value(state.generateSuccessor(0, pAction), currentDepth, 1))

            return v

        def min_value(state, currentDepth, ghostNum):

            # 若当前状态已经赢了或输了

            if state.isWin() or state.isLose():

                return self.evaluationFunction(state)

            # 初始化v

            v=float('Inf')

            # 对每个max分支求min 其中有多个Ghost 所有多个Ghost分支

            for pAction in state.getLegalActions(ghostNum):

                if ghostNum == gameState.getNumAgents()-1:

                    #所有Ghost的min找完了 开始找下一个max

                    v=min(v, max_value(state.generateSuccessor(ghostNum, pAction), currentDepth))

                else:

                    #继续下一个Ghost

                    v=min(v, min_value(state.generateSuccessor(ghostNum, pAction), currentDepth, ghostNum+1))

            return v

        # pacman下一个状态可能的行动

        Pacman_Actions = gameState.getLegalActions(0)

        maximum = float('-Inf')

        result = ''

        # 针对下一个状态 寻找获胜概率最高的move

        for action in Pacman_Actions:

            if(action != "Stop"):

                currentDepth = 0

                # 而所有的Ghost希望胜利概率最低的选择

                currentMax = min_value(gameState.generateSuccessor(0, action), currentDepth , 1)

                if currentMax > maximum:

                    maximum=currentMax

                    result =action

        return result

当开始跑测试的时候，就发现在下面参数的情况下，走一步已经非常吃力了，其时间复杂度过高。

运行命令python pacman.py -p MinimaxAgent -k 2 -a depth=4

输出每步决策所消耗的时间，每步大致花费时间如下，Time的单位为ms：

Go  East Value is  43.0  Time:  1994.49194336

Go  East Value is  50.0  Time:  6805.0690918

Go  East Value is  58.0  Time:  4821.87817383

Go  East Value is  67.0  Time:  3456.60791016

Go  North Value is  76.0  Time:  1419.2019043

Go  North Value is  63.0  Time:  3504.87597656

Go  East Value is  72.0  Time:  2485.83081055

Go  East Value is  103.0  Time:  1185.18701172

每步所花费时间大概在3s左右，当三个Agent的选择均非常多的情况下，每步所消耗的时间有可能会到达6s

急需AlphaBate剪枝，下面加入AlphaBate剪枝。

Part 3 : Alpha-Bate 剪枝

class AlphaBetaAgent(MultiAgentSearchAgent):

    def getAction(self, gameState):

        def max_value(state, alpha, beta, currentDepth):

            # 当前深度加一

            currentDepth=currentDepth+1

            # 若当前状态已经赢了或输了 或者 已经到达了规定的深度

            if state.isWin() or state.isLose() or currentDepth == self.depth:

                return self.evaluationFunction(state)

            v=float('-Inf')

            # 对每个min分支求max

            for pAction in state.getLegalActions(0):

                if pAction!="Stop":

                    v=max(v, min_value(state.generateSuccessor(0, pAction), alpha, beta, currentDepth, 1))

                    # 若已经比beta要大了 就没有搜索下去的必要了

                    if v >= beta:

                        return v

                    # 更新alpha的值

                    alpha=max(alpha, v)

            return v

        def min_value(state, alpha, beta, currentDepth, ghostNum):

            # 若当前状态已经赢了或输了

            if state.isWin() or state.isLose():

                return self.evaluationFunction(state)

            # 初始化v

            v=float('Inf')

            # 对每个max分支求min 其中有多个Ghost 所有多个Ghost分支

            for pAction in state.getLegalActions(ghostNum):

                if ghostNum == gameState.getNumAgents()-1:

                    # 所有Ghost的min找完了 开始找下一个max

                    v=min(v, max_value(state.generateSuccessor(ghostNum, pAction), alpha, beta, currentDepth))

                else:

                    # 继续下一个Ghost

                    v=min(v,

                          min_value(state.generateSuccessor(ghostNum, pAction), alpha, beta, currentDepth, ghostNum+1))

                # 若比alpha还要小了 就没搜索的必要了

                if v <= alpha:

                    return v

                # 更新beta的值

                beta=min(beta, v)

            return v

        # pacman下一个状态可能的行动

        pacmanActions=gameState.getLegalActions(0)

        maximum=float('-Inf')

        # 初始化alpha bate

        alpha=float('-Inf')

        beta=float('Inf')

        maxAction=''

        # 针对下一个状态 寻找获胜概率最高的move

        for action in pacmanActions:

            if action!="Stop":

                currentDepth=0

                # 而所有的Ghost希望胜利概率最低的选择

                currentMax=min_value(gameState.generateSuccessor(0, action), alpha, beta, currentDepth, 1)

                if currentMax > maximum:

                    maximum=currentMax

                    maxAction=action

        print maximum

        return maxAction

利用了AlphaBate之后在同样的参数下运行速度明显增加。

但是由于效果还是不好，感觉是系统提供的启发函数不太完美。系统提供的接口如下：

class MultiAgentSearchAgent(Agent):

    def __init__(self, evalFn = 'scoreEvaluationFunction', depth = '2'):

        self.index = 0 # Pacman is always agent index 0

        self.evaluationFunction = util.lookup(evalFn, globals())

        self.depth = int(depth)

在第一部分的基础反射启发函数基础上，修改默认的启发函数如下：

def scoreEvaluationFunction(currentGameState,cur_score):

    # 获取food所在的所有wall

    wallList=currentGameState.getWalls().asList()

    Pos = currentGameState.getPacmanPosition()

    # 获取食物在图中的分布（二维数组，有失误为T没食物为F)

    curFood=currentGameState.getFood()

    # 获取Ghost的位置

    GhostStates=currentGameState.getGhostStates()

    # 获取吃超级豆子之后 Ghost害怕还剩余的时间

    scaredTimes=[ghostState.scaredTimer for ghostState in GhostStates]

    # 对这个选择评估的分数

    currscore=0

    # 根据Ghost所在的位置，获取与当前位置的距离

    ghost_distances=[]

    for gs in GhostStates:

        ghost_distances+=[manhattanDistance(gs.getPosition(), Pos)]

    ghost_index = 0;

    min_ghost_distances =  min(ghost_distances);

    is_scared = False

    for time in scaredTimes:

        if time != 0:

            is_scared = True

        else:

            is_scared = False

            break

    # 获取food所在的所有pos

    curfoodList=curFood.asList()

    # 保存food的距离

    food_distences=[]

    # 获取所有食物到达当前位置的距离

    for foodpos in curfoodList:

        food_distences+=[manhattanDistance(Pos, foodpos)]

    # 对食物的距离取反

    inverse_food_distences=0;

    if len(food_distences) > 0 and min(food_distences) > 0:

        inverse_food_distences=1.0 / min(food_distences)

    if is_scared and min_ghost_distances!=0:

        # if min_ghost_distances < 10:

            # min_ghost_distances = 800 min_ghost_distances

        # else:

            # min_ghost_distances = 600 min_ghost_distances

        print "Ghost Scared!"

        min_ghost_distances = min_ghost_distances * 0.8

    # 考虑了ghost与当前的距离，其权值更大

    if min(ghost_distances) == 0:

        currscore+=inverse_food_distences

    else:

        currscore+= min_ghost_distances * (float(inverse_food_distences))

    # 获取当前系统判定的分数 又可能当前吃到了豆子 分数更高些

    currscore+=currentGameState.getScore()

    return currscore

测试命令 python pacman.py -p AlphaBetaAgent -k 2 -a depth=4

输出每步决策所消耗的时间，每步大致花费时间如下，Time的单位为ms：

Go  East Value is  43.0  Time:  215.517822266

Go  East Value is  50.0  Time:  820.08203125

Go  East Value is  58.0  Time:  712.662841797

Go  East Value is  69.0  Time:  320.084960938

Go  North Value is  78.0  Time:  1038.60400391

Go  North Value is  66.0  Time:  862.83203125

Go  East Value is  76.0  Time:  774.194091797

Go  East Value is  108.0  Time:  357.637207031

Go  East Value is  113.5  Time:  812.993164062

Go  North Value is  126.0  Time:  402.672851562

每步用时比没有剪枝的情况减少了一半以上

正常测试平均在1500 point左右，但仍然存在以下问题

当两个Agent分别在Pacman的两边或两边存在同样数量的豆子时，左右徘徊
因为使用MinMax博弈树而Ghost在离Pacman足够远时威胁很小
导致Agent Pacman在豆子吃了大部分的时候非常谨慎
参数调配问题

由于这些问题，导致了Pacman在某些情况下依旧显得不够智能。在该问题的实际情况中：

Ghost其实并没有博弈的概念，所以大部分猜想是浪费时间的
Pacman游戏其实只需要局部考虑，无需过多全局考虑，也就是说：

当Ghost离Agent足够远的时候，其实Ghost的行动对于Pacman影响不大，没必要过多考虑。

但是比如围棋博弈，这种全局观念就很重要了。
启发函数还是人为为Ghost制定反射