文章/答案/技术大牛

发布

社区首页 >问答首页 >Python Negamax算法

问Python Negamax算法
EN

Stack Overflow用户

提问于 2017-07-03 01:52:50

回答 1查看 812关注 0票数 2

我有一个尽可能简单的负极大算法，用于评估Tic Tac Toe中的位置。游戏的状态以一个数组的形式存储在numpy中，X的棋子由1表示，O的棋子由4表示。

我刚刚测试了一下，发现：

a = np.zeros(9).reshape(3,3)
negaMax(a, 6, 1) # Returned zero as it should
negaMax(a, 7, 1) # Returns 100

这意味着我的算法认为它已经找到了一种方法，让X在Tic Tac Toe的游戏中七次获胜，这显然是不可能对抗正派游戏的。我不知道如何让它打印出它找到的最好的动作，所以我在调试时遇到了真正的问题。我做错了什么？

def winCheck(state):
        """Takes a position, and returns the outcome of that game"""
        # Sums which correspond to a line across a column
        winNums = list(state.sum(axis=0))
        # Sums which correspond to a line across a row
        winNums.extend(list(state.sum(axis=1)))
        # Sums which correspond to a line across the main diagonal
        winNums.append(state.trace())
        # Sums which correspond to a line across the off diagonal
        winNums.append(np.flipud(state).trace())

        if Square.m in winNums:
                return 'X'
        elif (Square.m**2 + Square.m) in winNums:
                return 'O'
        elif np.count_nonzero(state) == Square.m**2:
                return 'D'
        else:
                return None

def moveFind(state):
        """Takes a position as an nparray and determines the legal moves"""
        moveChoices = []

        # Iterate over state, to determine which squares are empty
        it = np.nditer(state, flags=['multi_index'])
        while not it.finished:
            if it[0] == 0:
                    moveChoices.append(it.multi_index)
            it.iternext()
        return moveChoices

def moveSim(state, move, player):
        """Create the state of the player having moved without interfering with the board"""
        simState = state.copy()
        if player == 1:
                simState[move] = 1
        else:
                simState[move] = gamecfg.n + 1
        return simState

def positionScore(state):
        """The game is either won or lost"""
        if winCheck(state) == 'X':
                return 100
        elif winCheck(state) == 'O':
                return -100
        else:
                return 0

def negaMax(state, depth, colour):
        """Recursively find the best move via a negamax search"""
        if depth == 0:
                return positionScore(state) * colour

        highScore = -100

        moveList = moveFind(state)
        for move in moveList:
                score = -negaMax(moveSim(state, move, colour), depth -1, colour * -1)
                highScore = max(score, highScore)

        return highScore

python

algorithm

numpy

negamax

回答 1

Stack Overflow用户

回答已采纳

发布于 2017-07-03 04:24:12

您的代码不会考虑在生成一行3个符号时停止游戏。

这意味着它是在玩tic-tac-toe的一个变体，其中即使在O做了一行3之后，如果X做了一行3，他就赢了。

对于这个变体，程序已经正确地发现X有可能总是获胜！

(我在我制作的一个国际象棋程序中遇到了同样的情况，在这个程序中，计算机很乐意牺牲它的国王，如果它稍后到达将军的话……)

票数 2

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/44873642

复制

相似问题

问Python Negamax算法
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Python Negamax算法EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Python Negamax算法
EN