
Loop values changing randomly

Asked by a Stack Overflow user on 2022-02-07 18:21:59
Answers: 1, Views: 60, Followers: 0, Votes: 0

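In my spare time I've been working on a deep Q-learning Snake game, and I plan to add a genetic-algorithm component. To that end, I set up a loop that creates a population of snakes of a given size, where each snake plays for a number of episodes, over a number of generations.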

It should be simple: just a few nested loops. Except that I'm getting some really strange results from my for loop.

Here is the relevant code:

```python
import numpy as np

# Agent and SnakeGameAI are defined elsewhere in the asker's project (not shown).


def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0

    agents = [Agent() for _ in range(population_size)]
    game = SnakeGameAI()

    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2

            # Set agent number
            game.agent_num = agent_num

            for cur_episode in range(1, max_episodes+1):
                # Get old state
                state_old = agent.get_state(game)

                # Get move
                final_move = agent.get_action(state_old)

                # Perform move and get new state
                reward, done, score = game.play_step(final_move)
                state_new = agent.get_state(game)

                # Train short memory
                agent.train_short_memory(state_old, final_move, reward, state_new, done)

                # Remember
                agent.remember(state_old, final_move, reward, state_new, done)

                # Snake died
                if done:
                    # Reset the game and train long memory
                    game.reset()
                    agent.episode = cur_episode
                    game.agent_episode = cur_episode
                    agent.train_long_memory()

                    if score > game.top_score:
                        game.top_score = score
                        agent.model.save()

                    total_score += score
                    game.mean_score = np.round((total_score / cur_episode), 3)

                    print(f"Agent{game.agent_num}")
                    print(f"Episode: {cur_episode}")
                    print(f"Generation: {cur_gen}")
                    print(f"Score: {score}")
                    print(f"Top Score: {game.top_score}")
                    print(f"Mean: {game.mean_score}\n")
```

This is the output:

```
Agent0
Episode: 3
Generation: 7
Score: 0
Top Score: 0
Mean: 0.0

Agent0
Episode: 3
Generation: 14
Score: 0
Top Score: 0
Mean: 0.0

Agent0
Episode: 7
Generation: 20
Score: 1
Top Score: 1
Mean: 0.143

Agent0
Episode: 10
Generation: 26
Score: 0
Top Score: 1
Mean: 0.1

Agent0
Episode: 6
Generation: 28
Score: 1
Top Score: 1
Mean: 0.333

Agent0
Episode: 5
Generation: 37
Score: 0
Top Score: 1
Mean: 0.4

Agent0
Episode: 3
Generation: 43
Score: 0
Top Score: 1
Mean: 0.667

Agent0
Episode: 1
Generation: 45
Score: 1
Top Score: 1
Mean: 3.0

Agent0
Episode: 2
Generation: 49
Score: 0
Top Score: 1
Mean: 1.5
```

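The generation number climbs steadily, roughly once per second, until it reaches 49 and the loop ends, while the episode number changes seemingly at random every time the snake dies. It's bizarre. I've never seen anything like this and have no idea what in my code could be causing it.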


1 Answer

Stack Overflow user

Accepted answer

Posted on 2022-02-09 20:31:18

Answer:

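For anyone who doesn't want to read through the comments where Eli Harold helped me solve this: the problem was that my code treated every episode as a single frame of the game. So instead of one episode being a snake's whole life (an entire game), every move the snake made counted as an episode.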
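To make this diagnosis concrete, here is a minimal sketch that replaces the real game with a hypothetical `ToyGame` class (an assumption for illustration, not part of the original project): the snake simply dies after a random number of moves. `buggy_episodes` mirrors the original loop, where each iteration of `for cur_episode in ...` performs a single move, and `fixed_episodes` adds an inner loop that keeps stepping until the snake dies.

```python
import random


class ToyGame:
    """Hypothetical stand-in for SnakeGameAI: the snake simply dies after a
    random number of moves drawn from [min_moves, max_moves]."""

    def __init__(self, min_moves=1, max_moves=15, seed=0):
        self.min_moves = min_moves
        self.max_moves = max_moves
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Start a new game with a fresh lifetime.
        self.moves_left = self.rng.randint(self.min_moves, self.max_moves)

    def play_step(self):
        # Advance one frame; return True when the snake has died.
        self.moves_left -= 1
        return self.moves_left <= 0


def buggy_episodes(game, max_episodes=10):
    """Original structure: one loop iteration per *move*.
    Returns the episode labels that would be printed on each death."""
    printed = []
    for cur_episode in range(1, max_episodes + 1):
        if game.play_step():
            game.reset()
            printed.append(cur_episode)
    return printed


def fixed_episodes(game, max_episodes=10):
    """Fixed structure: an inner loop plays one *whole game* per episode."""
    printed = []
    for cur_episode in range(1, max_episodes + 1):
        while True:
            if game.play_step():
                game.reset()
                printed.append(cur_episode)
                break
    return printed


print(buggy_episodes(ToyGame(3, 3)))  # [3, 6, 9]
print(fixed_episodes(ToyGame(3, 3)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# With random lifetimes the buggy labels look "random", as in the question.
for gen in range(3):
    print(f"Generation {gen}:", buggy_episodes(ToyGame(seed=gen)))
```

With a fixed lifetime of 3 moves, the buggy version reports deaths at "episodes" 3, 6 and 9, i.e. at whatever loop index the snake happened to die on. Each generation also ends after only `max_episodes` moves, which explains why the generation counter climbed so quickly and steadily.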

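Here is what my code looks like now. I added a `run` loop (an inner `while` loop that keeps stepping the game until the snake dies), which fixed the problem.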

```python
import numpy as np

# Agent and SnakeGameAI are defined elsewhere in the asker's project (not shown).


def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0

    agents = [Agent() for _ in range(population_size)]
    game = SnakeGameAI()

    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2

            # Set agent number
            game.agent_num = agent_num

            for cur_episode in range(1, max_episodes+1):
                # Keep stepping until the snake dies: one episode is now one full game
                run = True
                while run:
                    # Get old state
                    state_old = agent.get_state(game)

                    # Get move
                    final_move = agent.get_action(state_old)

                    # Perform move and get new state
                    reward, done, score = game.play_step(final_move)
                    state_new = agent.get_state(game)

                    # Train short memory
                    agent.train_short_memory(state_old, final_move, reward, state_new, done)

                    # Remember
                    agent.remember(state_old, final_move, reward, state_new, done)

                    # Snake died
                    if done:
                        run = False
                        # Reset the game and train long memory
                        game.reset()
                        agent.episode = cur_episode
                        game.agent_episode = cur_episode
                        agent.train_long_memory()

                        if score > game.top_score:
                            game.top_score = score
                            agent.model.save()

                        total_score += score
                        game.mean_score = np.round((total_score / cur_episode), 3)

                        print(f"Agent{game.agent_num}")
                        print(f"Episode: {cur_episode}")
                        print(f"Generation: {cur_gen}")
                        print(f"Score: {score}")
                        print(f"Top Score: {game.top_score}")
                        print(f"Mean: {game.mean_score}\n")
```
Votes: 2
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/71023519
