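I've been working on a deep Q-learning Snake game in my spare time, and I plan to add a genetic algorithm component. To that end, I set up a loop that creates a population of snakes of a given size, with each snake running for a number of episodes, across a number of generations in total.
It should be simple. Just some nested loops. Except I'm getting some really wild results from my for loops.
Here is the relevant code: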
import numpy as np

def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0
    agents = [Agent() for i in range(population_size)]
    game = SnakeGameAI()
    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2
            # Set agent number
            game.agent_num = agent_num
            for cur_episode in range(1, max_episodes+1):
                # Get old state
                state_old = agent.get_state(game)
                # Get move
                final_move = agent.get_action(state_old)
                # Perform move and get new state
                reward, done, score = game.play_step(final_move)
                state_new = agent.get_state(game)
                # Train short memory
                agent.train_short_memory(state_old, final_move, reward, state_new, done)
                # Remember
                agent.remember(state_old, final_move, reward, state_new, done)
                # Snake died
                if done:
                    # Train long memory, plot result
                    game.reset()
                    agent.episode = cur_episode
                    game.agent_episode = cur_episode
                    agent.train_long_memory()
                    if score > game.top_score:
                        game.top_score = score
                        agent.model.save()
                    total_score += score
                    game.mean_score = np.round((total_score / cur_episode), 3)
                    print(f"Agent{game.agent_num}")
                    print(f"Episode: {cur_episode}")
                    print(f"Generation: {cur_gen}")
                    print(f"Score: {score}")
                    print(f"Top Score: {game.top_score}")
                    print(f"Mean: {game.mean_score}\n")

This is the output:
Agent0
Episode: 3
Generation: 7
Score: 0
Top Score: 0
Mean: 0.0
Agent0
Episode: 3
Generation: 14
Score: 0
Top Score: 0
Mean: 0.0
Agent0
Episode: 7
Generation: 20
Score: 1
Top Score: 1
Mean: 0.143
Agent0
Episode: 10
Generation: 26
Score: 0
Top Score: 1
Mean: 0.1
Agent0
Episode: 6
Generation: 28
Score: 1
Top Score: 1
Mean: 0.333
Agent0
Episode: 5
Generation: 37
Score: 0
Top Score: 1
Mean: 0.4
Agent0
Episode: 3
Generation: 43
Score: 0
Top Score: 1
Mean: 0.667
Agent0
Episode: 1
Generation: 45
Score: 1
Top Score: 1
Mean: 3.0
Agent0
Episode: 2
Generation: 49
Score: 0
Top Score: 1
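Mean: 1.5

The generation number climbs steadily every second until it reaches 49 and the loop ends, while the episode number changes seemingly at random each time the snake dies. It's bizarre. I've never seen anything like this, and I have no idea what in my code could be causing it.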
Posted on 2022-02-09 20:31:18
Answer:
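For anyone who doesn't want to read through the comments where Eli Harold helped me solve this: the problem was that my code treated every frame of the game as an episode. So instead of an episode being one snake's entire lifetime (a whole game), every single move the snake made counted as an episode.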
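To make the difference concrete, here is a minimal sketch that reproduces the symptom without the real game. ToyGame and both function names are hypothetical stand-ins (they are not part of the original project), and the 5 to 15 step game length is an arbitrary assumption for the demo.

```python
import random


class ToyGame:
    """Hypothetical stand-in for SnakeGameAI: a game ends after a random number of steps."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Arbitrary assumption for the demo: each game lasts 5 to 15 steps.
        self.steps_left = self.rng.randint(5, 15)

    def play_step(self):
        # Advance one frame and report whether the snake died on it.
        self.steps_left -= 1
        return self.steps_left == 0


def deaths_frame_per_episode(game, max_generations, max_episodes):
    """Buggy shape: each pass of the episode loop advances the game by one frame."""
    deaths = []
    for cur_gen in range(max_generations):
        for cur_episode in range(1, max_episodes + 1):
            if game.play_step():
                game.reset()
                deaths.append((cur_gen, cur_episode))
    return deaths


def deaths_game_per_episode(game, max_generations, max_episodes):
    """Fixed shape: keep stepping inside one episode until the snake dies."""
    deaths = []
    for cur_gen in range(max_generations):
        for cur_episode in range(1, max_episodes + 1):
            done = False
            while not done:
                done = game.play_step()
            game.reset()
            deaths.append((cur_gen, cur_episode))
    return deaths


if __name__ == "__main__":
    print(deaths_frame_per_episode(ToyGame(), 3, 4))
    print(deaths_game_per_episode(ToyGame(), 3, 4))
```

In the buggy shape a single game spills over episode and generation boundaries, so a death is reported at whatever (generation, episode) pair the counters happen to show on that frame. In the fixed shape every (generation, episode) pair corresponds to exactly one finished game.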
Here is what my code looks like now. I added a run loop, which fixed the problem.
import numpy as np

def run(population_size=1, max_episodes=10, max_generations=50):
    total_score = 0
    agents = [Agent() for i in range(population_size)]
    game = SnakeGameAI()
    for cur_gen in range(max_generations):
        game.generation = cur_gen
        for agent_num, agent in enumerate(agents):
            # Set colors
            game.color1 = agent.color1
            game.color2 = agent.color2
            # Set agent number
            game.agent_num = agent_num
            for cur_episode in range(1, max_episodes+1):
                # One episode is one full game: keep stepping until the snake dies
                run = True
                while run:
                    # Get old state
                    state_old = agent.get_state(game)
                    # Get move
                    final_move = agent.get_action(state_old)
                    # Perform move and get new state
                    reward, done, score = game.play_step(final_move)
                    state_new = agent.get_state(game)
                    # Train short memory
                    agent.train_short_memory(state_old, final_move, reward, state_new, done)
                    # Remember
                    agent.remember(state_old, final_move, reward, state_new, done)
                    # Snake died
                    if done:
                        run = False
                        # Train long memory, plot result
                        game.reset()
                        agent.episode = cur_episode
                        game.agent_episode = cur_episode
                        agent.train_long_memory()
                        if score > game.top_score:
                            game.top_score = score
                            agent.model.save()
                        total_score += score
                        game.mean_score = np.round((total_score / cur_episode), 3)
                        print(f"Agent{game.agent_num}")
                        print(f"Episode: {cur_episode}")
                        print(f"Generation: {cur_gen}")
                        print(f"Score: {score}")
                        print(f"Top Score: {game.top_score}")
                        print(f"Mean: {game.mean_score}\n")

https://stackoverflow.com/questions/71023519
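A side observation on the printed Mean value: in the run function, total_score keeps accumulating across all agents and generations, but it is divided by cur_episode, which restarts at 1 for every agent and generation. That matches the Mean: 3.0 line in the question's output, which appears even though Top Score is 1. Below is a minimal sketch of one possible correction, assuming the intent is a mean over all finished games. The running_means helper and the games_played counter are hypothetical names, not part of the original code.

```python
def running_means(scores):
    """Return the mean after each finished game, dividing by games played so far."""
    total_score = 0
    means = []
    # games_played counts finished games across the whole run and is never reset,
    # unlike cur_episode, which restarts for each agent and generation.
    for games_played, score in enumerate(scores, start=1):
        total_score += score
        means.append(round(total_score / games_played, 3))
    return means


if __name__ == "__main__":
    print(running_means([0, 0, 1, 0]))
```

If a mean per agent or per generation is wanted instead, the same idea applies with total_score and the counter both reset at the start of that agent or generation.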