Q-Learning Code Example for the FrozenLake Environment in Gymnasium 0.28.1
Here is a detailed example that trains a Q-learning agent on the FrozenLake environment using the gymnasium library:
import random

import gymnasium as gym
import numpy as np


def train():
    # Create the FrozenLake environment
    # (gymnasium registers it as FrozenLake-v1; the old gym id FrozenLake-v0 no longer exists)
    env = gym.make("FrozenLake-v1")

    # Seed the random number generators for reproducibility
    # (gymnasium has no env.seed(); the environment is seeded via reset() below)
    seed = 42
    random.seed(seed)
    np.random.seed(seed)

    # Initialize the Q-table with zeros
    num_states = env.observation_space.n
    num_actions = env.action_space.n
    Q = np.zeros((num_states, num_actions))

    # Training hyperparameters
    num_episodes = 10000
    max_steps_per_episode = 100
    learning_rate = 0.1
    discount_factor = 0.99
    epsilon = 1.0
    min_epsilon = 0.01
    epsilon_decay = 0.001

    # Train the agent
    for episode in range(num_episodes):
        # In gymnasium, reset() returns (observation, info);
        # pass the seed only on the first episode
        if episode == 0:
            state, _ = env.reset(seed=seed)
        else:
            state, _ = env.reset()
        total_reward = 0
        for step in range(max_steps_per_episode):
            # Epsilon-greedy action selection
            if random.uniform(0, 1) < epsilon:
                action = env.action_space.sample()  # explore: random action
            else:
                action = int(np.argmax(Q[state]))  # exploit: best known action
            # In gymnasium, step() returns a 5-tuple
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Q-learning update
            Q[state][action] = (1 - learning_rate) * Q[state][action] + \
                learning_rate * (reward + discount_factor * np.max(Q[next_state]))
            total_reward += reward
            state = next_state
            if done:
                break
        # Linearly decay epsilon down to its floor
        epsilon = max(min_epsilon, epsilon - epsilon_decay)
        # Print training progress
        if (episode + 1) % 1000 == 0:
            print(f"Episode {episode+1}/{num_episodes} - Reward: {total_reward}")

    # Show the learned Q-table
    print("Q-table:")
    for i, q_values in enumerate(Q):
        print(f"State {i}: {q_values}")
    env.close()


if __name__ == "__main__":
    train()
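One detail worth noting about the schedule above: with epsilon_decay = 0.001, epsilon falls linearly from 1.0 and reaches min_epsilon after roughly (1.0 - 0.01) / 0.001 = 990 episodes, so the vast majority of the 10000 training episodes run nearly greedily. A quick standalone check (pure Python, no environment needed):

```python
# Reproduce the linear epsilon decay from the training loop above
epsilon, min_epsilon, epsilon_decay = 1.0, 0.01, 0.001

episodes_to_min = 0
while epsilon > min_epsilon:
    epsilon = max(min_epsilon, epsilon - epsilon_decay)
    episodes_to_min += 1

# About 990 episodes to reach the floor (give or take floating-point rounding)
print(episodes_to_min)
```

If you want exploration spread over more of training, either lower epsilon_decay or switch to a multiplicative schedule such as epsilon *= 0.999.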
This example trains a Q-learning agent on the FrozenLake environment. The agent learns by interacting with the environment: each step it selects an action (epsilon-greedy), executes it, and updates the corresponding entry in the Q-table from the observed reward and next state. When training finishes, the learned Q-table is printed.
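The heart of that loop is the Q-learning update. As an isolated sketch with hypothetical values (a tiny 2-state, 2-action table, pure NumPy), the same rule looks like this:

```python
import numpy as np

learning_rate = 0.1
discount_factor = 0.99

# Hypothetical tiny Q-table for illustration (2 states, 2 actions)
Q = np.zeros((2, 2))
state, action, reward, next_state = 0, 1, 1.0, 1

# Bootstrapped target: r + gamma * max_a' Q(s', a')
target = reward + discount_factor * np.max(Q[next_state])
# Move Q[s, a] a fraction (learning_rate) of the way toward the target;
# algebraically identical to the (1 - lr) * old + lr * target form used above
Q[state, action] += learning_rate * (target - Q[state, action])

print(Q[state, action])  # 0.1 after a single update from zero
```

Writing the update as old + lr * (target - old) makes it easy to see that the Q-value moves a fixed fraction of the remaining error each step.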
Make sure the gymnasium library is installed and upgraded to version 0.28.1 (e.g. pip install gymnasium==0.28.1). If you run into any problems, feel free to ask.