ECE 277, WINTER 2024
GPU Programming
LAB 3: Reinforcement learning: Q-learning (Multi-Agent, CUDA multithreads)
This lab requires you to design multiple agents that interact with the environment using the reinforcement learning algorithm in Figure 1. Specifically, you need to design multiple agents that maximize rewards from the mine game environment using Q-learning. The agents should interact with the given mine game environment in Figure 2.
Figure 1: Parallel reinforcement learning.
Figure 2: 32x32 mine game environment.
• The number of agents: 128
• Action: right:0, down:1, left:2, up:3
• Reward: flag: +1, mine: -1, otherwise: 0
• State: (x, y), the current position of an agent in a coordinate system with (0,0) at the top-left corner
• Every episode restarts once the fraction of active agents drops below 20%.
• You need to track which agents are active in each episode. You should prevent inactive agents from taking an action and updating the Q table, since the environment returns wrong rewards to inactive agents.
• The initial states of the environment are randomized every episode.
• Environment elements such as the mine distribution and the flag position are randomized every game.
• Agent_i should return action[Agent_i] for its corresponding current state, cstate[Agent_i].
• Agent_i should update a centralized Q table using its current state, cstate[Agent_i], its next state, nstate[Agent_i], and its reward, rewards[Agent_i].
• Share a single centralized Q table for all agents (a centralized learning and decentralized execution approach).
• You should initialize the Q table using a multithreaded kernel instead of host-side functions such as "cudaMemset".
• In the learning environment, TA is the total number of agents, FA is the percentage of agents that caught the flag, and AA is the percentage of active agents in the current episode.
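The kernel-based Q table initialization required in the list above could be sketched as follows. This is a minimal sketch, not the reference solution: the flat (y, x, action) layout, the names q_index, init_qtable_kernel, and alloc_and_init_qtable, and the initial value of 0 are all assumptions made for illustration.

```cuda
#include <cuda_runtime.h>

// Board size and action count come from the handout; the flat layout below
// is an assumption (the handout does not prescribe a Q-table layout).
constexpr int BOARD_W = 32;
constexpr int BOARD_H = 32;
constexpr int NUM_ACTIONS = 4;   // right:0, down:1, left:2, up:3
constexpr int QTABLE_SIZE = BOARD_W * BOARD_H * NUM_ACTIONS;

// Flat index of Q(x, y, action); callable from host and device code.
__host__ __device__ inline int q_index(int x, int y, int action)
{
    return (y * BOARD_W + x) * NUM_ACTIONS + action;
}

// Each thread writes one Q-table entry, replacing a cudaMemset call.
__global__ void init_qtable_kernel(float* d_qtable, int n, float value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        d_qtable[i] = value;
    }
}

// Hypothetical helper: allocate the shared table and clear it on the GPU.
// Returns nullptr if the allocation fails.
float* alloc_and_init_qtable()
{
    float* d_qtable = nullptr;
    if (cudaMalloc(&d_qtable, QTABLE_SIZE * sizeof(float)) != cudaSuccess) {
        return nullptr;
    }
    const int threads = 256;
    const int blocks = (QTABLE_SIZE + threads - 1) / threads;  // round up
    init_qtable_kernel<<<blocks, threads>>>(d_qtable, QTABLE_SIZE, 0.0f);
    cudaDeviceSynchronize();
    return d_qtable;
}
```

With this layout the four action values of one state sit next to each other in memory, so a thread that scans the actions of its agent's state reads four consecutive floats.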
You should not modify any of the given code except CMakeLists, which you edit to add your code. You only need to add your agent code to the lab project.
You have to use CUDA to program a multi-agent reinforcement learning algorithm.
All pointers passed through the extern function interface point to device (GPU) memory, not host (CPU) memory. The function below is an informational excerpt of the RL environment routine that shows when and how the agent functions are called.

    extern void agent_init();
    extern void agent_init_episode();
    extern float agent_adjustepsilon();
    extern short* agent_action(int2* cstate);
    extern void agent_update(int2* cstate, int2* nstate, float* rewards);

    int qlearningCls::learning(int* board, unsigned int& episode, unsigned int& steps)
    {
        if (m_episode == 0 && m_steps == 0) { // only for first episode
            env.reset(m_sid);
            agent_init(); // clear action + init Q table + self initialization
        } else {
            active_agent = checkstatus(board, env.m_state, flag_agent);
            if (m_newepisode) {
                env.reset(m_sid);
                agent_init_episode(); // set all agents in active status
                float epsilon = agent_adjustepsilon(); // adjust epsilon
                m_steps = 0;
                printf("EP=%4d, eps=%4.3f\n", m_episode, epsilon);
                m_episode++;
            } else {
                short* action = agent_action(env.d_state[m_sid]);
                env.step(m_sid, action);
                agent_update(env.d_state[m_sid], env.d_state[m_sid ^ 1], env.d_reward);
                m_sid ^= 1;
                episode = m_episode;
                steps = m_steps;
            }
        }
        m_steps++;
        env.render(board, m_sid);
        return m_newepisode;
    }
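
The routine calls the update with only the current states, next states, and rewards, so the agent code has to keep the chosen actions and the active flags in its own device buffers. A kernel launched from the update function might look like the sketch below, with one thread per agent. The names update_kernel, td_update, and q_index, the flat Q-table layout, and the rule that a nonzero reward (flag or mine) ends an agent's episode are assumptions, not part of the given code.

```cuda
#include <cuda_runtime.h>

constexpr int BOARD_W = 32;      // board width from the handout
constexpr int NUM_ACTIONS = 4;   // right:0, down:1, left:2, up:3

// Assumed flat layout: four action values per (x, y) cell.
__host__ __device__ inline int q_index(int x, int y, int a)
{
    return (y * BOARD_W + x) * NUM_ACTIONS + a;
}

// Standard Q-learning step: Q += alpha * (target - Q).
// Terminal transitions use the reward alone as the target (no bootstrapping).
__host__ __device__ inline float td_update(float q_old, float reward, float max_next,
                                           bool terminal, float alpha, float gamma)
{
    float target = terminal ? reward : reward + gamma * max_next;
    return q_old + alpha * (target - q_old);
}

// One thread per agent. action and active are agent-owned device buffers
// (hypothetical names); cstate, nstate, rewards come from the environment.
__global__ void update_kernel(const int2* cstate, const int2* nstate,
                              const float* rewards, const short* action,
                              bool* active, float* qtable,
                              int num_agents, float alpha, float gamma)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_agents || !active[i]) return;   // inactive agents never update

    float r = rewards[i];
    bool terminal = (r != 0.0f);   // assumption: flag (+1) or mine (-1) ends the agent

    float max_next = 0.0f;
    if (!terminal) {
        max_next = qtable[q_index(nstate[i].x, nstate[i].y, 0)];
        for (int a = 1; a < NUM_ACTIONS; ++a) {
            max_next = fmaxf(max_next, qtable[q_index(nstate[i].x, nstate[i].y, a)]);
        }
    }

    int idx = q_index(cstate[i].x, cstate[i].y, action[i]);
    qtable[idx] = td_update(qtable[idx], r, max_next, terminal, alpha, gamma);

    if (terminal) active[i] = false;   // stop acting and learning until the next episode
}
```

Two agents in the same state that picked the same action write the same entry without synchronization in this sketch, so one of the two updates may be lost. Whether that is acceptable for a shared Q table is a design decision left to the implementer.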
The provided parameters are just for reference.
γ = 0.9; α = 0.1; 0.1 ≤ ε − δε ≤ 1.0; δε = 0.001
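
One way to read the epsilon parameters is as a schedule that subtracts 0.001 from epsilon each time the environment asks for an adjustment (once per new episode in the routine above) and never lets it fall below 0.1. The sketch below pairs that schedule with epsilon-greedy action selection using the cuRAND device API. The names decay_epsilon, greedy_action, init_rng_kernel, and action_kernel, the per-agent curandState buffer, and the flat Q-table layout are assumptions for illustration.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

constexpr int BOARD_W = 32;      // board width from the handout
constexpr int NUM_ACTIONS = 4;   // right:0, down:1, left:2, up:3

// One decay step, clamped at the lower bound eps_min.
__host__ __device__ inline float decay_epsilon(float eps, float delta, float eps_min)
{
    float next = eps - delta;
    return next < eps_min ? eps_min : next;
}

// Index of the largest of the four Q values of one state.
// Ties resolve to the lowest action index.
__host__ __device__ inline short greedy_action(const float* q_state)
{
    short best = 0;
    for (short a = 1; a < NUM_ACTIONS; ++a) {
        if (q_state[a] > q_state[best]) best = a;
    }
    return best;
}

// Give every agent its own random sequence (same seed, different subsequence).
__global__ void init_rng_kernel(curandState* rng, int num_agents, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_agents) curand_init(seed, i, 0, &rng[i]);
}

// One thread per agent: explore with probability epsilon, otherwise exploit.
__global__ void action_kernel(const int2* cstate, const bool* active,
                              const float* qtable, curandState* rng,
                              short* action, int num_agents, float epsilon)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_agents || !active[i]) return;   // inactive agents take no action

    curandState local = rng[i];                  // work on a register copy
    if (curand_uniform(&local) < epsilon) {
        action[i] = (short)(curand(&local) % NUM_ACTIONS);   // random action
    } else {
        // Assumed layout: four consecutive values per (x, y) cell.
        const float* q_state =
            &qtable[(cstate[i].y * BOARD_W + cstate[i].x) * NUM_ACTIONS];
        action[i] = greedy_action(q_state);                  // best known action
    }
    rng[i] = local;                              // save the advanced generator state
}
```

Starting from ε = 1.0, this schedule reaches the 0.1 floor after 900 adjustments, so early episodes are almost entirely exploratory.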
Submit only your agent files for the assignment.
Programming language: CUDA