An ELI5 introduction to deep reinforcement learning

Ever wondered how NPCs (non-player characters) in games work?

Almost every game these days has some way of introducing NPCs: villains, bosses, animal friends, and so on.

For example, if you have played Far Cry 5, there is Boomer (who is the cutest).

But have you ever wondered what goes on behind these NPCs that makes a game so much more interesting?

That’s right! These are AI systems, and some of them can be trained beforehand instead of being scripted by hand.

Let’s look at one of the popular training mechanisms and a historic match that took place.

Deep Reinforcement Learning

There is an idea in AI called reinforcement learning, wherein an AI agent is rewarded if it performs as expected and penalised if it does not. This is not new. The mechanism has been around since at least the 90s.

RL Agent
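
The reward-and-penalty loop can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular game does it: a hypothetical agent picks between two made-up actions, receives +1 or -1, and keeps a running average of how well each action has paid off. The action names, the reward chances, and the "mostly pick the best, sometimes explore" rule (often called epsilon-greedy) are all assumptions chosen for the example.

```python
import random

def train_agent(steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a made-up two-action task."""
    rng = random.Random(seed)
    # Hypothetical chance that each action earns a reward (+1) instead of a penalty (-1).
    reward_chance = {"dodge": 0.8, "stand_still": 0.2}
    value = {action: 0.0 for action in reward_chance}  # estimated payoff per action
    count = {action: 0 for action in reward_chance}    # times each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(value))           # explore: try something random
        else:
            action = max(value, key=value.get)         # exploit: pick the best so far
        reward = 1.0 if rng.random() < reward_chance[action] else -1.0
        count[action] += 1
        # Running average: nudge the estimate toward the reward just received.
        value[action] += (reward - value[action]) / count[action]
    return value

values = train_agent()
print(values)
```

After enough steps, the estimate for the action that is rewarded more often should end up higher, so the agent ends up preferring it.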

What is new is integrating deep learning into reinforcement learning, which gives us deep reinforcement learning (DRL).

Deep learning is an idea in which a large amount of training data (images, videos, text, etc.) is fed into a system that generates an output by multiplying the inputs with weights, layer after layer. The values of the weights are trained and improved over time.

The system is called a deep learning model. There are several architectures and patterns for these models to generate different types of responses. One famous example is ChatGPT. It uses a model architecture called GPT (Generative Pre-trained Transformer) developed by OpenAI, which itself builds on an architecture called the Transformer, introduced by researchers at Google.
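
Going back to the "multiply by a weight, then improve the weight" idea, here is a deliberately tiny sketch of it: a single weight learning the rule y = 2x by gradient descent. Real deep learning models stack millions of weights in layers, so treat this as a toy. The data, learning rate, and number of passes are made up for the example.

```python
def train_single_weight(epochs=50, learning_rate=0.05):
    """Learn w so that w * x approximates y on a toy dataset where y = 2 * x."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up (input, target) pairs
    w = 0.0  # start with an uninformed weight
    for _ in range(epochs):
        for x, y in data:
            prediction = w * x             # the "multiply input by weight" step
            error = prediction - y         # how far off the prediction is
            gradient = 2 * error * x       # derivative of squared error with respect to w
            w -= learning_rate * gradient  # move the weight to reduce the error
    return w

w = train_single_weight()
print(round(w, 3))
```

Each pass over the data shrinks the gap between the weight and the value that fits the data, which is what "trained and improved over time" means in practice.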

Back to DRL: combining deep learning and reinforcement learning is not difficult if you think about it (or at least not from a theoretical standpoint).

A deep learning model can help the AI agent see, hear, and understand the world (its environment) more precisely.

For example, a vision model can help identify an object in front of the game AI (like a sword). This vastly improves the game agent's ability to choose better actions.
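
The sword example can be sketched as a two-step pipeline: perceive first, then act. In the sketch below both steps are hand-written stand-ins, which is a big assumption: `perceive` plays the role of a vision model and `choose_action` plays the role of the policy. In an actual DRL system a neural network would do the perceiving and the rewards would shape the policy. The frame layout and action names are hypothetical.

```python
def perceive(frame):
    """Stand-in for a vision model: return the column of the sword 'S', or None."""
    return frame.index("S") if "S" in frame else None

def choose_action(agent_col, sword_col):
    """Hand-written policy that acts on what perception reports."""
    if sword_col is None:
        return "wait"
    if sword_col > agent_col:
        return "move_right"
    if sword_col < agent_col:
        return "move_left"
    return "pick_up"

frame = list("..A..S.")  # made-up one-row frame: A is the agent, S is a sword
agent_col = frame.index("A")
action = choose_action(agent_col, perceive(frame))
print(action)
```

The point is the division of labour: the better the perception step, the better the information the agent has when it picks an action.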

The same idea applies beyond game agents. Self-driving systems from companies like Tesla, Waymo, and Zoox build on related principles, although they have to be far more sophisticated because the real world is much more complex than a simulated game environment.

But with that gentle introduction, here are some ways in which we can train our own systems.

Unity has an agent training toolkit called ML-Agents (Build More Engaging Games with ML Agents | Unity)

Check this blog for more explanation on ML Agents: Training intelligent adversaries using self-play with ML-Agents | Unity Blog

OpenAI has something called OpenAI Gym. (Back in 2022, maintenance of Gym moved to a non-profit organisation called the Farama Foundation, where it continues as Gymnasium.) Here is the link for that: Gymnasium Documentation (farama.org)

Gym is amazing because its environments come ready-made. All it takes is a few lines of code to simulate a 2D lunar lander.

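import gymnasium as gym

# Needs the Box2D extra: pip install "gymnasium[box2d]"
# Newer Gymnasium releases may register this environment as "LunarLander-v3".
env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset()

for _ in range(1000):
    # Pick a random action; a trained agent would choose based on the observation.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the current one has ended or timed out.
    if terminated or truncated:
        observation, info = env.reset()

env.close()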

AlphaGo

Although there are several tools like Gym and ML-Agents, as mentioned above, much of today's interest in them traces back to a historic game event that took place in 2016.

In March 2016, Lee Sedol, an 18-time world champion in the ancient Chinese game of Go, was challenged to a match by Google’s DeepMind. Little did Sedol know at the time that the match would go down in history as the first in which an AI agent beat a Go world champion.

Unlike chess, Go has vastly more possible positions, and no AI had beaten a world champion at it before. (In chess, IBM's Deep Blue had already beaten the world champion Garry Kasparov back in 1997.)

Although Sedol won one game, AlphaGo won the other four and took the match 4-1. Since then, Google’s DeepMind has been all the rage, and the match popularised deep reinforcement learning to the masses.

Learn more about it here: The Challenge Match (deepmind.com)