
Introduction to Reinforcement Learning

Basic Terminology of Reinforcement Learning

  • Agent: An entity that performs actions in an environment in order to gain reward. 
  • Environment (e): The scenario the agent has to operate in. 
  • Reward (R): An immediate return given to the agent when it performs a specific action or task. 
  • State (s): The current situation returned by the environment. 
  • Policy (π): The strategy the agent applies to decide its next action based on the current state.
  • Value (V): The expected long-term return with discount, as opposed to the short-term reward. 
  • Value Function: Specifies the value of a state, i.e., the total amount of reward an agent should expect to accumulate starting from that state. 
  • Model of the environment: Mimics the behavior of the environment. It lets the agent make inferences about how the environment will behave. 
  • Model-based methods: Methods for solving reinforcement learning problems that use a model of the environment. 
  • Q-value or action-value (Q): Similar to value, except that it takes an additional parameter, the current action.
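The terms above can be tied together in a minimal sketch of the agent-environment loop. The environment here is hypothetical: a one-dimensional walk where the agent earns a reward of +1 whenever it is at position 3.

```python
# A minimal, illustrative agent-environment loop (not from any library).

def step(state, action):
    """Environment: given a state and an action (-1 or +1), return (next_state, reward)."""
    next_state = max(0, min(3, state + action))
    reward = 1 if next_state == 3 else 0
    return next_state, reward

def policy(state):
    """Policy pi: maps the current state to an action (here: always move right)."""
    return 1

state = 0                      # initial state s
total_reward = 0               # accumulated reward R
for _ in range(5):             # the agent acts repeatedly in the environment
    action = policy(state)
    state, reward = step(state, action)
    total_reward += reward
print(state, total_reward)     # the agent reaches state 3 and keeps earning reward
```

The loop structure (observe state, select action from the policy, receive reward and next state) is the same in every reinforcement learning setting; only the environment and the policy change.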

How Does Reinforcement Learning Work?

Let’s look at a simple example that illustrates the reinforcement learning mechanism. Consider the scenario of teaching new tricks to your cat.

  • Since the cat doesn’t understand English or any other human language, we can’t tell it directly what to do. Instead, we follow a different strategy. 
  • We set up a situation, and the cat tries to respond in many different ways. If the cat responds in the desired way, we give it fish. 
  • Now whenever the cat is exposed to the same situation, it executes a similar action even more enthusiastically, expecting more reward (food). 
  • In this way, the cat learns “what to do” from positive experiences. 
  • At the same time, the cat also learns what not to do from negative experiences.

Explanation of the example:

In this case, 

  • Your cat is the agent, and your house is the environment. An example of a state could be your cat sitting while you use a specific word to ask it to walk. 
  • The agent reacts by performing an action, transitioning from one “state” to another “state.” 
  • For example, your cat goes from sitting to walking. 
  • The agent’s reaction is an action, and the policy is a method of selecting an action given a state, in expectation of better outcomes. 
  • After the transition, the agent may get a reward or a penalty in return. 
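The steps above can be sketched in code. This is a toy, made-up model of the cat scenario: the "cat" tries actions in a situation and shifts toward whichever action earned fish (the action names and preference scheme are illustrative, not from any real library).

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

actions = ["sit", "walk"]
preference = {"sit": 1.0, "walk": 1.0}   # initial action preferences (equal)

def reward(action):
    """Environment feedback: fish (reward 1) only for walking."""
    return 1 if action == "walk" else 0

for _ in range(200):
    # choose an action in proportion to the learned preferences
    total = sum(preference.values())
    r, acc, chosen = random.uniform(0, total), 0.0, actions[-1]
    for a in actions:
        acc += preference[a]
        if r <= acc:
            chosen = a
            break
    # a positive experience strengthens the chosen action
    preference[chosen] += reward(chosen)

print(preference)  # "walk" ends up strongly preferred over "sit"
```

Because only "walk" is ever rewarded, its preference grows while "sit" stays at its initial value, which is exactly the learning-from-positive-experience behavior described above.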

Reinforcement Learning Algorithms

There are three approaches to implementing a Reinforcement Learning algorithm: value-based, policy-based, and model-based. 


Value-Based

In a value-based Reinforcement Learning method, you try to maximize a value function V(s). In this method, the agent expects a long-term return from the current state under policy π.
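A minimal sketch of the value-based approach is value iteration: repeatedly apply the Bellman update until V(s) converges. The three-state chain environment below is hypothetical; state 2 is terminal and entering it pays a reward of 1.

```python
# Value iteration on a tiny, made-up 3-state chain (states 0, 1, 2).

gamma = 0.9                              # discount factor

def transition(s, a):
    """Deterministic dynamics: a is -1 (left) or +1 (right)."""
    return max(0, min(2, s + a))

def reward(s, a, s2):
    """Reward 1 for entering the terminal state 2, else 0."""
    return 1.0 if s2 == 2 and s != 2 else 0.0

V = {s: 0.0 for s in (0, 1, 2)}          # value function, initialized to 0
for _ in range(100):                     # Bellman update to convergence
    for s in (0, 1):                     # state 2 is terminal, so V[2] stays 0
        V[s] = max(reward(s, a, transition(s, a)) + gamma * V[transition(s, a)]
                   for a in (-1, 1))

print(V)  # V[1] = 1.0 (one step from the goal), V[0] = 0.9 (discounted)
```

Note how V(s) encodes the expected long-term, discounted return: state 0 is worth 0.9 rather than 1.0 because its reward lies one step further in the future.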


Policy-Based

In a policy-based RL method, you try to come up with a policy such that the action performed in each state helps you gain maximum reward in the future. 

Two types of policy-based methods are: 

  • Deterministic: For any state, the policy π produces the same action. 
  • Stochastic: Every action has a certain probability, given by the stochastic policy π(a | s) = P[A_t = a | S_t = s].
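A stochastic policy π(a | s) = P[A_t = a | S_t = s] can be sketched as a softmax over per-state preference scores. The states, actions, and scores below are made up for illustration.

```python
import math

# Hypothetical preference scores h(s, a); higher means more preferred.
preferences = {
    "s0": {"left": 1.0, "right": 2.0},
}

def pi(a, s):
    """pi(a | s): probability of taking action a in state s (softmax policy)."""
    exps = {act: math.exp(h) for act, h in preferences[s].items()}
    return exps[a] / sum(exps.values())

probs = {a: pi(a, "s0") for a in ("left", "right")}
print(probs)  # a proper probability distribution over the actions
```

Unlike a deterministic policy, sampling from this distribution can yield different actions for the same state, which is what the probability in the equation above expresses.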


Model-Based

In a model-based Reinforcement Learning method, you create a virtual model of each environment, and the agent learns to perform in that specific environment.
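The model-based idea can be sketched as follows: the agent records observed transitions to build its own model of the environment, then queries the model to predict outcomes without acting. The tabular model and the state/action names here are illustrative assumptions.

```python
# A toy, tabular environment model learned from experience.

model = {}  # (state, action) -> (next_state, reward)

def record(state, action, next_state, reward):
    """Store an observed transition in the agent's model of the environment."""
    model[(state, action)] = (next_state, reward)

def simulate(state, action):
    """Make an inference from the learned model instead of acting for real."""
    return model.get((state, action))

# experience gathered while interacting with the real environment
record("A", "go", "B", 0)
record("B", "go", "C", 1)

print(simulate("A", "go"))  # predicted (next_state, reward) without acting
```

Planning algorithms then run on `simulate` instead of the real environment, which is what makes model-based methods sample-efficient.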

Characteristics of Reinforcement Learning

Here are the important characteristics of reinforcement learning: 

  • There is no supervisor, only a real number or reward signal 
  • Sequential decision making 
  • Time plays a crucial role in Reinforcement problems 
  • Feedback is always delayed, not instantaneous 
  • Agent’s actions determine the subsequent data it receives

Types of Reinforcement Learning

Two kinds of reinforcement learning methods are: 


Positive Reinforcement

Positive Reinforcement is defined as an event that occurs because of specific behavior. It increases the strength and frequency of the behavior and has a positive impact on the actions taken by the agent. 

This type of reinforcement helps you maximize performance and sustain change for a more extended period. However, too much reinforcement may lead to over-optimization of states, which can affect the results. 


Negative Reinforcement

Negative Reinforcement is defined as the strengthening of a behavior that occurs because a negative condition is stopped or avoided. It helps you define a minimum standard of performance. However, the drawback of this method is that it provides only enough incentive to meet that minimum behavior. 

Learning Models in Reinforcement Learning

There are two important learning models in reinforcement learning: 

  • Markov Decision Process
  • Q-learning 

Markov Decision Process

The mathematical approach for mapping a solution in reinforcement learning is known as a Markov Decision Process (MDP). The following parameters are used to get a solution: 

  • Set of actions - A 
  • Set of states - S
  • Reward - R 
  • Policy - π 
  • Value - V
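The MDP ingredients listed above can be written down as plain data, and a policy can be evaluated against them with the Bellman expectation update. The two-state MDP below is made up purely for illustration.

```python
# A tiny, hypothetical MDP expressed with the (S, A, P, R, pi, V) ingredients.

S = ["s0", "s1"]                       # set of states
A = ["stay", "move"]                   # set of actions
P = {                                  # deterministic transitions: (s, a) -> s'
    ("s0", "stay"): "s0", ("s0", "move"): "s1",
    ("s1", "stay"): "s1", ("s1", "move"): "s0",
}
R = {("s0", "move"): 1.0}              # reward for (s, a); default is 0
pi = {"s0": "move", "s1": "stay"}      # a deterministic policy

gamma = 0.5                            # discount factor
V = {s: 0.0 for s in S}
for _ in range(50):                    # policy evaluation: V under pi
    V = {s: R.get((s, pi[s]), 0.0) + gamma * V[P[(s, pi[s])]] for s in S}

print(V)  # value of each state when following pi
```

Here "s0" is worth the immediate reward of its one profitable move, while "s1" (which only ever stays put for zero reward) is worth nothing; changing π changes V, which is exactly the state-value relationship the MDP formalism captures.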


Q-Learning

Q-learning is a value-based method of supplying information that tells an agent which action to take.

Let’s understand this method by the following example: 

  • There are five rooms in a building, connected by doors.
  • The rooms are numbered 0 to 4.
  • The outside of the building is one big area, numbered 5. 
  • Doors in rooms 1 and 4 lead into the building from area 5. 

Next, you need to associate a reward value with each door: 

  • Doors which lead directly to the goal have a reward of 100.
  • Doors which are not directly connected to the target room give zero reward.
  • As doors are two-way, two arrows are assigned between each pair of connected rooms.
  • Every arrow in the state diagram carries an instant reward value.


In this representation, each room represents a state.

The agent’s movement from one room to another represents an action. 

A state is drawn as a node, while an arrow shows an action. 

For example, an agent can traverse from room 2 to area 5: 

  • Initial state = state 2
  • State 2 -> state 3
  • State 3 -> state (2, 1, 4)
  • State 4 -> state (0, 5, 3)
  • State 1 -> state (5, 3)
  • State 0 -> state 4
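The room example above can be solved with a short Q-learning sketch. The door graph matches the transitions listed above; moving through a door into area 5 pays 100, every other door pays 0. The learning-rate and discount settings are illustrative choices, not prescribed values.

```python
import random

random.seed(1)  # fixed seed for a reproducible toy run

# doors[s] lists the rooms reachable from room s; an "action" is the room entered
doors = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}

def reward(s, s2):
    """Instant reward: 100 for a door leading into the goal area 5, else 0."""
    return 100 if s2 == 5 else 0

gamma, alpha = 0.8, 1.0                    # discount factor, learning rate
Q = {(s, a): 0.0 for s in doors for a in doors[s]}

for _ in range(1000):                      # training episodes
    s = random.choice([0, 1, 2, 3, 4])     # start in a random room
    while s != 5:                          # act until the goal is reached
        a = random.choice(doors[s])        # explore: pick a random door
        target = reward(s, a) + gamma * max(Q[(a, a2)] for a2 in doors[a])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = a

# follow the learned Q values greedily from room 2 to the goal
path, s = [2], 2
while s != 5:
    s = max(doors[s], key=lambda a: Q[(s, a)])
    path.append(s)
print(path)  # e.g. [2, 3, 1, 5]
```

After training, the Q table encodes discounted distances to the goal (doors straight into area 5 score 100, doors one room away score 80, and so on), so greedy action selection traces a shortest path out of the building.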

Reinforcement Learning vs. Supervised Learning 

Parameters | Reinforcement Learning | Supervised Learning
Decision style | Decisions are made sequentially. | A decision is made on the input given at the beginning.
Works on | Interacting with the environment. | Examples or given sample data.
Dependency of decisions | Decisions are dependent on each other, so labels apply to sequences of dependent decisions. | Decisions are independent of each other, so a label is given for every decision.
Best suited for | AI settings where human interaction is prevalent. | Interactive software systems or applications.
Example | Chess game | Object recognition

Applications of Reinforcement Learning

Here are applications of Reinforcement Learning: 

  • Robotics for industrial automation.
  • Business strategy planning
  • Machine learning and data processing
  • It helps you to create training systems that provide custom instruction and materials according to the requirement of students.
  • Aircraft control and robot motion control

Why use Reinforcement Learning?

Here are prime reasons for using Reinforcement Learning: 

  • It helps you find which situations need an action.
  • It helps you discover which action yields the highest reward over a longer period.
  • Reinforcement learning provides the learning agent with a reward function. 
  • It allows the agent to figure out the best method for obtaining large rewards.

When Not to Use Reinforcement Learning?

You can’t apply a reinforcement learning model in every situation. Here are some conditions under which you should not use a reinforcement learning model: 

  • When you have enough data to solve the problem with a supervised learning method 
  • When computation is a constraint: reinforcement learning is computing-heavy and time-consuming, in particular when the action space is large. 

Challenges of Reinforcement Learning

Here are the major challenges you will face while doing Reinforcement Learning: 

  • Feature/reward design, which can be very involved. 
  • Parameters may affect the speed of learning. 
  • Realistic environments can have partial observability. 
  • Too much Reinforcement may lead to an overload of states which can diminish the results. 
  • Realistic environments can be non-stationary. 


Summary

  1. Reinforcement Learning is a Machine Learning method.
  2. Helps you to discover which action yields the highest reward over the longer period.
  3. Three approaches to reinforcement learning are 1) Value-based, 2) Policy-based, and 3) Model-based learning. 
  4. Agent, State, Reward, Environment, Value function, Model of the environment, and Model-based methods are some important terms used in RL.
  5. An example of reinforcement learning is your cat acting as an agent exposed to an environment.
  6. The biggest characteristic of this method is that there is no supervisor, only a real number or reward signal.
  7. Two types of reinforcement learning are 1) Positive and 2) Negative.
  8. Two widely used learning models are 1) Markov Decision Process and 2) Q-learning. 
  9. The reinforcement learning method works by interacting with the environment, whereas the supervised learning method works on given sample data or examples.
  10. Applications of reinforcement learning methods include robotics for industrial automation and business strategy planning.
  11. You should not use this method when you have enough data to solve the problem with supervised learning.
  12. The biggest challenge of this method is that parameters may affect the speed of learning.

