There is a fair amount of excitement around deep learning, machine learning, and artificial intelligence (AI), especially about the real potential of these technologies in our factories, warehouses, businesses, and homes. The technology is developing rapidly, and understanding the terms and applications will help prepare you for the workplace of the future.
What is reinforcement learning?
Reinforcement learning (RL) is a branch of machine learning (ML), distinct from supervised and unsupervised learning, in which machines learn through experience and gain skills without human intervention.1 Whereas supervised learning incorporates the answer within the dataset, reinforcement learning is employed by machines and software to discover the action that brings about the best reward in a given scenario.2
In more technical terms, RL is a technique in which an agent interacts with an environment by taking actions, and learns which actions maximise its total reward.3 Consider this metaphor: a young child is handed the TV’s remote control at your house. The scenario can be broken down as follows:
- Environment. The house with the TV, remote control, and the child
- Agent. The child
- State. The current, unaffected state of the house or environment
- Action. The child starts to experiment with the remote
- Next state. The effect that the experimentation or actions have on the environment results in a new, or next, state
- Negative reward. If the TV is non-responsive and silent, the child does less of the action. Negative reinforcement or reward results in an action or behaviour being avoided or stopped altogether, thereby strengthening more favourable behaviour.4
- Updating policy. The child needs to rethink its actions (update its policy) in order to get a positive response
- Continued learning. The agent will repeat the process, with other actions, until it finds an action or policy that leads to a favourable reward. Positive reinforcement is when an event that follows a particular behaviour or action increases the strength and frequency of that behaviour.5
- Maximum reward. This is the ultimate goal for reinforcement learning, such as the child finally getting the TV to work
RL is usually modelled as a Markov Decision Process (MDP).6
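The remote-control scenario above can be sketched as a tiny MDP. What follows is a minimal illustration rather than a definitive implementation: the three states, the three buttons, the reward values, and the choice of tabular Q-learning (one common RL algorithm, not one named in the sources cited here) are all assumptions made for the example.

```python
import random

# Hypothetical toy MDP for the remote-control metaphor; every name is illustrative.
ACTIONS = ["power", "input", "volume"]   # buttons the child (agent) can press
TERMINAL = "picture"                     # goal state: the TV finally works

# TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    "off": {
        "power": ("no_signal", 0.0),     # TV switches on but shows nothing yet
        "input": ("off", -1.0),          # nothing happens: negative reward
        "volume": ("off", -1.0),
    },
    "no_signal": {
        "power": ("off", -1.0),          # TV switches off again
        "input": (TERMINAL, 10.0),       # picture appears: the big reward
        "volume": ("no_signal", -1.0),
    },
}


def greedy_action(q, state):
    """Return the action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in TRANSITIONS for a in ACTIONS}
    for _episode in range(episodes):
        state = "off"
        for _step in range(20):          # cap the length of each episode
            # Explore occasionally, otherwise exploit the current policy.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            next_state, reward = TRANSITIONS[state][action]
            if next_state == TERMINAL:
                best_next = 0.0          # no future reward after the goal
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Policy update: nudge the estimate towards reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            if next_state == TERMINAL:
                break
            state = next_state
    return q


def greedy_policy(q):
    """Read off the learned policy: the preferred button in each state."""
    return {state: greedy_action(q, state) for state in TRANSITIONS}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

In this sketch the agent starts with no knowledge, is penalised for buttons that do nothing, and gradually settles on pressing power and then input, which mirrors the negative reward, policy update, and maximum reward steps in the list above.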
What is deep learning?
Deep learning (DL) belongs to the machine-learning family, where artificial neural networks – algorithms loosely modelled on the way the human brain works – learn from large data sets.7 At its core, AI enables machines to carry out tasks that would ordinarily need human intelligence. This includes machine learning, of which deep learning is a subset.
The ‘deep’ in DL refers to the multiple (deep) layers of neural networks needed to facilitate learning. The DL algorithm repeatedly performs a task, tweaking its approach a little each time to improve the end result, thus eliminating the need for explicit programming.8
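The idea of layers that are tweaked a little on every pass can be illustrated with a very small network. This is only a sketch under stated assumptions: the XOR task, the layer sizes, the learning rate, and all function names are chosen for the example, and real DL systems use far larger networks and dedicated libraries.

```python
import math
import random

# XOR: a classic task that needs at least one hidden layer to represent.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(hidden=4, seed=1):
    """Random starting weights for a 2-input, one-hidden-layer, 1-output network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Pass an input through the hidden layer (tanh) and the output layer (sigmoid)."""
    h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(p["w1"], p["b1"])]
    y = sigmoid(sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"])
    return h, y


def mean_loss(p):
    """Mean squared error over the four XOR examples."""
    return sum((forward(p, x)[1] - t) ** 2 for x, t in DATA) / len(DATA)


def train_step(p, lr=0.5):
    """One pass over the data, adjusting every weight slightly (gradient descent)."""
    for x, t in DATA:
        h, y = forward(p, x)
        dz2 = 2.0 * (y - t) * y * (1.0 - y)          # error signal at the output
        for j, hj in enumerate(h):
            dz1 = dz2 * p["w2"][j] * (1.0 - hj * hj)  # error signal at hidden unit j
            p["w2"][j] -= lr * dz2 * hj
            p["w1"][j][0] -= lr * dz1 * x[0]
            p["w1"][j][1] -= lr * dz1 * x[1]
            p["b1"][j] -= lr * dz1
        p["b2"] -= lr * dz2


def train_network(p, epochs=5000, lr=0.5):
    """Repeat the task many times, tweaking the weights on each pass."""
    for _ in range(epochs):
        train_step(p, lr)
    return p


if __name__ == "__main__":
    params = init_params()
    print("loss before training:", mean_loss(params))
    train_network(params)
    print("loss after training:", mean_loss(params))
```

Nothing in the code states the rule for XOR; the network is only shown examples and corrected, which is the sense in which explicit programming is avoided.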
DL’s primary resource for learning is the vast amount of data that is generated every day – over 2.5 quintillion bytes and climbing – which gives it the information needed to tackle a wide range of problems that require ‘thought’ to answer.9 Coupled with the improved computing power that is available today, DL allows machines to find solutions to problems whether the input data is unstructured, inter-connected, or very diverse; the more DL algorithms learn, the better they become at finding solutions.10
What is deep reinforcement learning?
Deep reinforcement learning (DRL) is the coming together of these two fields: reinforcement learning (RL) and deep learning (DL).11 This combination has dramatically broadened the range of complex decision-making tasks that machines can handle, including tasks that were previously beyond their capability.
Successful applications of deep reinforcement learning
DeepMind’s AlphaZero is a clear example of deep reinforcement learning in action: a single system that essentially taught itself from scratch how to play, and master, chess, it has been officially tested by chess masters and has repeatedly won.12
Traditional chess engines, such as Stockfish13 and IBM’s Deep Blue,14 base their game plan on thousands of rules and scenarios designed by skilled human players, in order to pre-empt every possible scenario. AlphaZero’s approach is completely different: it discards the human rules in favour of deep neural networks and algorithms. For each game, it starts training through deep reinforcement learning from a position of random play, with no built-in knowledge barring the basic rules of the game, with the aim of becoming the strongest player in history for that game.
It begins the game with a random play approach, but learns from wins, losses and draws over time, and then adjusts the parameters of the neural network accordingly. In this way, it begins to choose more advantageous moves as it goes. According to DeepMind, AlphaZero needed just nine hours to learn chess.15
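The loop of starting from random play and learning from game results can be sketched on a much smaller game. The example below is a deliberately simplified stand-in and not AlphaZero's method: it uses a lookup table instead of a deep neural network, has no search component, and plays a hypothetical stone-taking game (take one to three stones; whoever takes the last stone wins, so there are no draws). All names and parameter values are assumptions.

```python
import random

MAX_PILE = 10   # largest starting pile in this hypothetical toy game


def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but no more than remain."""
    return [take for take in (1, 2, 3) if take <= stones]


def best_move(q, stones):
    """The move with the highest learned value for the player to move."""
    return max(legal_moves(stones), key=lambda take: q[(stones, take)])


def self_play_train(games=20000, alpha=0.05, epsilon=0.2, seed=0):
    """Learn move values purely from the results of games played against itself."""
    rng = random.Random(seed)
    # Every value starts at zero: the system knows the rules and nothing else.
    q = {(s, t): 0.0 for s in range(1, MAX_PILE + 1) for t in legal_moves(s)}
    for _game in range(games):
        stones = rng.randint(1, MAX_PILE)
        player = 0
        history = []                       # (player, stones before move, stones taken)
        while stones > 0:
            if rng.random() < epsilon:     # occasional random move to keep exploring
                take = rng.choice(legal_moves(stones))
            else:
                take = best_move(q, stones)
            history.append((player, stones, take))
            stones -= take
            player = 1 - player
        winner = history[-1][0]            # whoever took the last stone
        # Adjust the value of every move played towards the final result.
        for mover, pile, take in history:
            outcome = 1.0 if mover == winner else -1.0
            q[(pile, take)] += alpha * (outcome - q[(pile, take)])
    return q


if __name__ == "__main__":
    q = self_play_train()
    for pile in (5, 6, 7):
        print(pile, "stones -> take", best_move(q, pile))
```

With enough games, the table tends to favour moves that leave the opponent a multiple of four stones, which is the known winning strategy for this game, even though that rule was never programmed in.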
Garry Kasparov, former World Chess Champion, says, “I can’t disguise my satisfaction that it plays with a very dynamic style, much like my own!”
In the oil and gas industry, Royal Dutch Shell is focusing its investment efforts on the research and development of AI in a bid to find solutions to its need for cleaner power, to improve service station safety, and to keep abreast of the evolving energy market.16 It has already deployed reinforcement learning in its exploration and drilling endeavours to bring down the high cost of gas extraction, as well as to improve each step of the oil and gas supply chain.
Shell is using deep-learning algorithms that are trained on historical drilling data, as well as data from simulations, to steer the gas drills as they move through the subsurface. The DRL technology also incorporates mechanical data from the drill bit, such as pressure and bit temperature, as well as seismic survey data relevant to the subsurface. As a result, the human operator of the drilling machine has a better understanding of the environment they’re working in, which leads to quicker results, and less wear and tear – or damage – to expensive drilling machinery.
Daniel Jeavons, Shell’s general manager for Data Science, says, “The key thing is you’re giving the [AI] agent the autonomy to make the decision. But you’re providing input into the model, so you’re providing reward or penalty functions on the basis of what’s happening in the model, and how the model responds to the set of conditions that you give it.”17
In Chinese retail, deep reinforcement learning was used to improve the online retail environment of Taobao, the online shopping website owned by Alibaba and one of the largest e-commerce websites in the world.18 With over 600 million active users every month, implementing DRL directly in the live environment is not practical, so a virtual replica of the online shopping environment was created in which to apply DRL in the quest to produce a better commodity search. The virtual Taobao acted as a simulator that allowed for deep learning to take place from hundreds of millions of customers’ records and historical data. The new policies trained in this way have significantly improved online performance.
According to Alibaba’s fiscal year 2018 report, Taobao’s strategy to redefine the shopping experience through intelligent computing produced significant increases in user engagement, sales conversions, and the number of active users.19 Combined with other content initiatives, this produced a net increase from the previous quarter of 37 million mobile monthly active users (MAUs), to a total of 617 million mobile MAUs.
With deep reinforcement learning’s ability to solve complex problems that machines could not previously manage, the potential applications in sectors such as medicine, robotics, smart grids, and finance are vast. Considering the ability of artificial neural networks to process unstructured information and learn like a human brain, combined with the power of reinforcement learning, we have yet to see the full impact this technology will have on all spheres of commerce and science.
- 1 Garchyl. (Apr, 2018). ‘Applications of reinforced learning in real world’. Retrieved from Towards Data Science.
- 2 Bajaj, P. (Nd). ‘Reinforcement learning’. Retrieved from GeeksforGeeks. Accessed 3 April 2019.
- 3 Garchyl. (Apr, 2018). ‘Applications of reinforced learning in real world’. Retrieved from Towards Data Science.
- 4 Bajaj, P. (Nd). ‘Reinforcement learning’. Retrieved from GeeksforGeeks. Accessed 3 April 2019.
- 5 Bajaj, P. (Nd). ‘Reinforcement learning’. Retrieved from GeeksforGeeks. Accessed 3 April 2019.
- 6 Wong, R. (Oct, 2018). ‘Getting started with Markov Decision Processes: Reinforcement learning’. Retrieved from Towards Data Science.
- 7 Marr, B. (Oct, 2018). ‘What is deep learning AI? A simple guide with 8 practical examples’. Retrieved from Forbes.
- 8 Sharmi, U. (Nd). ‘Introduction to deep learning’. Retrieved from GeeksforGeeks. Accessed 3 May 2019.
- 9 (May, 2018). ‘Data never sleeps’. Retrieved from Domo.
- 10 Marr, B. (Oct, 2018). ‘What is deep learning AI? A simple guide with 8 practical examples’. Retrieved from Forbes.
- 11 Hui, J. (Oct, 2018). ‘RL – Introduction to deep reinforcement learning’. Retrieved from Medium.
- 12 Silver, D. et al. (Dec, 2018). ‘AlphaZero: Shedding new light on the grand games of chess, shogi and Go’. Retrieved from DeepMind.
- 13 (Nd). ‘Stockfish 10’. Retrieved from StockfishChess. Accessed 3 May 2019.
- 14 (Nd). ‘Deep Blue’. Retrieved from IBM. Accessed 3 May 2019.
- 15 Silver, D. et al. (Dec, 2018). ‘AlphaZero: Shedding new light on the grand games of chess, shogi and Go’. Retrieved from DeepMind.
- 16 Marr, B. (Jan, 2019). ‘The incredible ways Shell uses Artificial Intelligence to help transform the oil and gas giant’. Retrieved from Forbes.
- 17 Marr, B. (Jan, 2019). ‘The incredible ways Shell uses Artificial Intelligence to help transform the oil and gas giant’. Retrieved from Forbes.
- 18 Shi, J. et al. (May, 2018). ‘Virtual-Taobao: Virtualizing real-world online retail environment for reinforcement learning’. Retrieved from arXiv.
- 19 (May, 2018). ‘Alibaba Group announces March quarter 2018 results and full fiscal year 2018 results’. Retrieved from Alibaba Group.