Loughborough University

Dynamic balance of bipedal robot based on central pattern generators and deep reinforcement learning

posted on 2021-07-05, 09:22 authored by Christos Kouppas
Legged robots have been researched for more than half a century. However, only a handful are commercially available, with SPOT from Boston Dynamics being the most recent and most advanced quadruped robot. There are even fewer commercial bipedal robots, as they are generally less stable. One of the most recent, despite being 12 years old, is the NAO robot, which is less than 60 cm tall. Human-size bipedal robots are not commercially available because their size and instability make them unsafe for public use.

Herein, a bipedal robot, the Safe, Agile, Robust, Autonomous Host (SARAH), is developed and trained. The design focused on efficiency, because a robot's autonomy is limited by the amount of power it can carry. To achieve this, the robot used patent-pending actuators that can lock while they are not in use. In addition, the control algorithm was divided into two parts, one with high and one with low power requirements. The low-power controller was deployed on three microcontrollers and ran at 100 Hz. The high-power controller was deployed on two low-power single-board computers, which executed the control sequence once per second.
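The split between a fast low-level loop (100 Hz) and a slow high-level loop (once per second) can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions: on SARAH the two layers run asynchronously on separate hardware, whereas here both share one simulated clock, and the names `SharedState`, `step_length`, `high_level_update` and `low_level_update` are hypothetical placeholders rather than the thesis's actual interfaces.

```python
import math
from dataclasses import dataclass

LOW_RATE_HZ = 100   # low-level loop rate given in the abstract
HIGH_RATE_HZ = 1    # high-level update rate given in the abstract


@dataclass
class SharedState:
    """Hypothetical data passed from the slow layer to the fast layer."""
    step_length: float = 0.0   # hypothetical gait parameter set by the slow layer
    last_command: float = 0.0  # hypothetical joint command produced by the fast layer
    high_updates: int = 0
    low_updates: int = 0


def high_level_update(state: SharedState, t: float) -> None:
    # Placeholder for the compute-heavy layer (e.g. a neural-network policy)
    # that picks new gait parameters once per cycle.
    state.step_length = 0.10 + 0.01 * state.high_updates
    state.high_updates += 1


def low_level_update(state: SharedState, t: float) -> None:
    # Placeholder for the lightweight layer (e.g. a pattern generator) that
    # turns the latest parameters into a joint command on every tick.
    state.last_command = state.step_length * math.sin(2 * math.pi * t)
    state.low_updates += 1


def run(duration_s: float) -> SharedState:
    """Simulate both layers on one clock: the fast layer runs on every tick,
    the slow layer on every (LOW_RATE_HZ // HIGH_RATE_HZ)-th tick."""
    state = SharedState()
    ticks_per_high = LOW_RATE_HZ // HIGH_RATE_HZ
    for tick in range(round(duration_s * LOW_RATE_HZ)):
        t = tick / LOW_RATE_HZ
        if tick % ticks_per_high == 0:
            high_level_update(state, t)
        low_level_update(state, t)
    return state


state = run(3.0)
print(state.high_updates, state.low_updates)  # 3 300
```

The point of the split is that the fast layer always has a valid (if slightly stale) set of parameters to track, so a slow or busy high-level computer never stalls the joint-level control.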

This project made several contributions to knowledge in robotics, both mechanical and computational. The proposed robot has a unique walking gait, inspired by the walking gait of ostriches, to maximise its efficiency. The asynchronous controller combines neural networks with fully defined pattern generators, offering advanced learning capabilities without compromising reliability. SARAH can learn from experience both before and after deployment, which makes it well suited to commercial use, as it can adapt to the environment into which it is introduced.
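A common way to build a central pattern generator is a set of coupled phase oscillators whose phases are mapped to rhythmic joint targets. The sketch below shows a generic Kuramoto-style two-oscillator CPG that locks the two legs into anti-phase; it is an illustration of the general technique, not the specific pattern generator or ostrich-inspired gait used on SARAH. The coupling gain, frequency, amplitude and the 0.01 s step (chosen to match a 100 Hz loop) are assumptions.

```python
import math


def cpg_step(phases, omega, coupling, offsets, dt):
    """Advance coupled phase oscillators by one explicit Euler step.

    offsets[i][j] is the desired phase lag phases[j] - phases[i]."""
    new_phases = []
    for i, phi_i in enumerate(phases):
        dphi = omega
        for j, phi_j in enumerate(phases):
            if i != j:
                # Pull oscillator i towards the desired lag with oscillator j.
                dphi += coupling * math.sin(phi_j - phi_i - offsets[i][j])
        new_phases.append(phi_i + dphi * dt)
    return new_phases


def joint_targets(phases, amplitude, bias=0.0):
    """Map each oscillator phase to a rhythmic joint-angle target (rad)."""
    return [bias + amplitude * math.sin(phi) for phi in phases]


def simulate(duration_s=5.0, dt=0.01, freq_hz=1.0, coupling=2.0):
    # Two legs that should move in anti-phase (lag of pi radians).
    offsets = [[0.0, math.pi], [-math.pi, 0.0]]
    phases = [0.0, 0.3]  # start almost in phase to show convergence
    omega = 2 * math.pi * freq_hz
    for _ in range(round(duration_s / dt)):
        phases = cpg_step(phases, omega, coupling, offsets, dt)
    return phases


def phase_lag(phases):
    """Lag of leg 1 behind leg 0, wrapped to [0, 2*pi)."""
    return (phases[1] - phases[0]) % (2 * math.pi)


phases = simulate()
print(round(phase_lag(phases), 3))  # 3.142 (anti-phase)
print(joint_targets(phases, amplitude=0.4))
```

Because the rhythm is fully defined by a few parameters (frequency, amplitude, phase offsets), a learning component can adjust those parameters while the oscillators themselves guarantee a smooth, bounded output, which is one way to combine learning with reliability.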

During the project, the first prototype of SARAH was demonstrated at several exhibitions in the UK. The second prototype was designed and used in a dynamic simulator for training and evaluation. This work also proposes an extension of the Normalised Advantage Function (NAF) agent for reinforcement learning. The extended agent accommodates recurrent layers as its first layers, enabling it to extract features based on the dynamics of the system rather than the dynamics of the network, which is important for real-world dynamic systems. Additionally, a different replay memory was introduced that preserves the continuity of experiences when they are sampled, while retaining randomness to improve exploration during learning.
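Two ideas in this paragraph can be made concrete. First, a NAF agent represents the action value as Q(x, u) = V(x) − ½ (u − μ(x))ᵀ P(x) (u − μ(x)), with P = L Lᵀ built from a lower-triangular L whose diagonal is exponentiated so that P is positive definite; this follows the standard NAF formulation, and the network that would output V, μ and the entries of L is omitted. Second, a replay memory can return short contiguous sequences (continuity, which recurrent first layers need) from randomly chosen start points (randomness). The class `SequenceReplayMemory` and its parameters are hypothetical; this is a minimal sketch of the idea, not the memory design from the thesis.

```python
import math
import random
from collections import deque


def naf_q_value(value, mu, l_entries, action):
    """Q = V - 0.5 * (u - mu)^T L L^T (u - mu).

    l_entries holds the lower-triangular entries of L row by row;
    diagonal entries are exponentiated so that P = L L^T is positive definite."""
    n = len(mu)
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            L[i][j] = math.exp(l_entries[k]) if i == j else l_entries[k]
            k += 1
    d = [u - m for u, m in zip(action, mu)]
    # (u - mu)^T L L^T (u - mu) equals the squared norm of L^T (u - mu).
    lt_d = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    return value - 0.5 * sum(x * x for x in lt_d)


class SequenceReplayMemory:
    """Hypothetical replay memory that samples contiguous windows of transitions."""

    def __init__(self, capacity, seq_len, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.seq_len = seq_len
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        if len(self.buffer) < self.seq_len:
            raise ValueError("not enough transitions for one sequence")
        data = list(self.buffer)
        last_start = len(data) - self.seq_len
        # Random start points give randomness; contiguous slices give continuity.
        starts = [self.rng.randint(0, last_start) for _ in range(batch_size)]
        return [data[s:s + self.seq_len] for s in starts]


print(naf_q_value(1.0, [0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0]))  # 0.0
memory = SequenceReplayMemory(capacity=1000, seq_len=4, seed=0)
for step in range(10):
    memory.push(step)
print(memory.sample(batch_size=3))
```

The Q value is maximised exactly at u = μ(x), which is what makes NAF convenient for continuous actions. A fuller memory would also avoid windows that span an episode boundary, for example by storing episodes separately.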

The robot recovered its balance after being pushed on its torso for half a second with a random force. In the evaluation experiments, recovery took less than 3 seconds on average, and the robot made 4 to 6 steps to recover. On average, SARAH's final position was within ±3 cm of its initial position, so the robot does not need extra floor space to recover. Finally, a statistical analysis of the evaluation experiments showed that the results had a small deviation, except for 2 of the 100 experiments, in which the robot moved 17 to 18 cm to the right.
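The kind of summary described here (mean final offset, its spread, and a count of unusually large drifts) can be computed with Python's standard `statistics` module. This is a hedged sketch: it assumes each trial is reduced to one signed offset in centimetres, and the function name `summarize_recovery` and the 10 cm outlier threshold are hypothetical choices, not the analysis actually used in the thesis. No measured data is included.

```python
import statistics


def summarize_recovery(final_offsets_cm, outlier_threshold_cm=10.0):
    """Summarise the final displacement after push recovery.

    final_offsets_cm: signed offset of each trial's final position from its
    start position, in cm (at least two trials are needed for the spread).
    outlier_threshold_cm: hypothetical cut-off for flagging large drifts."""
    mean_cm = statistics.mean(final_offsets_cm)
    std_cm = statistics.stdev(final_offsets_cm)  # sample standard deviation
    outliers = [i for i, x in enumerate(final_offsets_cm)
                if abs(x) > outlier_threshold_cm]
    return {
        "n": len(final_offsets_cm),
        "mean_cm": mean_cm,
        "std_cm": std_cm,
        "outliers": outliers,  # indices of trials beyond the threshold
    }
```

Called with the offsets from all evaluation trials, the `outliers` list would single out runs like the two that drifted 17 to 18 cm, while `mean_cm` and `std_cm` describe the remaining, tightly clustered results.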


EPSRC Centre for Doctoral Training in Embedded Intelligence

Engineering and Physical Sciences Research Council




  • Science


  • Computer Science


Loughborough University

Rights holder

© Christos Kouppas


A thesis submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.


Language

  • en


Qinggang Meng; Mark King; Dennis Majoe

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate