In this work, two model-based reinforcement learning (RL) control strategies are investigated, namely multi-agent RL and RL-based adaptive PID control. An off-policy deep deterministic policy gradient (DDPG) algorithm was adopted in both cases to achieve optimal trajectory tracking control of crystallization processes. Two case studies were considered to validate the new control strategies. The first is the cooling and antisolvent crystallization of aspirin in a mixture of ethanol and water, and the second is a two-dimensional (2D) cooling crystallization of potassium dihydrogen phosphate in water. The optimal reference trajectories were identified using model-based dynamic optimization approaches that aim at maximizing the mean crystal size or minimizing the aspect ratio. Transfer learning (TL) techniques and various reward-shaping strategies were also investigated to enhance the learning capabilities of the RL controllers. The results indicate that multi-agent RL substantially reduces training cost compared with a single-agent approach, and that RL-based adaptive PID control exhibits excellent performance compared with state-of-the-art model predictive control (MPC).
Funding
ARTICULAR: ARtificial inTelligence for Integrated ICT-enabled pharmaceUticaL mAnufactuRing
Engineering and Physical Sciences Research Council