Development of DRL control agents for basic magnetic control problems
Data-driven approaches represent an alternative way to achieve the required level of robustness. One possibility is to exploit the capability of RL algorithms to learn from data and obtain agents that solve the three basic problems while robustly dealing with the different plasma operating conditions. RL is a framework for solving control problems by making an agent (the controller) interact with the environment (the plant) through a trial-and-error strategy until an optimal control strategy is reached. A main advantage of this technique is that the control goals can be specified in terms of a scalar reward function. The agent is not told which actions to take; it decides what to do based on the observations coming from the environment and receives a reward expressing how well it has performed. The aim of the training procedure is to maximize the cumulative long-term reward, i.e. the sum of the rewards over the long run.
The members of the TRAINER teams have already produced some preliminary results by applying a tabular approach, based on the Q-learning algorithm, to derive an agent that solves one of these problems. However, such an approach has several limitations when dealing with continuous state spaces, as in the case of plasma magnetic control. Therefore, during the TRAINER project the applicability of Deep RL (DRL) will be investigated, which leverages the ability of Deep Neural Networks (DNNs) to serve as universal function approximators in order to achieve improved control performance. In particular, it is envisaged to use the Deep Deterministic Policy Gradient (DDPG) algorithm. DDPG is a relatively simple policy-gradient actor-critic algorithm based on DNNs, chosen here for its sample efficiency and for the small number of hyperparameters involved, which makes the tuning procedure more straightforward than for more sophisticated RL techniques. Indeed, DRL algorithms usually have a rather large number of free parameters (the structure of the neural networks, the learning rates, the soft update policy in the case of twin neural networks, and so on) whose effect on the final result is not always obvious or immediately interpretable.
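As a purely illustrative sketch, and not the implementation that will be developed within the project, the following Python snippet shows the basic ingredients of a DDPG agent referred to above: a deterministic actor network, a Q-value critic network, their target (twin) copies, and the soft update rule governed by a smoothing factor. All dimensions, layer sizes and coefficients are arbitrary placeholders.

```python
# Minimal DDPG building blocks (illustrative only; sizes and rates are placeholders).
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps observations to (bounded) actions."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # actions normalised to [-1, 1]
        )
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Q-value approximator: maps (observation, action) pairs to a scalar value."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def soft_update(target, source, tau=1e-3):
    """Polyak averaging of the target (twin) networks; tau is one of the DDPG hyperparameters."""
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(tau * s)

obs_dim, act_dim = 4, 1                # placeholder dimensions
actor, critic = Actor(obs_dim, act_dim), Critic(obs_dim, act_dim)
actor_target, critic_target = Actor(obs_dim, act_dim), Critic(obs_dim, act_dim)
actor_target.load_state_dict(actor.state_dict())
critic_target.load_state_dict(critic.state_dict())
soft_update(actor_target, actor)       # applied after every gradient step during training
soft_update(critic_target, critic)
```

Within the project these components will be provided by the MathWorks Reinforcement Learning Toolbox (see below); the sketch only serves to fix the terminology used when discussing the hyperparameters to be tuned.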
The training of the agents will first be carried out by exploiting linear models of the plasma response, while the assessment and refinement of the resulting agents will rely on the fast nonlinear equilibrium code delivered by WP0. Since the fast version of CREATE-NL is available in the Simulink environment, the DDPG agents will be implemented using the MathWorks Reinforcement Learning Toolbox. Moreover, this WP will focus mainly on the Plasma Current Control and Vertical Stabilization problems, given the possibility of reducing the former to a Single-Input-Single-Output (SISO) control problem and the fact that the latter is a two-input, one-output control problem. Depending on the achieved results, the project team will decide whether to extend the approach to develop a fully data-driven agent for plasma boundary control, which is a Multi-Input-Multi-Output (MIMO) problem.
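To make the two-stage training procedure more concrete, the sketch below shows how a linear model of the plasma response could be wrapped as an RL environment with a scalar tracking reward, assuming a hypothetical SISO discrete-time state-space model; the matrices, the reference value and the reward weights are placeholders and do not correspond to any actual CREATE model. In the project, the environment will instead be a Simulink model interfaced through the Reinforcement Learning Toolbox.

```python
# Hypothetical SISO environment built around a discrete-time linear plasma-response model.
import numpy as np

class LinearPlasmaResponseEnv:
    """x[k+1] = A x[k] + B u[k],  y[k] = C x[k]; the reward penalises the tracking error."""
    def __init__(self, A, B, C, y_ref=1.0, u_max=1.0, horizon=200):
        self.A, self.B, self.C = A, B, C
        self.y_ref, self.u_max, self.horizon = y_ref, u_max, horizon
        self.reset()

    def reset(self):
        self.x = np.zeros(self.A.shape[0])
        self.k = 0
        return self._obs()

    def _obs(self):
        # Observation: plant output and tracking error (a deliberately simple choice).
        y = float(self.C @ self.x)
        return np.array([y, self.y_ref - y])

    def step(self, u):
        u = float(np.clip(u, -self.u_max, self.u_max))
        self.x = self.A @ self.x + self.B.flatten() * u
        self.k += 1
        err = self.y_ref - float(self.C @ self.x)
        reward = -(err ** 2) - 1e-3 * u ** 2   # scalar reward encoding the control goal
        done = self.k >= self.horizon
        return self._obs(), reward, done

# Placeholder second-order model; an actual study would use the identified plasma response.
A = np.array([[0.99, 0.05], [0.0, 0.97]])
B = np.array([[0.0], [0.05]])
C = np.array([1.0, 0.0])
env = LinearPlasmaResponseEnv(A, B, C)
obs = env.reset()
obs, r, done = env.step(0.5)
```

Training a DDPG agent against this kind of linear environment first, and only afterwards against the fast nonlinear equilibrium code, mirrors the two-stage procedure described above.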
This WP includes the following tasks, with [T1.3] being optional:
[T1.1] – Development of a first version of Deep Deterministic Policy Gradient (DDPG) agents for the basic magnetic control problems (i.e. Plasma Current Control and Vertical Stabilization), exploiting both single linear models and parameter-varying linear models of the plasma response to model the environment
[T1.2] – Refinement of the agents developed in [T1.1], by exploiting the fast nonlinear equilibrium code
[T1.3] – Assessment of the possibility of developing a DDPG agent to solve the Plasma Shape Control problem
and the following deliverables are expected as outcomes of [WP1]:
[D4] – Set of control agents for basic plasma magnetic control problems
[D5] – Strategy to tune the DDPG hyper-parameters for the training of plasma magnetic control agents