Reinforcement Learning


The guideline "Self-learning production processes - introduction strategy for reinforcement learning in industrial practice" introduces the topic of machine learning and shows step by step how to introduce industrial reinforcement learning in your own company. Application examples illustrate the procedure.

Machine learning, a subfield of artificial intelligence, encompasses many different concepts and methods. One of these is reinforcement learning, which is based on the principle of reward and punishment: it learns by trial and error. The method is particularly promising for process control because it is well suited to learning intelligent control strategies. Reinforcement learning can help with processes whose behavior is too complex to be mapped in a simulation. Control strategies can be learned for very complex processes, and under complex environmental conditions, without having to model them explicitly. A further advantage is that a control strategy can be determined in real time, whereas doing so with a simulation would be too computationally intensive. Reinforcement learning also involves a change of perspective from plant control to process control.
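The trial-and-error principle can be illustrated with a minimal sketch. It uses tabular Q-learning, which is only one possible method and is not necessarily the one used in the guide. The toy process (five discrete levels, a target level, three actions) and all names such as `step`, `train` and `greedy_policy` are assumptions made for illustration; the `step` function merely stands in for a real process that would be observed rather than modelled.

```python
import random

LEVELS = 5            # hypothetical discretised process levels 0..4
TARGET = 2            # hypothetical setpoint
ACTIONS = (-1, 0, 1)  # lower, hold, raise

def step(level, action):
    """Stand-in for the real process: apply the action, return next level and reward."""
    nxt = min(LEVELS - 1, max(0, level + action))
    return nxt, -abs(nxt - TARGET)  # punishment grows with distance from the setpoint

def train(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(LEVELS)]
    for _ in range(episodes):
        level = rng.randrange(LEVELS)
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # trial: explore a random action
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[level][i])
            nxt, reward = step(level, ACTIONS[a])
            # error correction: move the estimate towards reward + discounted future value
            q[level][a] += alpha * (reward + gamma * max(q[nxt]) - q[level][a])
            level = nxt
    return q

def greedy_policy(q):
    """Pick the action with the highest learned value for every level."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]

q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

The learner never sees the rule inside `step`; it only observes the resulting levels and rewards, which is why no explicit process model is required.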


Introducing reinforcement learning is an opportunity to have complex control strategies learned autonomously. Because the introduction itself is complex, the key questions and toolboxes presented in the guide are intended to help identify a suitable pilot project. The key questions cover process analysis (state space, action space, cost function), human resources (competencies, time expenditure) and material resources (machine learning hardware, sensor technology). The toolboxes support answering the key questions, for example whether a specific industrial process is suitable for reinforcement learning.
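The three elements of the process analysis can be written down before any learning takes place. The following is a minimal sketch, not the guide's own toolbox: the `ProcessSpec` class, the variable names `throughput` and `pressure`, their bounds, and the cost weights are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

State = Dict[str, float]

@dataclass(frozen=True)
class ProcessSpec:
    """Hypothetical container for the result of a process analysis."""
    state_bounds: Dict[str, Tuple[float, float]]  # state space: variable -> (low, high)
    actions: Sequence[float]                      # action space: allowed adjustments
    cost: Callable[[State, float], float]         # cost function to be minimised

    def is_valid_state(self, state: State) -> bool:
        """Check that every state variable is present and within its bounds."""
        return all(
            name in state and low <= state[name] <= high
            for name, (low, high) in self.state_bounds.items()
        )

def conveying_cost(state: State, action: float) -> float:
    # Assumed cost: squared deviation from a target throughput of 5.0
    # plus a small penalty on the size of the adjustment.
    return (state["throughput"] - 5.0) ** 2 + 0.1 * abs(action)

spec = ProcessSpec(
    state_bounds={"throughput": (0.0, 10.0), "pressure": (0.0, 2.0)},
    actions=(-0.1, 0.0, 0.1),
    cost=conveying_cost,
)

state = {"throughput": 4.0, "pressure": 1.2}
print(spec.is_valid_state(state), spec.cost(state, 0.1))
```

Writing the specification down in this form makes gaps visible early, for example a state variable for which no sensor exists yet.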

With reinforcement learning, processes can be controlled for which modelling with conventional methods would be too complex.

Algorithmic approaches for self-learning production processes

In addition, the guide introduces algorithmic approaches for self-learning production processes and explains the necessary terms and concepts as well as the methods based on them. Reinforcement learning methods usually begin with a data set. For each time step of a training episode, this data set contains the current state, the action performed, and the corresponding evaluation (the reward). The data set can then be used in various ways to optimize the policy, that is, the intelligent control strategy learned by the algorithm. Standard reinforcement learning methods calculate at least one of the following quantities: a direct estimate of the policy, an estimate of the so-called value function, or an estimate of the system dynamics.
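As a rough illustration of the per-time-step record described above and of one of the quantities that can be computed from it, the sketch below estimates a value function from recorded episodes using every-visit Monte Carlo averaging. This is one standard technique chosen for brevity, not necessarily the one presented in the guide; the `Step` record, the integer state encoding and the discount factor are assumptions.

```python
from collections import defaultdict
from typing import NamedTuple

class Step(NamedTuple):
    state: int     # hypothetical discretised process state
    action: int    # index of the action performed
    reward: float  # evaluation received for that action

def discounted_returns(episode, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} for every time step of one episode."""
    returns = []
    g = 0.0
    for step in reversed(episode):
        g = step.reward + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

def estimate_values(episodes, gamma=0.9):
    """Every-visit Monte Carlo estimate: average the observed returns per state."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        for step, g in zip(episode, discounted_returns(episode, gamma)):
            totals[step.state] += g
            counts[step.state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# One recorded training episode with a reward only at the final step.
episode = [Step(0, 1, 0.0), Step(1, 1, 0.0), Step(2, 0, 1.0)]
values = estimate_values([episode], gamma=0.5)
print(values)  # {0: 0.25, 1: 0.5, 2: 1.0}
```

States closer to the rewarded step receive a higher value, which is the information a policy can exploit when choosing actions.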

Integrating a reinforcement learning methodology into your own processes

The applications for reinforcement learning in industry are manifold. In most cases, however, the procedure for integrating such a methodology follows a clearly defined scheme. It can be divided into two phases, the planning phase and the implementation phase, which together comprise eight chronologically consecutive steps.


Best practice examples illustrate the methods used

Two application examples illustrate the procedure:

  • For the first example, a scientific demonstrator was set up at the Institut für Unternehmenskybernetik e.V.; it learns an autonomous, force-controlled assembly process with the help of reinforcement learning.
  • The second application scenario was realized in the pilot plant of AZO GmbH Co. KG. In this scenario, a process engineering problem is addressed using the example of a pneumatic bulk material conveyor.


About the research project InPulS 

The guideline was developed in connection with the project "InPulS - Intelligent and self-learning production processes". Within the scope of this project, self-learning process control was developed using the examples of a pneumatic bulk material conveyor and a force-controlled joining process with a robot arm. InPulS was carried out as a pre-competitive research project of the VDMA-Forum Industrie 4.0 in cooperation with the Institut für Unternehmenskybernetik e. V. (Institute for Enterprise Cybernetics, IfU) of the Cybernetics Lab of RWTH Aachen University and a VDMA industrial working group accompanying the project. The project was supported by Forschungskuratorium Maschinenbau (FKM) e.V. and VDMA in the period from 01 October 2017 to 30 September 2019.

Guideline and final report as download or print

The guideline "Self-learning production processes - Introduction strategy for reinforcement learning in industrial practice" is available to download in German or English. The German print version is available from

The results of the InPulS project are described in the final report, which is available for download. In addition, software for the two application scenarios, with associated documentation, was developed during the project and is available on THEMIS, the communication and knowledge platform for collaborative industrial research in mechanical engineering. VDMA members can register there free of charge. If you have any questions, please contact Judith Binzer.


Further information 

Institute for Enterprise Cybernetics e. V. (IfU) of the Cybernetics Lab at RWTH Aachen University: