52 K. Worden and E. J. Cross

parameter estimation.² A 'model' is defined above as a 'mathematical construct'; often this means a mathematical equation, or system of equations. With this in mind, the problem of structure detection is simply that of determining the functional form of the equations which govern the model; they may be algebraic, differential, integral, etc. The equations will need to contain all of the variables which control or describe some aspect of reality (temperature, current, acceleration, etc.) and may well contain constants or parameters, some of which will be fixed by the physics and some of which will need to be determined. The latter parameters give rise to the problem of parameter estimation and will need to be inferred from measured data.

As a concrete example (which will recur throughout the paper), one might consider a dynamical system of the form:

$m\ddot{y} + c\dot{y} + ky = x(t)$  (8.1)

which is a simple linear mass-spring-damper system. In the usual frame of engineering dynamics, $x(t)$ would be a force (the stimulus), and $y(t)$ would be a displacement (the response); m, c and k would be constants for a given system (mass, damping and stiffness, respectively). The overdots in the equation denote differentiation with respect to time; thus the model structure here represents a second-order differential equation. If the structure and values of the constants are known a priori from physics, this is a white-box model; if the structure is known, but the parameters need to be inferred from measured data, this is a grey-box model.

The problem of inferring model parameters from data is commonly called regression within the statistics and machine learning communities [2]. Few would argue that the mathematical basis for regression began with the method of least squares.
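To make the grey-box idea concrete, the sketch below recovers m, c and k for Eq. (8.1) by least squares, given sampled displacement, velocity, acceleration and force signals. The signals and values here are illustrative assumptions, not data from the paper; the model structure is assumed known, and only the parameters are estimated.

```python
# Grey-box parameter estimation for Eq. (8.1): structure known,
# parameters m, c, k inferred from sampled signals by least squares.
# All signals below are synthetic, illustrative assumptions.
import numpy as np

m_true, c_true, k_true = 1.0, 0.5, 4.0

t = np.linspace(0.0, 10.0, 500)
# A two-frequency displacement, so the regressors are not collinear.
y = np.sin(t) + 0.3 * np.sin(2.3 * t)
yd = np.cos(t) + 0.3 * 2.3 * np.cos(2.3 * t)          # velocity
ydd = -np.sin(t) - 0.3 * 2.3**2 * np.sin(2.3 * t)     # acceleration
x = m_true * ydd + c_true * yd + k_true * y           # consistent force

# Eq. (8.1) is linear in the parameters: x = [ydd, yd, y] @ [m, c, k]
A = np.column_stack([ydd, yd, y])
params, *_ = np.linalg.lstsq(A, x, rcond=None)
m_est, c_est, k_est = params
print(m_est, c_est, k_est)
```

With noise-free data the estimates match the true values to machine precision; with measurement noise on the signals, the same regression gives least-squares estimates instead.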
This algorithm for fitting a model to data was first published by Legendre [3], although Gauss subsequently claimed precedence.³ Regardless of the dispute on precedence, Gauss certainly extended the algorithm in important ways; in particular, the Gaussian or normal probability distribution was introduced as a means of handling measurement errors in a rigorous fashion. In any case, both Legendre and Gauss developed and used the method in order to solve problems in celestial mechanics, which were highly nonlinear. The point here is that it would appear that the discipline of machine learning was originally motivated by problems in nonlinear dynamics.

Given that the subject of 'data-driven' methods is vast, the 'hard choices' mentioned earlier come into play. Given the experience and preferences of the authors, this tutorial will concentrate on system identification. Furthermore, as this paper is a tutorial as opposed to original research, it will make no attempt at a comprehensive or balanced survey of the literature, but will rather lean on the papers by the authors and their colleagues with which they are most familiar. A more balanced viewpoint can be found by following the references in those papers.

The identification of linear systems (LSI) is arguably so well developed now that it is comprehensively covered in textbooks [5, 6]. In contrast, nonlinear system identification (NLSI) is by no means as well developed [7]. Part of the problem with NLSI is that the structure detection problem is much more complicated. For a general linear differential equation model, the only freedom in the model structure is in the number of derivatives of the variables; furthermore, the model parameters only appear as multipliers of terms; there are no parameters 'hidden' inside nonlinear functions, like the $a$ in $e^{ay}$.
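The difficulty with such hidden parameters can be illustrated with a small sketch: a parameter like $a$ in $e^{ay}$ does not appear as a multiplier of a regressor, so ordinary least squares over a data matrix cannot recover it directly, and some form of nonlinear search over the error surface is needed instead. The example below is a hypothetical one (a simple grid search is used purely for illustration; it stands in for the more sophisticated estimators discussed later).

```python
# A parameter 'hidden' inside a nonlinear function, like a in e^{a y},
# is not linear-in-the-parameters; here a simple 1-D search over the
# squared-error surface recovers it. Purely illustrative.
import numpy as np

a_true = 0.7
y = np.linspace(0.0, 2.0, 100)
z = np.exp(a_true * y)            # noiseless observations of e^{a y}

def sse(a):
    """Sum of squared errors between the model e^{a y} and the data."""
    return np.sum((np.exp(a * y) - z) ** 2)

# Grid search: evaluate the error surface and take its minimiser.
grid = np.linspace(0.0, 2.0, 2001)
a_est = grid[np.argmin([sse(a) for a in grid])]
print(a_est)
```

In practice a gradient-based optimiser or one of the evolutionary/Bayesian schemes discussed below would replace the grid, but the point stands: the estimate comes from searching a (possibly multimodal) error surface, not from solving a linear system.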
Such parameters require more sophisticated means of estimation, whereas linear-in-the-parameters problems can sometimes yield to algorithms as 'simple' as basic least squares. An NLSI model structure could in principle contain any nonlinear functions of the variables of interest and their derivatives, and any parameters.

The complexities of NLSI have meant that, in the past, there has been no single algorithm which can address the completely general problem; the NLSI practitioner has instead relied on a 'toolbox' philosophy, with different approaches used for different classes of problem. This situation persisted until recently, when two methodologies emerged in the NLSI community, each offering the prospect of a general framework. The first group of algorithms, based on evolutionary optimisation and starting with the genetic algorithm [8], can handle difficult technical problems in NLSI, like nonlinearity in the model parameters and the existence of unmeasured states [9, 10]. The second group exploits Bayesian inference. Although this idea originated over 20 years ago [11, 12], it has really flourished in the last decade or so, as dynamicists began to take advantage of concepts from machine learning [13]. Bayesian methods overcome the same technical problems as evolutionary methods [14] and offer additional benefits: they allow model selection simultaneously with parameter estimation [15], can estimate parameter distributions and can propagate uncertainty in a principled manner. Because of its power and generality, this tutorial will mainly focus on Bayesian methods.

Having said that the focus here will be on NLSI, it is important to note that there are many other problems in nonlinear dynamics for which a data-driven approach has been adopted. In fact, data-driven methods have proved vital in the development of the subject for a fundamental and important reason.
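The Bayesian benefit of estimating parameter distributions rather than point values can be sketched with a minimal Metropolis-Hastings sampler. The example below fits a single stiffness-like parameter to noisy linear data (a deliberately simplified static analogue of Eq. (8.1), keeping only the stiffness term); the model, noise level and sampler settings are all illustrative assumptions, not the methods of the cited papers.

```python
# Minimal Metropolis-Hastings sketch of Bayesian parameter estimation:
# the parameter k gets a posterior distribution, not a point estimate.
# Model, data and tuning constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
k_true, noise = 4.0, 0.1
y = np.linspace(0.0, 1.0, 50)
x = k_true * y + noise * rng.standard_normal(y.size)   # noisy data

def log_post(k):
    """Log-posterior: Gaussian likelihood with a flat prior on k."""
    return -0.5 * np.sum((x - k * y) ** 2) / noise**2

samples, k_cur = [], 1.0
for _ in range(5000):
    k_prop = k_cur + 0.1 * rng.standard_normal()       # random-walk step
    if np.log(rng.random()) < log_post(k_prop) - log_post(k_cur):
        k_cur = k_prop                                 # accept the move
    samples.append(k_cur)

posterior = np.array(samples[1000:])                   # discard burn-in
print(posterior.mean(), posterior.std())
```

The posterior mean sits close to the true value, and the posterior standard deviation quantifies the uncertainty in the estimate; it is this distributional output that allows uncertainty to be propagated through subsequent predictions in a principled manner.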
Problems in nonlinear dynamics almost never have

² The first author first learned these terms from the seminal work of Steve Billings; however, they may well predate that work.
³ Legendre published in 1805, Gauss in 1809; however, Gauss claimed to have had the method since 1793. The whole story is discussed in [4].