Special Topics in Structural Dynamics & Experimental Techniques, Volume 5

L. A. Bull et al.

In an engineering context, comprehensive labels to describe the observations are rarely available. In practice, this often forces a reliance on unsupervised techniques, specifically novelty detection. An alternative approach, however, is to apply semi-supervised pattern recognition [2]; these algorithms make use of both labelled data, L, and unlabelled data, U, such that the dataset used by the algorithm is D = L ∪ U. Active learning is a variation of semi-supervised learning (or, more generally, partially-supervised learning [3]). As with semi-supervised learning, active algorithms make use of both L and U; however, an active learner will query/annotate unlabelled data in U to extend the labelled dataset, L, automatically, in an intelligent and adaptive manner.

In the context of data-based SHM, a pattern-recognition model that is both semi-supervised and active brings several advantages [4]. Most significantly, these algorithms make use of limited labelled data, while requesting further annotations for only the most informative observations; this can significantly reduce the cost associated with investigating abnormal data records from engineering structures. Furthermore, these algorithms can utilise the information in the unlabelled data to improve the diagnostic capabilities of the SHM system. Additionally, active algorithms can be applied offline, to a large pool of collected data [5], or online, to drifting data streams [6]. In the online setting, an algorithm that can adapt and update while requesting only the critical labels is particularly valuable for data-based SHM.

12.3 A Probabilistic Model for Guided Sampling

A probabilistic approach is suggested as the foundation for an active framework with engineering data. The measured data, x, are assumed to be sampled from a parametric mixture model; specifically, a mixture of K Gaussian distributions.
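The generative assumption can be illustrated with a minimal sketch: each observation is drawn by first sampling a class label from a categorical distribution over the K components, then sampling the measurement from the corresponding Gaussian. The component count, mixing proportions, and parameters below are hypothetical values chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical K = 3 component mixture in one dimension:
# mixing proportions pi_k and component parameters (mu_k, sigma_k).
pi = np.array([0.5, 0.3, 0.2])
mu = np.array([0.0, 4.0, 8.0])
sigma = np.array([1.0, 0.5, 1.5])

# Generative assumption: y_i ~ Categorical(pi), then x_i ~ N(mu_{y_i}, sigma_{y_i}^2).
y = rng.choice(len(pi), size=500, p=pi)
x = rng.normal(mu[y], sigma[y])
```

In the SHM setting, the labels y would correspond to (largely unobserved) structural conditions, and only a small subset of the pairs (x_i, y_i) would initially be available in L.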
A small initial sample of labelled data, L, is used to establish the initial number of classes, K, and to calculate Bayes-optimal estimates of the model parameters. Briefly, the data labels, y_i, are used to group the measurements, x_i, according to class, and then to calculate the posterior distribution of the parameters that define the categorical distribution, (π_1, ..., π_K); a Dirichlet prior is applied. Similarly, the observations x_i are used to calculate the posterior distribution of the parameter estimates for the Gaussian-distributed features of each class, (μ_{y,d}, σ²_{y,d}); in this case, a normal-inverse-chi-squared (hierarchical) prior is used. As the prior distributions are conjugate, the solutions for the posterior distributions over the parameters are tractable; therefore, the posterior predictive distributions can be found analytically [7]. The posterior predictive distributions are p(x | y, L) for the Gaussian-distributed observations, and p(y | L) for the Dirichlet-distributed labels. The dependencies of this framework are shown by the graphical model in Fig. 12.1, including any hyperparameters; further details can be found in the references, and in the theoretical paper by the authors.

A generative classifier can then be defined using Bayes' rule, for prediction of the label distribution for the unlabelled data in U,

p(y | x̂, L) = p(x̂ | y, L) p(y | L) / p(x̂ | L),    (12.3)

which assumes independence between each dimension (feature) in X (i.e. naive Bayes), such that

p(x̂_i | y, L) = ∏_{d=1}^{D} p(x̂_{i,d} | y, L).    (12.4)

Note, the posterior distribution over the labels, p(y | x̂, L), is a predictive likelihood for each class, y ∈ {1, ..., K}. Specifically, the estimates of the probability of each class are combined to give a categorical distribution (when normalised) over the label space. This is not the full posterior, which cannot be found analytically; thus, the classification is not fully Bayesian.
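A sketch of this classifier is given below. It uses the standard conjugate results: under a Dirichlet prior, the label predictive p(y = k | L) is (α_0 + n_k)/(Kα_0 + N); under a normal-inverse-chi-squared prior, the per-feature predictive p(x̂_d | y, L) is a Student-t distribution. The hyperparameter values (ALPHA0, M0, KAPPA0, NU0, S0SQ) are illustrative assumptions, not values prescribed by the text, and classes in L are assumed non-empty.

```python
import numpy as np
from scipy.stats import t as student_t

# Hypothetical hyperparameters for the conjugate priors (assumptions):
ALPHA0 = 1.0                                   # symmetric Dirichlet concentration
M0, KAPPA0, NU0, S0SQ = 0.0, 1.0, 1.0, 1.0    # normal-inverse-chi-squared prior

def nix_posterior_predictive(x_class):
    """Posterior predictive t-distribution for one feature of one class,
    using the standard normal-inverse-chi-squared conjugate updates."""
    n = len(x_class)
    xbar = np.mean(x_class)
    kappa_n = KAPPA0 + n
    nu_n = NU0 + n
    m_n = (KAPPA0 * M0 + n * xbar) / kappa_n
    ss = np.sum((x_class - xbar) ** 2)
    sigma_n_sq = (NU0 * S0SQ + ss
                  + (n * KAPPA0 / kappa_n) * (xbar - M0) ** 2) / nu_n
    scale = np.sqrt(sigma_n_sq * (1.0 + 1.0 / kappa_n))
    return nu_n, m_n, scale       # degrees of freedom, location, scale

def predict(x_hat, X, y, K):
    """Categorical posterior over the labels, p(y | x_hat, L), via naive
    Bayes: the Dirichlet-categorical label predictive combined with a
    product of per-feature t predictives, as in Eqs. (12.3)-(12.4)."""
    N, D = X.shape
    log_post = np.zeros(K)
    for k in range(K):
        n_k = np.sum(y == k)
        # Dirichlet-categorical posterior predictive, p(y = k | L)
        log_post[k] = np.log((ALPHA0 + n_k) / (K * ALPHA0 + N))
        # naive-Bayes factorisation over the D features, Eq. (12.4)
        for d in range(D):
            df, loc, scale = nix_posterior_predictive(X[y == k, d])
            log_post[k] += student_t.logpdf(x_hat[d], df, loc, scale)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()      # normalise, as in Eq. (12.3)
```

For example, with two well-separated classes in L, a test point near one class mean receives nearly all of the predictive mass for that class.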
The full posterior could be approximated via sampling algorithms (e.g. Gibbs sampling); however, this approach is unnecessary here, as the full distribution is not required for this example.

Various probabilistic measures can be used to dictate which of the measurements in U are the most informative when labelled. These observations can be queried, and the underlying cause investigated by the engineer/oracle to provide descriptive data labels, y*. Following the investigation and labelling of the data, L includes the queried observations; the model is then retrained and new data are queried. This process can iterate until a label budget is reached, or be applied sequentially to streaming data (online).
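One common informativeness measure, shown here as an illustrative choice rather than one prescribed by the text, is the Shannon entropy of the predicted label distribution: observations whose categorical posterior is closest to uniform are the most uncertain, and hence the most valuable to query. The helper below assumes a `predict_fn` that returns a normalised label distribution for a single observation.

```python
import numpy as np

def query_most_uncertain(predict_fn, X_unlabelled, n_queries=1):
    """Rank the unlabelled observations in U by the Shannon entropy of
    their predicted label distribution, p(y | x_hat, L), and return the
    indices of the n_queries most uncertain points, most uncertain first.
    (Entropy is one of several possible query measures.)"""
    entropies = []
    for x in X_unlabelled:
        p = np.clip(predict_fn(x), 1e-12, 1.0)  # guard against log(0)
        entropies.append(-np.sum(p * np.log(p)))
    # argsort is ascending; take the largest entropies, reversed
    return np.argsort(entropies)[-n_queries:][::-1]
```

In the iterative scheme described above, each queried observation would be labelled by the oracle, moved from U into L, and the classifier retrained before the next query.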

RkJQdWJsaXNoZXIy MTMzNzEzMQ==