10 Nonlinear Time Series Analysis Using Bayesian Mixture of Experts

The complete gating function given in (10.8) needs to be considered. The predictive distribution is given by Ueda and Ghahramani [12]:

$$
p(y_{N+1} \mid x_{N+1}, X) \approx \sum_{i=1}^{M} g_i(x_{N+1}, \theta_g^{MAP})\, \mathcal{T}\!\left( y_{N+1} \,\middle|\, \hat{w}_i^\top [x_{N+1}\ 1],\ \underbrace{\frac{\rho_i}{\lambda_i}\left(1 - [x_{N+1}\ 1]^\top P_i^{-1} [x_{N+1}\ 1]\right)}_{\sigma_i},\ 2\rho_i \right) \tag{10.32}
$$

where $P_i = [X\ 1]^\top V_i [X\ 1] + E_{a_i}[A_i] + [x_{N+1}\ 1][x_{N+1}\ 1]^\top$, and $\mathcal{T}(\mu, \Sigma, df)$ is a Student-t distribution with $df$ degrees of freedom, mean $\mu$ and scale parameter $\Sigma$. The relevant statistics for prediction are

$$
E[y_{N+1}] = \sum_{i=1}^{M} g_i(x_{N+1}, \theta_g^{MAP})\, \hat{w}_i^\top [x_{N+1}\ 1] \tag{10.33}
$$

and

$$
\mathrm{Var}[y_{N+1}] = \sum_{i=1}^{M} g_i(x_{N+1}, \theta_g^{MAP})\, \frac{\rho_i}{\rho_i - 1}\, \sigma_i \tag{10.34}
$$

In (10.32), (10.33) and (10.34), $g_i(x_{N+1}, \theta_g^{MAP})$ refers to the calculation of the gates using (10.6) at the maximum a posteriori (MAP) estimates $\theta_g^{MAP} = \{\alpha^{MAP}, \mu^{MAP}, \Lambda^{MAP}\}$ obtained from the posterior distributions (10.20) and (10.21) respectively. The expert with the largest gate probability at every time instant $n$ is chosen to represent the output.

10.4 Results

This section demonstrates the proposed VBEM algorithm for MoE with ARX on a noisy nonlinear time series signal in two cases. The first case considers three signals of varying degrees of nonlinearity when the input, $n_u$, and output, $n_y$, lags are known. The second case analyses model selection when $n_u$ and $n_y$ are unknown.

10.4.1 Case 1: Known $n_u$ and $n_y$

A discretised nonlinear Duffing oscillator is considered here, given by

$$
y_n = 1.97\,y_{n-1} - 0.98\,y_{n-2} - 5000\,y_{n-1}^{3} + 1\times10^{-6}\,u_{n-1} + e_n \tag{10.35}
$$

where $y_n$ is the displacement of the spring and $u_n$ is the excitation signal. $e_n$ is a zero mean additive Gaussian noise, at sample time $n$, which propagates through the time series. The standard deviation of the noise was set to be 1% of the root mean square value of the signal. Equation (10.35) was simulated at three different levels of excitation: $u \sim U[-50, 50]$, $u \sim U[-100, 100]$ and $u \sim U[-150, 150]$. $U[-h, h]$ represents a uniform random signal with amplitude between $-h$ and $h$.
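The simulation of (10.35) can be illustrated with a short Python sketch. It is not the authors' code: the function names (`duffing_step`, `simulate`, `make_dataset`), the zero initial conditions, the random seed, and the choice to take the RMS of the noise-free response as the "signal" that sets the noise level are all assumptions made here for illustration.

```python
import math
import random


def duffing_step(y1, y2, u1, e=0.0):
    """One step of (10.35): y1 = y[n-1], y2 = y[n-2], u1 = u[n-1], e = e[n]."""
    return 1.97 * y1 - 0.98 * y2 - 5000.0 * y1 * y1 * y1 + 1e-6 * u1 + e


def simulate(u, noise_std=0.0, rng=None):
    """Iterate (10.35) over the excitation u.

    Assumption: zero initial conditions y[0] = y[1] = 0.
    The noise enters inside the recursion, so it propagates through the series.
    """
    y = [0.0, 0.0]
    for n in range(2, len(u)):
        e = rng.gauss(0.0, noise_std) if noise_std > 0.0 else 0.0
        y.append(duffing_step(y[n - 1], y[n - 2], u[n - 1], e))
    return y


def rms(x):
    """Root mean square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def make_dataset(n_samples, h, seed=0):
    """Draw u ~ U[-h, h] and simulate a noisy response (hypothetical helper)."""
    rng = random.Random(seed)
    u = [rng.uniform(-h, h) for _ in range(n_samples)]
    # Assumption: the noise standard deviation is 1% of the RMS
    # of the noise-free response to the same excitation.
    noise_std = 0.01 * rms(simulate(u))
    y = simulate(u, noise_std, rng)
    return u, y, noise_std


if __name__ == "__main__":
    for h in (50, 100, 150):
        u, y, noise_std = make_dataset(1000, h)
        print(f"h={h}: rms(y)={rms(y):.3e}, noise_std={noise_std:.3e}")
```

Running the script prints the response RMS and the noise level for the three excitation amplitudes, which gives a quick check that the response grows with the excitation.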
As the excitation increases, the nonlinear effect increases, and so the algorithm is expected to select more experts to model the data. The output $y$ had to be scaled by a factor of 1,000 to avoid numerical (singularity) issues; to recover the original signal, the model output was divided by this factor. For each scenario, 1,000 data points were used as training data, and the results are presented on a new independent data set of 500 data points. In this case $n_u = 1$ and $n_y = 2$, and so $x_n = [u_{n-1}, y_{n-1}, y_{n-2}]$. One hundred iterations were used in the VBEM algorithm. Since the number of experts is not known, a range of expert numbers, $M$, was considered, and for each value 100 runs were performed to mitigate the local maxima issue. Figure 10.1 shows the plots of