5 On Partitioning of an SHM Problem and Parallels with Transfer Learning 47

Table 5.3 Confusion matrix of neural network classifier trained on the first dataset, test set, total accuracy: 100%

                     Predicted panel
                     3      6
Missing panel 3      66     0
Missing panel 6      0      66

Fig. 5.5 Training and validation loss histories and the point of early stopping (red arrow)

Table 5.4 Confusion matrix of neural network classifier trained on the original data of the second dataset, test set, total accuracy: 97.72%

                     Predicted panel
                     3      6
Missing panel 3      65     1
Missing panel 6      2      64

to a classification problem. The network's prediction of the damage class to which a sample belongs is the index that maximises the output vector y; the outputs are interpreted as the a posteriori probabilities of class membership, so this leads to a Bayesian decision rule. Loosely speaking, one can think of the transformation between the hidden and output layers as the actual classifier, and the transformation from the input layer to the hidden layer as a map to latent states in which the classes are more easily separable. In the context of deep networks, the hope is that the earlier layers carry out an automated feature extraction which facilitates an eventual classifier. In the deep context, transfer between problems is carried out by simply copying the 'feature extraction' layers directly into the new network, and training only the later classification layers. The simple idea explored here is whether that strategy helps in the much shallower learner considered in this study. The transfer is accomplished by copying the weights W1 and biases b1 from sub-problem one directly into the network for sub-problem two, and training only the weights W2 and biases b2.

As before, multiple neural networks were trained on the first dataset.
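The transfer described above, copying W1 and b1 and training only W2 and b2, can be illustrated with a minimal sketch. It is not the chapter's implementation: the tanh hidden activation, the softmax output, the plain stochastic gradient descent update, and all names and toy values (W1_src, b1_src, X2, labels2) are assumptions introduced purely for illustration.

```python
import math
import random


def hidden_features(x, W1, b1):
    """Input-to-hidden map h = tanh(W1 x + b1); tanh is an assumed activation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def softmax(z):
    """Numerically stable softmax, giving outputs that sum to one."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def output_layer(h, W2, b2):
    """Hidden-to-output map y = softmax(W2 h + b2)."""
    return softmax([sum(w * hj for w, hj in zip(row, h)) + b
                    for row, b in zip(W2, b2)])


def train_output_layer(X, labels, W1, b1, n_classes, lr=0.5, epochs=200, seed=0):
    """Fit only W2 and b2 by stochastic gradient descent on cross-entropy.

    W1 and b1 are read but never updated, so the copied layer stays fixed.
    """
    rng = random.Random(seed)
    n_hidden = len(W1)
    W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
          for _ in range(n_classes)]
    b2 = [0.0] * n_classes
    # The first layer is frozen, so its features can be computed once.
    H = [hidden_features(x, W1, b1) for x in X]
    for _ in range(epochs):
        for h, target in zip(H, labels):
            y = output_layer(h, W2, b2)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. the k-th pre-softmax output.
                g = y[k] - (1.0 if k == target else 0.0)
                b2[k] -= lr * g
                for j in range(n_hidden):
                    W2[k][j] -= lr * g * h[j]
    return W2, b2


def predict(x, W1, b1, W2, b2):
    """Decision rule: index of the largest output (posterior) component."""
    y = output_layer(hidden_features(x, W1, b1), W2, b2)
    return max(range(len(y)), key=lambda k: y[k])


# Hypothetical first-layer parameters, standing in for those learned on
# sub-problem one.
W1_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1_src = [0.0, 0.0, 0.0]

# Toy two-class data standing in for sub-problem two (not the chapter's data).
X2 = [[-1.0, -1.0], [-1.2, -0.8], [1.0, 1.0], [0.9, 1.1]]
labels2 = [0, 0, 1, 1]

W2_new, b2_new = train_output_layer(X2, labels2, W1_src, b1_src, n_classes=2)
preds = [predict(x, W1_src, b1_src, W2_new, b2_new) for x in X2]
```

Because the first layer is fixed, the hidden features need to be evaluated only once, and the remaining optimisation is a softmax regression on those features.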
In a transfer learning scheme, it is even more important that models should not be overtrained, since overtraining makes a model too case-specific and therefore unlikely to carry knowledge over to other problems. To achieve this for the current problem, an early stopping strategy was followed. Models were trained until the point where the value of the loss function decreased by less than a set percentage of its current value. An example of this can be seen in Fig. 5.5 where, instead of training the neural network for 1000 epochs, training stops at the point indicated by the red arrow. After multiple networks were trained following the early stopping scheme above, the network with the lowest validation loss was selected and the transfer learning scheme was applied to the second problem. The nonlinear transformation given by the transition from the input layer to the hidden layer was applied to the data of the second dataset. Subsequently, another neural network was trained on the transformed data, having only one input layer and one output/decision layer. To assess the effect of the transformation, another two-layer network was trained on the original second dataset and the results were compared. The confusion matrices of the two neural networks on the testing data are given in Tables 5.4 and 5.5; the misclassification rates are very similar. However, it is also interesting to look at the effect of the transfer on the convergence rate of the network trained on the transferred data, and to illustrate the feature transformation on the first and the second datasets.
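The early stopping criterion and the selection of the network with the lowest validation loss can be sketched as follows. The text does not state the percentage used or whether the training or validation loss is monitored, so the threshold rel_tol, the function names, and the example loss histories are all hypothetical; the sketch is one plausible reading of the criterion rather than the procedure actually used.

```python
def early_stopping_epoch(losses, rel_tol=0.01):
    """Return the first epoch whose loss decrease is below rel_tol * current loss.

    `losses` is a per-epoch loss history; rel_tol is an assumed threshold.
    Returns None when the criterion is never met.
    """
    for epoch in range(1, len(losses)):
        decrease = losses[epoch - 1] - losses[epoch]
        if decrease < rel_tol * abs(losses[epoch]):
            return epoch
    return None


def select_best(histories, rel_tol=0.01):
    """Pick the run with the lowest validation loss at its stopping epoch.

    `histories` maps a (hypothetical) run name to its validation-loss history.
    Runs that never meet the criterion are evaluated at their final epoch.
    """
    best_name, best_loss = None, float("inf")
    for name, losses in histories.items():
        stop = early_stopping_epoch(losses, rel_tol)
        if stop is None:
            stop = len(losses) - 1
        if losses[stop] < best_loss:
            best_name, best_loss = name, losses[stop]
    return best_name, best_loss


# Made-up validation-loss histories for two candidate networks.
histories = {
    "net_a": [1.0, 0.5, 0.3, 0.299, 0.2985],
    "net_b": [1.0, 0.6, 0.4, 0.399],
}
stop_a = early_stopping_epoch(histories["net_a"])
best_name, best_loss = select_best(histories)
```

A relative rather than absolute threshold keeps the criterion independent of the scale of the loss, which is convenient when several networks with different initialisations are compared.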