equally to the process, which is not biased by the different magnitudes of the data.

\[
x_n = \frac{z_n - \mu_n}{\sigma_n} \tag{1}
\]

2. Covariance Matrix Computation. Starting from the standardized data collected in the matrix $X$, a covariance matrix is evaluated, as in $C = \frac{X^T X}{N}$, to quantify the degree of correlation between the variables and how they co-vary with each other, identifying the directions of maximum variance in the dataset.

3. Covariance Matrix Eigenproblem. This is the key step of the PCA algorithm, and it consists of an eigenvalue decomposition of the covariance matrix, as expressed in $Cv = \lambda v$. It is performed in order to find the eigenvectors ($v$), or PCs, and their corresponding eigenvalues ($\lambda$). The eigenvectors represent the directions in which the data vary the most, and the corresponding eigenvalues indicate the amount of variance captured by each component. Each eigenvector corresponds to a PC, and the eigenvalues are ranked in descending order to prioritize the directions that capture the most variance. Since the eigenvalues are often referred to as active energies [12], representing the proportion of total variance explained by each component, PCs with larger eigenvalues capture more variance and are, therefore, more energetic and relevant to the data.

4. Dimensionality Reduction. Subsequently, the components to keep and to discard are selected. This selection is guided by a threshold, typically based on the proportion of variance to retain or on the mean of the eigenvalues, as in Eq. (2) [19]:

\[
\lambda_i > \frac{1}{N_s} \sum_{j=1}^{N_s} \lambda_j \tag{2}
\]

where $\lambda_i$ is the eigenvalue of the $i$-th component. Components whose eigenvalues exceed the mean eigenvalue are traditionally considered significant, while the others are the so-called Minor Components (MCs), which exhibit less variability.

5. Projection onto Principal Components. Once the PCs are identified, the data can be projected onto the new subspace, effectively reducing their dimensionality and making them more manageable for analysis or further processing. The transformed data matrix $Y$ is obtained as in $Y = XV_k$, where $V_k$ is the matrix of the previously selected $k$ eigenvectors. A minimal numerical sketch of these five steps is given below.

Thanks to this framework, PCA enhances the interpretability of the data, making it a suitable tool in data analysis for feature extraction, revealing relationships and patterns that may not be immediately apparent in the original variable space. Additionally, PCA can improve the performance of ML algorithms by eliminating noise and redundant features. Despite its strengths, PCA assumes that the components are linear combinations of the original variables; as a result, it may not capture complex nonlinear relationships present in the data. Moreover, while PCA is effective for variance-based feature selection, it does not explicitly consider the underlying distribution of the data, which may lead to the loss of important information, especially in cases with significant non-Gaussian characteristics.
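As a concrete illustration, the sketch below implements the five steps above in Python with NumPy. The function name, variable names, and the synthetic correlated data are illustrative assumptions and not part of the original formulation.

```python
import numpy as np

def pca_via_covariance(Z):
    """Minimal sketch of the five PCA steps described above.

    Z : (N, Ns) array of N observations of Ns variables.
    Returns the projected data Y and the retained eigenvectors Vk.
    """
    N, Ns = Z.shape

    # 1. Standardization (Eq. 1): zero mean and unit variance per
    #    variable, so no variable dominates because of its magnitude.
    X = (Z - Z.mean(axis=0)) / Z.std(axis=0)

    # 2. Covariance matrix C = X^T X / N, quantifying how the
    #    standardized variables co-vary.
    C = X.T @ X / N

    # 3. Eigenproblem C v = lambda v; eigh suits the symmetric matrix C.
    eigvals, eigvecs = np.linalg.eigh(C)

    # Rank components by descending eigenvalue ("active energy").
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # 4. Dimensionality reduction (Eq. 2): keep components whose
    #    eigenvalue exceeds the mean eigenvalue; the rest are the MCs.
    keep = eigvals > eigvals.mean()
    Vk = eigvecs[:, keep]

    # 5. Projection onto the retained principal components: Y = X Vk.
    return X @ Vk, Vk

# Illustrative use on synthetic, correlated data.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
Z = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
Y, Vk = pca_via_covariance(Z)
print(Y.shape, Vk.shape)
```

Note that, because the data are standardized, $C$ is a correlation matrix whose trace equals $N_s$; the mean-eigenvalue threshold of Eq. (2) therefore reduces to the familiar Kaiser criterion $\lambda_i > 1$.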
Independent Component Analysis

Independent Component Analysis (ICA) is a computational technique that seeks to decompose multivariate data into statistically independent, non-Gaussian components [13]. While similar to other dimensionality-reduction techniques such as PCA and factor analysis, ICA goes a step further by aiming to extract signals that are not only uncorrelated but also independent, meaning that the presence of one signal provides no information about the presence of the others.

Thanks to this property, ICA is particularly useful in applications involving the analysis of mixed signals, such as in BSS, where the goal is to recover the original source signals from observed mixtures without prior knowledge of the mixing process. One of the most classic examples is the cocktail party problem, in which multiple people speak simultaneously in a room, and ICA is used to isolate each individual voice from a set of mixed recordings captured by several microphones. Assuming that the observed data are linear mixtures of statistically independent, non-Gaussian source signals, ICA aims to recover these unknown sources by finding components that are as far from Gaussian as possible, using measures such as kurtosis (a measure of the "tailedness" of a distribution) or negentropy (a measure of how far a distribution is from Gaussianity). To illustrate the underlying principles of ICA, consider the following mathematical formulation. Let $s = [s_1, s_2, \ldots, s_{N_s}]^T$ represent the unknown source signals and $x = [x_1, x_2, \ldots, x_{N_s}]^T$ the observed mixed signals. The mixing process can be expressed as in Eq. (3) [13], where $A$ is the unknown mixing matrix:

\[
x = As \tag{3}
\]
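To illustrate Eq. (3) and its inversion, the sketch below mixes two synthetic non-Gaussian sources with a known matrix $A$ and recovers them using FastICA from scikit-learn, one common negentropy-based ICA estimator (named here as an example, not as the method used in this work). The sources, mixing matrix, and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two statistically independent, non-Gaussian sources (a sine and a
# square wave), standing in for the voices of the cocktail-party problem.
s = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

# Mixing process x = A s (Eq. 3): each "microphone" records a
# different linear combination of the sources.
A = np.array([[1.0, 0.5],
              [0.4, 1.2]])
x = s @ A.T

# FastICA estimates an unmixing matrix by maximizing non-Gaussianity
# (via a negentropy approximation), recovering the sources.
ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)

# The estimated mixing matrix should match A up to the usual ICA
# ambiguities of permutation, sign, and scale.
print(np.round(ica.mixing_, 2))
```

As the comments note, ICA can recover the sources only up to permutation, sign, and scale, since any rescaling of a source can be absorbed into the corresponding column of $A$.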