Model Validation and Uncertainty Quantification, Volume 3

316 G. Cai and S. Mahadevan

parameters. For example, Roux & Bouchard [7] calibrated a ductile damage model using measurements from the full displacement field. The main reason researchers choose not to use full-field observations for parameter calibration is computational cost. The price, however, is a loss of information and accuracy, since such a strategy implies that the model parameters do not vary over space and time. For the general, heterogeneous case where model parameters (e.g., material properties) vary over space and time, full-field calibration is high dimensional.

Since calibration using full-field observations is time consuming, parallel and distributed computing can help reduce the time cost of data analytics without causing any accuracy loss. One parallel computing approach in the context of big data is MapReduce. Utilizing a cluster of nodes, MapReduce performs two essential functions: it parcels out work to the various nodes within the cluster (map), and it organizes and reduces the results from each node into a cohesive answer to a query (reduce) [8].

To the authors’ knowledge, only a few studies have applied MapReduce to model calibration. Humphrey et al. [9] parallelized the calibration of parameters in watershed models on the Windows Azure cloud computing platform. Zhang et al. [10] implemented cloud-based calibration of a hydrologic model on the Hadoop platform. These studies parallelized the calibration process only for one application area (hydrological models), and they did not handle large volumes of observations.

In this paper, a novel application of MapReduce to model calibration is presented, with a focus on handling the big data issue in model calibration. Numerical models are sometimes too expensive to be run repeatedly in the calibration process, which calls for the construction and use of surrogate models.
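The map and reduce roles described above can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the linear model `y = theta * x`, the independent Gaussian error assumption, and all function names are hypothetical, and a thread pool merely stands in for the cluster nodes. Under those assumptions the log-likelihood is a sum over observations, so each chunk can be scored separately (map) and the partial sums added afterwards (reduce).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def log_likelihood_chunk(chunk, theta, sigma=1.0):
    """Map step: Gaussian log-likelihood of one chunk of (x, y) observations."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x, y in chunk:
        residual = y - theta * x  # hypothetical model: y = theta * x
        total += -0.5 * (residual / sigma) ** 2 - norm
    return total


def split_into_chunks(data, n_chunks):
    """Parcel the observations into roughly equal, contiguous chunks."""
    size = math.ceil(len(data) / n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def mapreduce_log_likelihood(data, theta, n_workers=4):
    """Score every chunk in a worker (map), then sum the partial results (reduce)."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: log_likelihood_chunk(c, theta), chunks))
    return reduce(lambda a, b: a + b, partials, 0.0)


# Synthetic "full-field" observations generated with theta = 2.0 and no noise.
observations = [(0.01 * i, 2.0 * (0.01 * i)) for i in range(1000)]
serial = log_likelihood_chunk(observations, theta=2.0)
parallel = mapreduce_log_likelihood(observations, theta=2.0)
print(serial, parallel)
```

Because the reduce step is a plain sum, the chunked result agrees with the serial one up to floating-point rounding, which mirrors the point that parallelization need not cost accuracy. On a real cluster the thread pool would be replaced by a framework such as Hadoop.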
The training point collection and the training of the surrogate model are also parallelized in this paper. The proposed methodology is general and applies to variations over both space and time.

The rest of this paper is organized as follows. Section 31.2 provides a background review of the basic concepts related to model calibration and big data analytics. Section 31.3 develops the big data analytics approach for the calibration process. Section 31.4 implements the proposed approach using an illustrative example of thermal conductivity calibration of a concrete slab with holes, using observations from infrared thermography, and discusses the performance of the MapReduce methodology. Section 31.5 provides concluding remarks.

31.2 Background

This section reviews the basic concepts of model calibration, surrogate models, and big data analytics.

31.2.1 Bayesian Calibration

As explained earlier, model calibration refers to the adjustment of model parameters so that the model output matches the field data well. In practice, Bayesian calibration is popular due to its robustness, and Bayesian inference is used here as an example calibration technique. Bayes’ rule describes the parameter update process:

\[ f''(\theta) = \frac{L(\theta)\, f'(\theta)}{\int L(\theta)\, f'(\theta)\, d\theta} \tag{31.1} \]

Here $\theta$ represents the uncertain parameters of interest and $Y$ represents the observations. The observations are continuous, as are the design parameters. In Eq. (31.1), $f'(\theta)$ is the prior density, $L(\theta)$ is the likelihood function, $f''(\theta)$ is the posterior density, and $\int L(\theta)\, f'(\theta)\, d\theta$ is the evidence. Several numerical techniques are available to construct the posterior distribution, such as Markov chain Monte Carlo simulation and the particle filter. The latter is used in this paper.
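As a small numerical illustration of Bayes' rule in Eq. (31.1), the sketch below evaluates the posterior of a scalar parameter on a grid, approximating the evidence integral with a Riemann sum. Everything specific here is assumed for illustration only: the uniform prior on [0, 4], the Gaussian likelihood, a model whose output equals the parameter itself, and the values `y_obs = 2.1` and `sigma = 0.5`.

```python
import math


def gaussian_likelihood(theta, y_obs, sigma):
    """L(theta): density of y_obs when the model output equals theta (assumed)."""
    z = (y_obs - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def grid_posterior(thetas, prior, likelihood):
    """Evaluate Bayes' rule on an evenly spaced grid of theta values."""
    d_theta = thetas[1] - thetas[0]
    unnormalized = [likelihood(t) * prior(t) for t in thetas]
    evidence = sum(unnormalized) * d_theta  # Riemann sum for the evidence integral
    posterior = [u / evidence for u in unnormalized]
    return posterior, evidence


thetas = [0.01 * i for i in range(401)]  # grid on [0, 4]


def prior(theta):
    """Uniform prior density on [0, 4] (assumed)."""
    return 0.25


def likelihood(theta):
    """Likelihood of a single hypothetical observation."""
    return gaussian_likelihood(theta, y_obs=2.1, sigma=0.5)


posterior, evidence = grid_posterior(thetas, prior, likelihood)
d_theta = thetas[1] - thetas[0]
posterior_mean = sum(t * p for t, p in zip(thetas, posterior)) * d_theta
print(evidence, posterior_mean)
```

A grid is only practical for one or two parameters; in higher dimensions the evidence integral becomes expensive, which is one motivation for sampling-based techniques.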
In the particle filter approach, the weights of the particles (samples) define the probability distribution. These weights are updated with the observations using Bayes’ rule above, yielding samples that correspond to the posterior distribution.

31.2.2 Gaussian Process (GP) Surrogate Model

Since Bayesian updating requires repeated runs of the computer model, a surrogate model is preferred in order to reduce the computational cost. Many types of surrogate modeling techniques are available in the literature, such as polynomial regression, radial basis functions, Kriging (Gaussian process), and neural networks. The Gaussian process surrogate model [11] is used in this paper for the sake of illustration.
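The two ideas in this section, a GP surrogate standing in for the computer model and particle weights updated through Bayes' rule, can be combined in a rough sketch. It assumes NumPy is available and is not the authors' implementation: `expensive_model` is a hypothetical stand-in for a costly simulation, the squared-exponential kernel hyperparameters are fixed rather than estimated, the likelihood is assumed Gaussian, and only a single weight update is shown, with no resampling step.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def fit_gp(x_train, y_train, length_scale=1.0, variance=1.0, jitter=1e-6):
    """Precompute the Cholesky factor and weight vector of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train, length_scale, variance)
    L = np.linalg.cholesky(K + jitter * np.eye(len(x_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return x_train, L, alpha, length_scale, variance


def predict_gp(gp, x_new):
    """Posterior mean and variance of the GP at the points x_new."""
    x_train, L, alpha, length_scale, variance = gp
    K_s = rbf_kernel(x_new, x_train, length_scale, variance)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = variance - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)


def expensive_model(theta):
    """Hypothetical stand-in for a costly numerical simulation."""
    return theta + 0.5 * np.sin(theta)


# Train the surrogate on a handful of model runs (the training points).
theta_train = np.linspace(0.0, 5.0, 12)
gp = fit_gp(theta_train, expensive_model(theta_train))

# Particles drawn from an assumed uniform prior on [0, 5], with equal weights.
rng = np.random.default_rng(0)
particles = rng.uniform(0.0, 5.0, size=2000)
weights = np.full(particles.size, 1.0 / particles.size)

# One synthetic observation generated at theta = 2.0, assumed noise level sigma.
y_obs, sigma = expensive_model(2.0), 0.05

# Weight update: multiply by the likelihood of each particle, then normalize.
# The surrogate mean replaces a call to the expensive model for every particle.
pred, _ = predict_gp(gp, particles)
weights *= np.exp(-0.5 * ((y_obs - pred) / sigma) ** 2)
weights /= weights.sum()

posterior_mean = np.sum(weights * particles)
print(posterior_mean)
```

The expensive model is run only 12 times to train the surrogate, while the 2000 likelihood evaluations use the cheap GP mean. A fuller treatment would also propagate the GP predictive variance into the likelihood and resample the particles when the weights degenerate.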
