Fusion of thermal and RGB images for automated deep learning based crack detection in civil infrastructure

Alexander et al., AI in Civil Engineering (2022) 1:3. https://doi.org/10.1007/s43503-022-00002-y

Abstract

Research has been continually growing toward the development of image-based structural health monitoring tools that can leverage deep learning models to automate damage detection in civil infrastructure. However, these tools are typically based on RGB images, which work well under ideal lighting conditions but often show degraded performance in poor and low-light scenes. Thermal images, on the other hand, while lacking crispness of detail, do not show the same degradation of performance under changing lighting conditions. The potential to enhance automated damage detection by fusing RGB and thermal images within a deep learning network has yet to be explored. In this paper, RGB and thermal images are fused in a ResNet-based semantic segmentation model for vision-based inspections. A convolutional neural network is then employed to automatically identify damage defects in concrete. The model uses a thermal encoder and an RGB encoder to combine the features detected from both spectrums, improving the performance of the model, and a single decoder to predict the classes. The results suggest that this RGB-thermal fusion network outperforms the RGB-only network in the detection of cracks, as measured by the Intersection over Union (IOU) performance metric. The RGB-thermal fusion model not only detected damage at a higher rate, but also performed much better at differentiating the types of damage.

Keywords: Infrared thermography, Structural health monitoring, Image fusion, Automated crack detection

*Correspondence: quincy.g.alexander@erdc.dren.mil, US Army Engineer Research and Development Center, 3909 Halls Ferry Rd, Vicksburg, MS 39180, USA. Vedhus Hoskere, Yasutaka Narazaki and Andrew Maxwell contributed equally to this work. Full list of author information is available at the end of the article.

© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

1 Introduction

Infrastructure in the United States is in a growing state of disrepair, as the annual needs for repair and replacement funding continue to outpace spending. The American Society of Civil Engineers (ASCE) estimated that the gap in infrastructure funding reached $2 trillion between 2016 and 2025 (ASCE, 2020). To combat this funding gap, infrastructure managers are moving towards an asset management-based model to maximize the impact of the limited funding. ASCE (2020) describes asset management as an effective tool for managing capital assets across various types of infrastructure to minimize the total cost of maintenance and operation in a dynamic and data-rich environment. Furthermore, it points out that the key to the success of this proactive approach lies in significant advances in monitoring technologies, which allow managers to prioritize needs and plan long-term strategies. As automated structural health monitoring (SHM) tools progress, their potential impact on asset management practices will continue to grow.

SHM tools employing deep learning have shown their efficacy in providing powerful data processing approaches for damage detection and structural condition assessment for a variety of infrastructure types (Bao & Li, 2020; Hess et al., 2015; Ye et al., 2019). Computer vision-based deep learning techniques, in particular, have seen tremendous growth as computational power continues to increase and the cost per pixel of data continues to decrease (Koch et al., 2015; Long et al., 2015).
As noted by Barry Hendy of Kodak, digital camera technology has developed at an exponential improvement rate similar to Moore's law for microprocessors, with the pixels per dollar doubling annually when comparing cameras of a similar feature level (Brynjolfsson & McAfee, 2014; Commons, 2015; Lucas, 2012). This trend across a range of camera types is illustrated in Fig. 1.

Fig. 1 Relative cost per pixel, through time

Leveraging the reduced technology cost and the increased computational power, there has been an exponential increase in the creation and deployment of computer vision-based inspection and infrastructure monitoring methods. For example, Hoskere et al. (2020) proposed the use of semantic segmentation for vision-based structural inspection, where a fully convolutional neural network was used to automatically identify multiple damage defects, such as cracks, corrosion and spalling, on structures of interest to the U.S. Army Corps of Engineers, such as lock gates and bridges. A detailed review of the current research, key challenges, and ongoing work related to computer vision-based tools for the inspection of infrastructure can be found in Spencer et al. (2019). Dong and Catbas (2020) and Avci et al. (2021) also provided comprehensive summaries of computer vision-based SHM tools and techniques at both local and global levels, including the factors that can affect their accuracy.

Infrared thermography (IRT) is the science of detecting infrared energy emitted by an object, converting it into an apparent temperature, and displaying the result as an image (Fluke, 2021). Infrared (IR) wavelengths are longer than those found in the visible spectrum and are not visible to the human eye. IRT can often provide structural anomaly information that is not identifiable in the visible spectrum; indeed, the power of IRT for non-destructive evaluation (NDE) has been well documented (Avdelidis & Moropoulou, 2004; Hess et al., 2015; ASTM International, 2013). Passive IRT, specifically, is commonly used in the NDE of large civil infrastructure, where artificially heating the specimen is not always feasible (Avdelidis & Moropoulou, 2004). For passive IRT, solar radiation serves as the heat source; studies have been performed to identify the optimum times to capture images with the highest thermal contrast, and have shown that diurnal temperature variations can be adequate to support the use of IRT to detect defects in fully shaded regions (Alexander et al., 2019; Washer et al., 2013).

Visible (RGB) and thermal images each have their own strengths and weaknesses. For example, visible images can provide a clear representation of the scene at high resolution and low cost per pixel, but light sources can affect the image quality. In contrast, thermal images are much more robust to variations in lighting conditions and can provide sub-surface information, but generally suffer from comparatively low resolution and weak crispness. Several manufacturers have produced cameras that can capture both visible and thermal images simultaneously, thus providing more contextual information for the thermal images.

Research has been performed in various fields to determine how the two image types (RGB and thermal) can be combined to enhance their respective advantages while neutralizing their disadvantages. For example, the automotive industry is particularly interested in the fusion of RGB and thermal images for object detection in autonomous vehicles, for which low-light conditions and strong glares are common challenges (Sun et al., 2019).
Liu et al. (2016) showed an 11% improvement over the baseline in pedestrian detection using their proposed fusion model. Shivakumar et al. (2019) proposed an RGB-thermal fusion technique in the DARPA Subterranean Challenge to identify four different object classes (person, drill, backpack, and fire extinguisher). An et al. (2018) used an image matching technique for crack detection, comparing areas on thermal and visible spectrum images to identify matching cracks and thus reduce false positives. In each of the referenced applications, the fusion of thermal and RGB images performed better than the use of RGB images alone.

While researchers have previously investigated the use of both visible and thermal images for damage detection in civil infrastructure, those studies were typically performed on simple laboratory test specimens that lacked the type of visual complexity present in the field. In addition, these studies leveraged active IRT, which requires an artificial heating element to highlight the flaws for detection. Moreover, they were based on object detection, simply drawing a box around the identified cracks without considering the pixel-level accuracy of the prediction, which is nevertheless crucial for damage severity assessment.

This paper proposes to fuse features from visible and thermal spectrum images to develop a robust automated damage detection strategy for in-service civil infrastructure. The novelty of the research effort is the quantification of the benefits of fusing thermal features within the neural network for a semantic segmentation model, with class predictions determined per pixel. To achieve this goal, a curated dataset of both damaged and undamaged in-service infrastructure is developed, with each visible-spectrum image having a corresponding thermal image as well as hand-labelled ground truth images for semantic segmentation. The thermal images are collected in the field using passive IRT technology, with a low-cost thermal imager that connects to a mobile device; this hardware aligns well with the tools that would commonly be used by inspectors in the field. Additionally, images of concrete joints are included in the dataset to represent crack-like visual patterns that are not actually associated with any damage. The model, therefore, not only has to detect the cracks, but must also differentiate between visually similar classes. The performance is then measured by the predictions of each pixel. This research demonstrates that the fusion of RGB and thermal images improves the performance of the model over the RGB-only model in properly predicting cracks, as well as in differentiating between cracks and comparable features.

2 Methodologies

The dataset collected and used for training, validating and testing in this effort contains images of in-service infrastructure taken in the field. This section describes the curation of the dataset, with aligned RGB and thermal images of damaged infrastructure and corresponding labelled ground truths for use in a semantic segmentation algorithm.

2.1 Data collection

The FLIR One Pro Gen 3 (FLIR One) thermal camera was selected to collect data, due to its balance between a low price point and relatively high thermal resolution (Alexander & Lunderman, 2021). The FLIR One unit can capture thermal and RGB images simultaneously. When the camera is connected to a mobile device, the FLIR One mobile app is used as a viewfinder and to control the operation of the camera. Thermal images are captured at a resolution of 160 × 120 pixels for storage on the device, and can then be decompressed to a resolution of 640 × 480 pixels when uploaded to a personal computer using the FLIR software. The visible spectrum images have a resolution of 1440 × 1080 pixels. A preliminary study of the reliability of the unit revealed that the camera's performance was within specification for temperatures ranging from 0 °F to 120 °F (Alexander & Lunderman, 2021).

The RGB-thermal pairs of images are annotated for semantic segmentation as part of this research effort. Annotating images for semantic segmentation is an arduous task, as each pixel in the image must be assigned a label.
While the annotation of regular shapes, like polygons, can be completed with a few clicks, amorphous shapes such as cracks require meticulous attention to detail. To develop a high-quality dataset for this study, annotations are conducted using InstaDam (Hoskere et al., 2021), an open-source software for damage annotation developed by the University of Illinois at Urbana-Champaign.

2.2 Image alignment

Fusion of images requires that the images be properly aligned. To match two images, Rao et al. (2014) outlined a normalized cross-correlation (NCC) approach, which works well when there is good structure within the two images. In this approach, one image is held fixed while the position of the second image is moved pixel by pixel, with the quality of the match calculated at each position. The quality of the match is quantified through a correlation coefficient, a value ranging from 0 to 1, where 1 indicates a perfect match. The position with the highest correlation coefficient should provide the best alignment between the two images.

As shown in Fig. 2, the native resolution of the thermal images is lower than that of the visible images; therefore, the thermal image must be scaled up to align the thermal scene with the corresponding visible image. The thermal image is appropriately scaled, and the NCC method is then applied to the scaled thermal image to locate the position of the maximum correlation coefficient. The image size is made consistent with the RGB image by zero-padding. To qualitatively validate the accuracy of this approach, the two images are blended: Fig. 3 shows (a) the RGB image, (b) the padded thermal image, and (c) the blended RGB-padded thermal image for one example scene. This approach is applied to all the image pairs in the dataset, using the iron palette (e.g., Fig. 3c) and the greyscale palette for the thermal images. However, the NCC approach is not effective for all image pairs, and works poorly for images with low thermal definition; some images therefore had to be manually realigned.

Fig. 2 Illustration of visible and thermal image pair alignment

Fig. 3 a Visible image, b padded thermal image, and c blended image
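To make the scale-then-match procedure concrete, the following is a minimal sketch assuming OpenCV and NumPy; the function name, the scale factor, and the file names in the usage comment are illustrative, not taken from the authors' implementation.

```python
# Minimal sketch of the alignment described above: upscale the thermal image,
# slide it over the RGB image with normalized cross-correlation, and zero-pad
# it into an RGB-sized frame at the best-match position.
import cv2
import numpy as np

def align_thermal_to_rgb(rgb, thermal, scale):
    """Return the thermal image zero-padded into an RGB-sized frame,
    plus the peak NCC coefficient found during the search."""
    # Upscale the low-resolution thermal image. The factor should make scene
    # content the same pixel size in both images while keeping the scaled
    # thermal image smaller than the RGB frame, so the template can slide.
    h, w = int(thermal.shape[0] * scale), int(thermal.shape[1] * scale)
    thermal_up = cv2.resize(thermal, (w, h), interpolation=cv2.INTER_CUBIC)

    # Match on single-channel float images.
    rgb_gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY).astype(np.float32)
    t_gray = thermal_up.astype(np.float32)

    # Normalized cross-correlation: result[y, x] is the correlation with the
    # thermal template's top-left corner placed at (x, y).
    result = cv2.matchTemplate(rgb_gray, t_gray, cv2.TM_CCORR_NORMED)
    _, max_corr, _, (x, y) = cv2.minMaxLoc(result)

    # Zero-pad to the RGB image size, placing the thermal data at the best fit.
    padded = np.zeros(rgb_gray.shape, dtype=thermal_up.dtype)
    padded[y:y + h, x:x + w] = thermal_up
    return padded, max_corr

# Illustrative usage; pairs with a low peak coefficient (weak thermal
# definition) can be flagged for the manual realignment mentioned above.
# rgb = cv2.imread("pair_001_rgb.png")
# thermal = cv2.imread("pair_001_t.png", 0)
# padded_t, corr = align_thermal_to_rgb(rgb, thermal, scale=2.0)
```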
2.3 Data preparation and annotation

Each image in the dataset has corresponding pixel-wise annotations. Seventy-five images are selected, each containing at least one crack and exhibiting well-defined thermal contrast. The frequency of occurrence of the different labelled classes is provided in Table 1. While the focus of this study is on crack detection, additional labels are included for spalling and vegetation growth. All the images are cropped to the size and position corresponding to the thermal images by removing the padded borders. The datasets generated during the current study are available from the corresponding author on reasonable request.

Table 1 Class label overview and weight

Class  Description  Pixels        Class probability  Weight
0      No label     242,470,928   0.9868             1.4357
1      Crack        2,252,136     0.0092             34.7848
2      Joint        998,536       0.0041             42.0544
3      Spalling     1,360,938     0.0055             39.654
4      Vegetation   2,678,104     0.0055             39.6544

2.4 Network architecture

The RTFNet network proposed by Sun et al. (2019) is used as the foundation for the analysis in this study. The RTFNet architecture consists of an RGB encoder, a parallel thermal encoder, and a single decoder followed by the pixel-wise classification prediction. The encoders produce low-resolution feature maps for the RGB image and the thermal image, and the decoder up-samples the features to develop dense feature maps (Yasrab et al., 2017). As part of the fusion process, the features acquired from each layer within the thermal encoder are merged into the corresponding layer within the RGB encoder. The network is illustrated in Fig. 4. The encoders are based on the Residual Network (ResNet) architecture, which has variants that differ in the number of layers; the ResNet-18 model, with 18 neural network layers, is used in this study.

Fig. 4 RGB-thermal fusion network architecture (Sun et al., 2019)

Within the network, the classes are weighted based on the pixel distribution, according to the class weighting methodology outlined by Paszke et al. (2016). The class weighting formula is provided in Eq. 1, and the resulting weights are shown in Table 1:

    weight = 1 / ln(c + class_probability),    (1)

where c is the Paszke method coefficient (1.02), and class_probability is the ratio of the pixels of an individual class to the total number of pixels in the dataset.
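As a check on Eq. 1, the short sketch below recomputes the class weights from the class probabilities listed in Table 1; it is plain Python, with the probability values transcribed from the table, and the printed weights match the published ones to within rounding.

```python
# Sketch of the class weighting of Eq. 1 (after Paszke et al., 2016),
# applied to the class probabilities of Table 1.
import math

C = 1.02  # Paszke method coefficient

# Class probabilities transcribed from Table 1.
class_probability = {
    "no_label":   0.9868,
    "crack":      0.0092,
    "joint":      0.0041,
    "spalling":   0.0055,
    "vegetation": 0.0055,
}

# Eq. 1: weight = 1 / ln(C + class_probability)
weights = {name: 1.0 / math.log(C + p) for name, p in class_probability.items()}

for name, w in weights.items():
    print(f"{name:10s} weight = {w:.4f}")
# Prints roughly: no_label 1.44, crack 34.7, joint 42.0,
# spalling/vegetation 39.7 -- rare classes receive large weights so the
# training loss is not dominated by the background class.
```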
Image augmentation schemes are applied to improve the training results. First, the RGB images are duplicated, and the brightness of the duplicated images is reduced uniformly to simulate a low-light environment; the corresponding label images and thermal images are not modified. This augmentation doubles the size of the dataset, which is then randomly split 80/10/10 for training, validation and testing, respectively. When the model is run for training, further data augmentations are applied: random flip, random noise, random brightness change, and random cropping.

3 Results

The following four scenarios are studied as part of the effort to quantify the value of including the thermal data:

(1) Fusing the RGB and thermal images (RGBT). The greyscale version of the thermal images is used in the analysis.
(2) Fusing the RGB images with a blank image (RGBB). This scenario represents the condition where only RGB data is available, in an architecture that includes an empty (white) thermal input to the encoder.
(3) Removing the thermal encoder from the architecture and analysing the RGB images only (RGB).
(4) Removing the RGB encoder from the architecture and analysing the thermal images only (T).

3.1 Model performance evaluation

Both the RGBB and RGB models are included in the analysis to validate the process, as the RGB-blank pair should perform similarly to the RGB-only model. The performance of the four scenarios is measured in terms of Intersection over Union (IOU), one of the most common performance metrics for semantic segmentation. At the pixel level, IOU indicates the ratio of the correct class predictions to the sum of the correct and incorrect class predictions, as shown in Eq. 2:

    IOU = Area of overlap / Area of union = TP / (TP + FP + FN),    (2)

where TP is the number of true positive pixels, FP the number of false positive pixels, and FN the number of false negative pixels.
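A minimal per-class implementation of Eq. 2 on integer label masks might look as follows; `pred` and `label` are H×W NumPy arrays of class indices, and the toy example at the end is illustrative only.

```python
# Sketch of the per-class, pixel-level IOU of Eq. 2.
import numpy as np

def class_iou(pred, label, cls):
    """IOU = TP / (TP + FP + FN) for one class, at the pixel level."""
    pred_c = pred == cls
    label_c = label == cls
    tp = np.logical_and(pred_c, label_c).sum()   # correctly predicted pixels
    fp = np.logical_and(pred_c, ~label_c).sum()  # predicted but not labelled
    fn = np.logical_and(~pred_c, label_c).sum()  # labelled but not predicted
    denom = tp + fp + fn
    return tp / denom if denom > 0 else float("nan")

# Toy 2x3 masks, class 1 = crack: TP = 2, FP = 1, FN = 1 -> IOU = 0.5.
pred = np.array([[1, 1, 0], [0, 1, 0]])
label = np.array([[1, 0, 0], [0, 1, 1]])
print(class_iou(pred, label, cls=1))
```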
The results are shown in Fig. 5 after applying a Locally Weighted Scatterplot Smoothing (LOWESS) regression technique, and a summary of the performance at 6000 epochs is provided in Table 2.

Fig. 5 Smoothed crack detection IOU rate for the RGBT, RGBB, RGB and T datasets

Table 2 IOU performance summary at 6000 epochs

Scenario    Crack detection IOU
RGBT        0.31
RGB/RGBB    0.27
T           0.20

The results show that the fusion of RGB and thermal images outperforms the RGB-only and T-only models, indicating that the network is able to leverage the additional information provided by the thermal images. By 6000 epochs, the performance of each scenario has stabilized relative to the others. The RGBT model outperforms the RGB-only model by approximately 15%. The RGBB network was trained to ensure that any performance improvement of the RGBT network over the RGB network was due to the additional information from the thermal images and not due to the additional parameters in the network. The RGBB and RGB accuracies align well with each other, signifying that a dual-encoder system with a second, blank image performs similarly to the RGB-only model, as expected. The results also show that thermal images alone can provide an indication of cracks, at approximately 74% of the rate of the RGB-only model and 66% of the rate of the RGBT model. Thus, damage can be indicated in a scene with reduced impact of the lighting conditions, which supports the original hypothesis.

All the scenarios were run on the same device, using the same datasets for training, validating and testing: a Predator PH317 with an Nvidia GeForce RTX 2070 GPU. The run times (seconds/epoch) for the single-input and fused-input scenarios are summarized in Table 3. Fusing the thermal data with the RGB data increases the run time by approximately 30 percent.

Table 3 Runtime comparison

Model           Avg. seconds/epoch
Dual encoder    23.5
Single encoder  18.1

3.2 Qualitative results comparison

A sample of the inputs, labels, and predictions is provided in Fig. 6. As shown, some features, such as joints in the sidewalk, are challenging to identify in the RGB image but are prominent in the thermal image. The RGB-thermal pair performs well in predicting the joint locations and differentiating them from other classes, such as spalling. The thermal-only model's predictions identify the correct classes, but lack sharpness in delineating the boundaries.

Fig. 6 Sample inputs and comparison between RGB, RGBT and T predictions

4 Further discussion

To illustrate the overall performance of the model, three specific conditions were evaluated. The first condition (Sample 1) displays a complex mix of classes; the second condition (Sample 2) represents visually similar materials with different thermal characteristics; and the third condition (Sample 3) represents low-light conditions. These three conditions are highlighted in Fig. 7, with their performance compared in Tables 4, 5 and 6. In addition to IOU, recall rates are presented. In simple terms, recall measures the probability that a pixel truly belonging to a class is predicted as that class; the equation is similar to Eq. 2, except that the FP term is dropped from the denominator, so predicting extra pixels for a class is not penalized.
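For completeness, a matching sketch of the per-class recall reported in Tables 4, 5 and 6 is given below. The original description of the metric is ambiguous, so the sketch assumes the standard definition, recall = TP / (TP + FN); the masks are integer label arrays as in the IOU sketch above, and the function name is illustrative.

```python
# Sketch of per-class, pixel-level recall under the standard definition.
import numpy as np

def class_recall(pred, label, cls):
    """Fraction of ground-truth pixels of `cls` that the model recovered."""
    pred_c = pred == cls
    label_c = label == cls
    tp = np.logical_and(pred_c, label_c).sum()
    fn = np.logical_and(~pred_c, label_c).sum()
    return tp / (tp + fn) if (tp + fn) > 0 else float("nan")

# Note: a wide, blurry prediction can score a high recall but a low IOU,
# which is the pattern seen for the thermal-only model in Tables 4-6.
```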
Sample 1, shown in the first column of Fig. 7, represents a scene with a complex mix of cracks and joints, as well as some vegetation, in a well-lit scene. The results for this scenario indicate that the RGBT model is slightly better than the RGB model at identifying cracks. More pronounced enhancements from fusing the thermal image are observed in class differentiation, where the RGBT model is much better at correctly distinguishing between the classes. The RGB model misidentified joints as spalling, resulting in a joint recall and IOU score of 0; vegetative growth was also misidentified as spalling. The T model had a recall rate comparable to those of the RGBT and RGB models, but with a slight reduction in IOU, as the thermal image lacked the crispness needed to correctly maintain the class boundaries.

Fig. 7 Sample inputs and comparisons of output predictions

Table 4 Results of Sample 1

Model         RGBT   RGB   T
Crack IOU     0.36   0.32  0.29
Crack recall  0.97   0.98  0.97
Joint IOU     0.18   0     0.10
Joint recall  0.94   0     0.82

Table 5 Results of Sample 2

Model         RGBT   RGB   T
Crack IOU     0.28   0.25  0.24
Crack recall  0.95   0.95  0.94
Joint IOU     0.36   0     0.38
Joint recall  0.97   0     0.93

Table 6 Results of Sample 3

Model         RGBT   RGB   T
Crack IOU     0.25   0.22  0.20
Crack recall  0.83   0.68  0.62

Sample 2, shown in the second column of Fig. 7, represents a much simpler scene with a crack and a joint. In this sample, the material to the left of the joint is asphalt and the material to the right is concrete. The two material types have different thermal properties, and this difference can be seen in the thermal input image. All models correctly identified the crack, but the RGB model misidentified the joint as spalling, resulting in a joint recall and IOU score of 0, and overestimated its width.

Sample 3, shown in the third column of Fig. 7, represents a scene with only cracks, but in a low-light condition. The images were taken at night and include a pavement stripe for additional visual complexity. In this scenario, the cracks were correctly identified by the RGBT and T models, but a portion of the crack patterns was misidentified as spalling by the RGB model.
In summary, these results show the significant potential of the proposed RGBT approach to enhance the efficiency and reliability of the inspection of in-service civil infrastructure. Such inspection is required to identify structural damage while robustly differentiating damage patterns from other, similar patterns under various lighting conditions.

5 Conclusion

The purpose of this study was to quantify the value of fusing RGB and thermal images to improve a deep learning model for damage detection in large civil infrastructure. This is a novel approach for the automated inspection of such infrastructure, leveraging the strengths of each image type, with the features from each image type fused at each layer of the deep-learning network. The RTFNet framework developed for autonomous vehicles was used as the foundation of this study. Images were collected using a relatively inexpensive combined thermal and RGB camera connected to a mobile device. Thermal-RGB image pairs were properly aligned, with annotations for semantic segmentation manually created for multiple classes, including cracks, joints, spalling and vegetation. Four scenarios were evaluated: RGB-thermal fusion, RGB encoder only, RGB fused with a blank image, and thermal image only. Each of the models was trained for over 6000 epochs, using an 80/10/10 split for training, validation and testing. The results showed that the fusion of RGB and thermal spectrum images created a more robust model for the sample dataset, increasing the IOU value by approximately 14% over the RGB-only model for crack detection while providing more reliable class identification. The models trained with the thermal images alone delivered the lowest performance metrics: while the thermal-only model was generally capable of predicting the proper classes, its predictions lacked crispness and were often wider than the actual damage or joints. The predictions on the RGB images alone were not capable of consistently differentiating between the multiple class types, particularly in complex and low-light scenes. This study confirmed the hypothesis that the fusion of RGB and thermal images can outperform the RGB-only and T-only models, demonstrating that the network is able to leverage the additional information provided by thermal images to produce a more robust model for the inspection of in-service civil infrastructure.

Author contributions

QA conceptualized the study, established the methodology, collected and curated the data set, carried out the study, and drafted the manuscript; VH participated in the design of the study, coordinated the data labeling, and helped review and edit the manuscript; YN participated in the design of the study, assisted in code modification, and helped review and edit the manuscript; AM established the workflow for data curation and assisted in code development; BFS conceived of the study, participated in the design of the study, provided resources, and helped review and edit the manuscript. All authors read and approved the final manuscript.

Funding

Partial financial support was received from the U.S. Army Corps of Engineers Engineer Research and Development Center.
Declarations

Competing interests

The authors have no relevant non-financial interests to disclose. Billie F. Spencer Jr. is an editorial board member for AI in Civil Engineering and was not involved in the editorial review of, or the decision to publish, this article. All authors declare that there are no competing interests.

Author details

US Army Engineer Research and Development Center, 3909 Halls Ferry Rd, Vicksburg, MS 39180, USA. Department of Civil and Environmental Engineering, University of Houston, 4726 Calhoun Road, Houston, TX 77204, USA. Zhejiang University/University of Illinois at Urbana-Champaign Institute, Room C323, Engineering Building, 718 East Haizhou Road, Haining 314400, Zhejiang, China. Department of Civil and Environmental Engineering, University of Illinois Urbana-Champaign, 205 N Mathews Ave, Urbana, IL 61801, USA.

Received: 30 November 2021. Accepted: 18 March 2022.

References

Alexander, Q. G., & Lunderman, C. V. (2021). Thermal camera reliability study: FLIR One Pro. US Army Engineer Research and Development Center.
Alexander, Q. G., Hoskere, V., Spencer, B. F., Jr., & Smith, D. M. (2019). Towards the application of image based monitoring of USACE large civil infrastructure. International Workshop for Structural Health Monitoring, Palo Alto, CA.
An, Y.-K., Jang, K.-Y., Kim, B., & Cho, S. (2018). Deep learning-based concrete crack detection using hybrid images. Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2018, Denver.
ASCE. (2020). Changing the infrastructure equation: Using asset management to optimize investments. Retrieved March 20, 2021, from http://preprod.asce.org/-/media/asce-images-and-files/advocacy/documents/changing-infrastructure-equation-report.pdf
ASTM International. (2013). D4788-03(2013) Standard test method for detecting delaminations in bridge decks using infrared thermography. West Conshohocken: ASTM International.
Avci, O., Abdeljaber, O., Kiranyaz, S., Hussein, M., Gabbouj, M., & Inman, D. J. (2021). A review of vibration-based damage detection in civil structures: From traditional methods to machine learning and deep learning applications. Mechanical Systems and Signal Processing, 147, 107077.
Avdelidis, N. P., & Moropoulou, A. (2004). Applications of infrared thermography for the investigation of historic structures. Journal of Cultural Heritage, 5(1), 119–127.
Bao, Y., & Li, H. (2020). Machine learning paradigm for structural health monitoring. Structural Health Monitoring, 20, 1353–1372.
Brynjolfsson, E., & McAfee, A. (2014). The second machine age: Work, progress, and prosperity in a time of brilliant technologies. W.W. Norton & Company.
Commons, W. (2015). File:Hendys Law.jpg. Retrieved April 18, 2020, from https://commons.wikimedia.org/wiki/File:Hendys_Law.jpg
Dong, C.-Z., & Catbas, F. (2020). A review of computer vision-based structural health monitoring at local and global levels. Structural Health Monitoring, 20(2), 692–743.
Fluke. (2021). What does infrared mean? Retrieved October 31, 2021, from https://www.fluke.com/en-us/learn/blog/thermal-imaging/how-thermal-cameras-use-infrared-thermography
Hess, M., Vanoni, D., Petrovic, V., & Kuester, F. (2015). High-resolution thermal imaging methodology for non-destructive evaluation of historic structures. Infrared Physics and Technology, 73, 219–225.
Hoskere, V., Fouad, A., Friedel, D., Yang, W., Tang, Y., Narazaki, Y., et al. (2021). InstaDam: Open-source platform for rapid semantic segmentation of structural damage. Applied Sciences, 11(2), 520.
Hoskere, V., Narazaki, Y., Hoang, T. A., & Spencer, B. F., Jr. (2020). MaDnet: Multi-task semantic segmentation of multiple types of structural materials and damage in images of civil infrastructure. Journal of Civil Structural Health Monitoring, 10, 757–773.
Koch, C., Doycheva, K., Kasireddy, V., Akinci, B., & Fieguth, P. (2015). A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure. Advanced Engineering Informatics, 29(2), 196–210.
Liu, J., Zhang, S., Wang, S., & Metaxas, D. N. (2016). Multispectral deep neural networks for pedestrian detection. https://arxiv.org/abs/1611.02644
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3431–3440), Boston.
Lucas, H. C. (2012). The search for survival: Lessons from disruptive technologies. ABC-CLIO LLC.
Paszke, A., Chaurasia, A., Kim, S., & Culurciello, E. (2016). ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147.
Rao, Y., Prathapani, N., & Nagabhooshanam, E. (2014). Application of normalized cross correlation to image registration. International Journal of Research in Engineering and Technology, 3(5), 12–15.
Shivakumar, S. S., Rodrigues, N., & Zhou, A. (2019). PST900: RGB-thermal calibration, dataset and segmentation network. http://arxiv.org/abs/1909.10980
Spencer, B. F., Jr., Hoskere, V., & Narazaki, Y. (2019). Advances in computer vision-based civil infrastructure inspection and monitoring. Engineering, 5(2), 199–222.
Sun, Y., Zuo, W., & Liu, M. (2019). RTFNet: RGB-thermal fusion network for semantic segmentation of urban scenes. IEEE Robotics and Automation Letters, 4(3), 2576–2583.
Washer, G., Fenwick, R., Nelson, S., & Rumbayan, R. (2013). Guidelines for thermographic inspection of concrete bridge components in shaded conditions. Transportation Research Record: Journal of the Transportation Research Board, 2360(1), 13–20.
Yasrab, R., Gu, N., & Zhang, X. (2017). An encoder-decoder based convolution neural network (CNN) for future advanced driver assistance system (ADAS). Applied Sciences, 7(4).
Ye, X. W., Jin, R., & Yun, C. B. (2019). A review on deep learning-based structural health monitoring of civil infrastructures. Smart Structures and Systems, 24(5), 567–575.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Research has been continually growing toward the development of image-based structural health monitoring tools that can leverage deep learning models to automate damage detection in civil infrastructure. However, these tools are typically based on RGB images, which work well under ideal lighting conditions, but often have degrading performance in poor and low-light scenes. On the other hand, thermal images, while lacking in crispness of details, do not show the same degradation of performance in changing lighting conditions. The potential to enhance automated damage detection by fusing RGB and thermal images together within a deep learning network has yet to be explored. In this paper, RGB and thermal images are fused in a ResNET-based semantic segmentation model for vision-based inspections. A convolutional neural network is then employed to automatically identify damage defects in concrete. The model uses a thermal and RGB encoder to combine the features detected from both spectrums to improve its performance of the model, and a single decoder to predict the classes. The results suggest that this RGB-thermal fusion network outperforms the RGB-only network in the detection of cracks using the Intersection Over Union (IOU) performance metric. The RGB-thermal fusion model not only detected damage at a higher performance rate, but it also performed much better in differentiating the types of damage. Keywords: Infrared thermography, Structural health monitoring, Image fusion, Automated crack detection 1 Introduction asset management as an effective tool for managing Infrastructure in the United States is in a growing state capital assets across various types of infrastructure to of disrepair, as the annual needs for repair/replacement minimize the total cost of maintenance and operation funding continue to outpace the spending. The Ameri - in a dynamic and data-rich environment. Furthermore, can Society of Civil Engineers (ASCE) estimated that it points out that the key to success of this proactive the gap in infrastructure funding reached $2 trillion dol- approach lies in the significant advances in the monitor - lars between 2016 and 2025 (ASCE, 2020). To combat ing technologies, to prioritize needs and plan long-term this funding gap, infrastructure managers are moving strategies. As automated structural health monitoring towards an asset management-based model to maximize (SHM) tools progress, their potential impact on asset the impact of the limited funding. ASCE (2020) describes management practices will continue to grow. SHM tools employing deep learning have shown their efficacy in providing powerful data processing Vedhus Hoskere, Yasutaka Narazaki and Andrew Maxwell contributed equally approaches for damage detection and structural condi- to this work. tion assessments for a variety of infrastructure types (Bao *Correspondence: quincy.g.alexander@erdc.dren.mil & Li, 2020; Hess et  al., 2015; Ye et  al., 2019). Computer US Army Engineer Research and Development Center, 3909 Halls Ferry Rd, vision-based deep learning techniques, in particular, Vicksburg, MS 39180, USA have seen a tremendous growth as computational power Full list of author information is available at the end of the article © The Author(s) 2022. 
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. Alexander et al. AI in Civil Engineering (2022) 1:3 Page 2 of 10 ASTM International, 2013). Passive IRT, specifically, is commonly used in the NDE of large civil infrastructure, where artificially heating the specimen is not always fea - sible (Avdelidis & Moropoulou, 2004). For passive IRT, solar radiation serves as the heat source, and studies have been performed to understand the optimum times to capture images with the highest thermal contrast and that diurnal temperature variations can be adequate to support the use of IRT to detect defects in fully shaded regions (Alexander et al., 2019; Washer et al., 2013). Visible (RGB) and thermal images each have their own Fig. 1 Relative cost per pixel, through time strengths and weaknesses. For example, visible images can provide a clear representation of the scene at high resolutions and low cost per pixel, but light sources can affect the image quality. In contrast, thermal images are continues to increase and the cost per pixel of data con- much more robust to variations in lighting conditions tinues to decrease (Koch et  al., 2015; Long et  al., 2015). and can provide sub-surface information, but generally As noted by Barry Hendy of Kodak Digital Camera Tech- suffer from comparatively low resolution and weak crisp - nology has developed at an exponential improvement ness. Several manufactures have produced cameras that rate similar to Moore’s law for microprocessors, with can capture both visible and thermal images simultane- the pixels per dollar doubling annually when comparing ously, thus providing more contextual information for the cameras of similar feature level (Brynjolfsson & McAfee, thermal images. 2014; Commons, 2015; Lucas, 2012). This trend across a Research has been performed in various fields to deter - range of camera types is illustrated in Fig. 1. mine how the two image types (RGB and thermal) can Leveraging the reduced technology cost and the be combined to enhance their respective advantages, increased computational power, there has been an while neutralizing their disadvantages. For example, exponential increase in the creation and deployment of the automotive industry is particularly interested in the computer vision-based inspection and infrastructure fusion of RGB and thermal images for object detection monitoring methods. For example, Hoskere et  al. (2020) in autonomous vehicles, for which low light conditions proposed the use of semantic segmentation for vision- and strong glares are common challenges (Sun et  al., based structural inspection, where a fully convolutional 2019). Liu et al. 
(2016) showed an 11% improvement over neural network was used to automatically identify multi- the baseline in pedestrian detection by using their pro- ple damage defects, like cracks, corrosion, spalling, etc., posed fusion model. Shivakumar et  al. (2019) proposed on certain structures of interest to the U.S. Army Corps an RGB-thermal fusion technique in the DARPA Subter- of Engineers, such as lock gates and bridges. A detailed ranean Challenge to identify four different object classes review of the current research, key challenges, and ongo- (person, drill, backpack, and fire-extinguisher). An et  al. ing work related to computer vision-based tools for the (2018) used an image matching technique for crack inspection of infrastructure can be found by Spencer detection, which can compare areas on thermal and vis- et  al. (2019). Dong and Catbas (2020), and Avci et  al. ible spectrum images to determine to identify matching (2021) also provided a comprehensive summary of com- cracks, thus reducing the false positives. For each of the puter vision-based SHM tools and techniques at both referenced applications, the fusion of thermal and RGB local and global levels, including the factors that can images performed better than adoption of only RGB affect accuracies. images. Infrared thermography (IRT) is the science of detect- While the existing researchers have previously inves- ing infrared energy emitted by an object, converting it tigated the use of both visible and thermal images for into an apparent temperature, and displaying the result damage detection in civil infrastructure, those studies are as an image (Fluke, 2021). Infrared (IR) wavelengths typically performed on simple laboratory test specimens are longer than those found in the visible spectrum and that lacked the type of visual complexity that is present in are not visible with the human eye. IRT can often pro- the field. In addition, these studies have leveraged active vide structural anomaly information that is not identifi - IRT, which requires an artificial heating element to high - able in the visible spectrum; indeed, the power of IRT for light the flaws for detection. Moreover, they are based on non-destructive evaluation (NDE) has been well docu- object detection, simply drawing a box around the identi- mented (Avdelidis & Moropoulou, 2004; Hess et al., 2015; fied cracks without considering the pixel-level accuracy A lexander et al. AI in Civil Engineering (2022) 1:3 Page 3 of 10 of the prediction, which is nevertheless crucial for dam- study about the reliability of the unit revealed that the age severity assessment. camera’s performance was found to be within the speci- This paper proposes to fuse features from visible and fication for temperatures ranging from 0 °F to 120 °F thermal spectrum images to develop a robust automated (Alexander & Lunderman, 2021). damage detection strategy for in-service civil infrastruc- The RGB-thermal pairs of images are annotated for ture. The novelty of the research effort is the quantifica - semantic segmentation as part of this research effort. tion of the benefits of fusing thermal features within the Annotating images for semantic segmentation is an ardu- neural network for a semantic segmentation model, with ous task, as each pixel in the image must be assigned a class predictions determined per pixel. To achieve the label. 
While the annotation of regular shapes, like poly- goal, a curated dataset of both damaged and undamaged gons, can be completed with a few clicks, such amor- in-service infrastructure is developed, with each visible- phous shapes as cracks require meticulous attention to spectrum image having a corresponding thermal image details. To develop a high-quality dataset for this study, as well as hand-labelled ground truth images for seman- annotations are conducted using InstaDam (Hoskere tic segmentation. The thermal images are collected in et  al., 2021), an open-source software for damage anno- the field using passive IRT technology, with a low-cost tation developed by the University of Illinois at Urbana- thermal imager that connects to a mobile device. This set Champaign. The dataset collected and used for training, of hardware better aligns with the tools that would com- validating and testing for this effort contains images of monly be utilized by inspectors in the field. Additionally, in-service infrastructure taken in the field. images of concrete joints are included in the dataset to represent crack-like visual patterns that are actually not 2.2 Image alignment associated with any damage. The model, therefore, not Fusion of images requires that the images are properly only has to detect the cracks, but also is able to differen - aligned. To match two images, Rao et al. (2014) outlined tiate between visually similar classes. The performance a normalized cross-correlation (NCC) approach, which is then measured by the predictions of each pixel. This works well when there is good structure within the two research demonstrates that the fusion of RGB and ther- images. In this approach, one image is held fixed, while mal images improves the performance of the model over the position of the second image is moved pixel-by-pixel, the RGB only model in properly predicting cracks, as with the quality of the match calculated at each position. well as in differentiating between cracks and comparable The quality of the match is quantified through a coeffi - features. cient of correlation, a value ranging from 0 to 1, where 1 indicates a perfect match. The position with the highest correlation coefficient should provide the best alignment 2 Methodologies between the two images. The dataset collected and used for training, validat - As shown in Fig. 2, the native resolution of the thermal ing and testing for this effort contains images of certain images is lower than that of the visible image; therefore, in-service infrastructure taken in the field. This section the thermal image should be scaled up to align the ther- describes the curation of the dataset, with aligned RGB mal scenes with the corresponding visible image. The and thermal images of the damaged infrastructure and thermal image is appropriately scaled, and then the NCC corresponding labelled ground truths for use in a seman- method is applied to the scaled thermal image to locate tic segmentation algorithm. the position of the maximum correlation coefficient. The image size is made consistent with the RGB image by 2.1 Data collection zero-padding. To qualitatively validate the accuracy of The FLIR One Pro Gen 3 (Flir One) thermal camera was this approach, the two images are blended. Fig.  
3 shows: selected to collect data, due to its balance between a low (a) the RGB image, (b) the padded thermal image, and (c) price point and relatively high thermal resolution (Alex- the RGB-padded thermal blended images for one exam- ander & Lunderman, 2021). The FLIR One unit can cap - ple scene. Finally, this approach is applied to all the image ture thermal and RGB images simultaneously. When this pairs in the dataset, using the iron palette (e.g., Fig.  3c) camera is connected to a mobile device, the FLIR One and greyscale palette used for the thermal images. How- mobile app is used as a viewfinder and to control the ever, the NCC approach is not effective for all image pairs operations of the camera. Thermal images are captured and specifically works poorly for the images with low at a resolution of 160 × 120 pixels for storage on the thermal definition. Therefore, some images have to be device and then can be decompressed to a resolution of manually realigned. 640 × 480 pixels when uploaded to a personal computer by using the FLIR software. The visible spectrum images have a resolution of 1440 × 1080 pixels. A preliminary Alexander et al. AI in Civil Engineering (2022) 1:3 Page 4 of 10 Table 1 Class label overview and weight Class Description Pixels Class probability Weight 0 No label 242,470,928 0.9868 1.4357 1 Crack 2,252,136 0.0092 34.7848 2 Joint 998,536 0.0041 42.0544 3 Spalling 1,360,938 0.0055 39.654 4 Vegetation 2,678,104 0.0055 39.6544 2.4 Network architecture The RTFNet network proposed by Sun et  al. (2019) is used as the foundation for analysis in this study. The RTFNet architecture consists of an RGB encoder, a par- allel thermal encoder, and a single decoder followed by the pixel-wise classification prediction. The encoder pro - duces low-resolution feature maps for the RGB image and the thermal image, and the decoder up-samples the features to develop dense feature maps (Yasrab et  al., 2017). The features acquired from each layer within the Fig. 2 Illustration of visible and thermal image pair alignment thermal decoder are mapped to the corresponding layer within the RGB encoder, as part of the fusion process. This network is illustrated in Fig.  4. The encoder is based 2.3 Da ta preparation and annotation on the Residual Network (ResNet) architecture, which Each image in the dataset has its corresponding pixel- has certain variants based on the number of layers. The wise annotations. Seventy-five of the images are selected ResNet-18 model with 18 neural network layers is used with at least one crack and well-defined thermal contrast. in this study. The frequency of occurrences of different labelled classes Within the network, the classes are weighted based on is provided in Table 1. While the focus of this study is on the pixel distribution, according to the class weighting crack detection, additional labels are included for spalling methodology outlined by Paszke et  al. (2016). The class and vegetation growth. All the images are cropped to the weighting formula is provided in Eq.  1. And the results size and position corresponding to the thermal images are shown in Table 1. by removing the padded borders. The datasets generated during the current study are available from the corre- sponding authors on reasonable requests. Fig. 3 a Visible image, b padded thermal image, and c blended image A lexander et al. AI in Civil Engineering (2022) 1:3 Page 5 of 10 Fig. 4 RGB-thermal fusion network architecture [Sun et al., 2019] (1) Fusing the RGB and thermal images (RGBT). 
The Weight = , (1) greyscale version of the thermal images is used in ln c + class_probability the analysis. (2) Fusing the RGB images with a blank image (RGBB). where c is the  Paszke Method Coefficient (1.02),  and This scenario represents the condition where andclass_probability indicates the Ratio of the pixels of only RGB data is available in an architecture that an individual class to the total number of pixels in the includes an empty (white) thermal input in the dataset. encoder. Image augmentation schemes are applied to improve (3) Removing the thermal encoder from the architec- the training results. First, RGB images are duplicated, ture and analysing the RGB images only (RGB). and the brightness of the matching images is reduced (4) Removing the RGB encoder from the architecture uniformly to simulate a low-light environment. The cor - and analysing the thermal images only (T). responding label images and thermal images are not modified. This augmentation method would double the size of the dataset, which is then randomly split by 80/10/10 for training/validating/testing, respectively. 3.1 Model performance evaluation When the model is run for training, further data aug- Both the RGBB and RGB models are included in the mentations is applied: random flip, random noise, ran - analysis to validate the process, as the RGB-blank dom brightness change, and random cropping. pair should perform similarly to the RGB-only model. The performance of these four scenarios is measured 3 Results in terms of Intersection over Union (IOU), which is The following four scenarios are studied as part of the one of the most common performance metrics used effort to quantify the value, including the thermal data: for semantic segmentation. At the pixel level, IOU Alexander et al. AI in Civil Engineering (2022) 1:3 Page 6 of 10 Fig. 5 Smoothed crack detection IOU rate for RGBT, RGBB, RGB and T datasets Table 2 IOU performance summary at 6000 epochs Table 3 Runtime comparison Scenario IOU crack Model Avg. detection seconds/ performance epoch RGBT 0.31 Dual encoder 23.5 RGB/RGBB 0.27 Single encoder 18.1 T 0.20 The results are shown in Fig.  5, and a summary of the indicates the ratio of the correct class predictions to the performance at 6000 epochs is provided in Table 2. The sum of the correct and incorrect class predictions, as results show that the fusion of RGB and thermal images shown in Eq.  2. The performance for each scenario is outperforms RGB-only and T-only models, indicat- shown in Fig. 5, after applying a Locally Weighted Scat- ing that the network is able to leverage the additional terplot Smoothing (LOWESS) regression technique information provided by the thermal images. By 6000 applied. epochs, the performance of each scenario becomes stabilized relative to each other. The RGBT model Area of overlap TP outperforms the RGB-only model by approximately IOU = = , (2) Area of union TP + FP + FN 15%. The RGBB network was trained to ensure that any performance improvement of the RGBT network over the RGB network was due to the additional infor- where TP is the true positive (pixels), FP the false posi- mation from the thermal images and not due to addi- tive (pixels), and FN the false negative (pixels). tional parameters in the network. The RGBB and RGB A lexander et al. AI in Civil Engineering (2022) 1:3 Page 7 of 10 Fig. 
6 Sample Inputs and comparison between RGB, RGBT and T predictions accuracies align well with each other, signifying that the 4 Further discussion dual encoder systems with the second blank image per- To illustrate the overall performance of the model, three form similarly to an RGB-only image as expected. The specific conditions were evaluated. The first condition results also show the ability of thermal images alone to (Sample 1) displays a complex mix of classes; the second provide an indication of cracks at approximately 74% of condition (Sample 2) represents certain visually similar the rate of the RGB-only model, and 66% of the rate of materials with different thermal characteristics; and the the RGBT model. Thus, damage can be indicated in a third condition (Sample 3) represents low light condi- scene with reduced impact of the lighting conditions, tions. These three conditions are highlighted in Fig.  7, which supports the original hypothesis. with their performance compared in Tables  4, 5, 6. In All the scenarios were run on the same device by addition to IOU, the recall rates are presented. In simple using the same datasets for training, validating and test- terms, recalls are used to measure the probability that a ing. The run times (seconds/epoch) for the single-input predicted class for a pixel is true. The equation is similar and fused-input scenarios are summarized in Table  3, to IOU, except that the FN term is removed, so not rec- using a Predator PH317 with an Nvidia GeForce RTX ognizing a class is not as strongly penalized. 2070 GPU. Fusing the thermal data to the RGB data Sample 1, shown in the first column of Fig.  7, repre- increases the run time by approximately 30 percent. sents a scene with a complex mix of cracks and joints, as well as some vegetation in a well-lit scene. The results 3.2 Q ualitative results comparison of this scenario indicate that the RGBT model is slightly A sample of the inputs, labels, and predictions is pro- better than the RGB model in identifying cracks. How- vided in Fig.  6. As shown, some features, such as joints ever, enhancements from fusing the thermal image are in the sidewalk, are challenging to be identified in the observed, as the RGBT model is much better at correctly RGB image, but are prominent in the thermal image. The differentiating between the classes. The RGB model misi - RGB-thermal pair performs well in predicting the joint dentified joints as spalling, resulting in a joint recall and locations and differentiating them from other classes, IOU score of 0. Vegetative growth was also misidentified such as spalling. The thermal-only model’s predictions as spalling.    The T model had a comparable recall rate are good for the class, but it lacks the sharpness in identi- to that of the RGBT and RGB models, but with a slight fying the boundaries. reduction in IOU, as the thermal image lacked crispness to correctly maintain the boundaries of the classes. Alexander et al. AI in Civil Engineering (2022) 1:3 Page 8 of 10 Fig. 7 Sample inputs and comparisons of output predictions A lexander et al. AI in Civil Engineering (2022) 1:3 Page 9 of 10 Table 4 Results of Sample 15 Conclusion The purpose of this study was to quantify the value of fus - Model RGBT RGB T ing RGB and thermal images to improve a deep learning Crack IOU 0.36 0.32 0.29 model for damage detection in large civil infrastructure. 
3.2 Qualitative results comparison
Fig. 6 Sample inputs and comparison between RGB, RGBT and T predictions

A sample of the inputs, labels, and predictions is provided in Fig. 6. As shown, some features, such as joints in the sidewalk, are challenging to identify in the RGB image, but are prominent in the thermal image. The RGB-thermal pair performs well in predicting the joint locations and differentiating them from other classes, such as spalling. The thermal-only model's predictions are good at the class level, but lack sharpness in identifying the boundaries.

4 Further discussion
To illustrate the overall performance of the model, three specific conditions were evaluated. The first condition (Sample 1) displays a complex mix of classes; the second condition (Sample 2) represents visually similar materials with different thermal characteristics; and the third condition (Sample 3) represents low-light conditions. These three conditions are highlighted in Fig. 7, and their performance is compared in Tables 4, 5 and 6. In addition to IOU, the recall rates are presented. In simple terms, recall measures the probability that a pixel belonging to a class is predicted as that class: Recall = TP / (TP + FN). The equation is similar to IOU, except that the FP term is removed, so predicting a class beyond its true extent is not penalized.

Fig. 7 Sample inputs and comparisons of output predictions

Sample 1, shown in the first column of Fig. 7, represents a scene with a complex mix of cracks and joints, as well as some vegetation, in a well-lit scene. The results of this scenario indicate that the RGBT model is only slightly better than the RGB model at identifying cracks. However, clear enhancements from fusing the thermal image are observed in class differentiation: the RGBT model is much better at correctly distinguishing between the classes. The RGB model misidentified joints as spalling, resulting in a joint recall and IOU score of 0. Vegetative growth was also misidentified as spalling. The T model had a recall rate comparable to those of the RGBT and RGB models, but with a slight reduction in IOU, as the thermal image lacked the crispness needed to correctly maintain the boundaries of the classes.

Table 4 Results of Sample 1
Model         RGBT   RGB   T
Crack IOU     0.36   0.32  0.29
Crack recall  0.97   0.98  0.97
Joint IOU     0.18   0     0.10
Joint recall  0.94   0     0.82

Sample 2, shown in the second column of Fig. 7, represents a much simpler scene with a crack and a joint. In this sample, the material to the left of the joint is asphalt, and the material to the right is concrete. The two material types have different thermal properties, and this difference can be seen in the thermal input image. All models correctly identified the crack, but the RGB model misidentified the joint as spalling, resulting in a joint recall and IOU score of 0, and overestimated its width.

Table 5 Results of Sample 2
Model         RGBT   RGB   T
Crack IOU     0.28   0.25  0.24
Crack recall  0.95   0.95  0.94
Joint IOU     0.36   0     0.38
Joint recall  0.97   0     0.93

Sample 3, shown in the third column of Fig. 7, represents a scene with only cracks, but in a low-light condition. The images were taken at night and include a pavement stripe for added visual complexity. In this scenario, the cracks were correctly identified by the RGBT and T models, but a portion of the crack patterns was misidentified as spalling by the RGB model.

Table 6 Results of Sample 3
Model         RGBT   RGB   T
Crack IOU     0.25   0.22  0.20
Crack recall  0.83   0.68  0.62

In summary, these results show the significant potential of the proposed RGBT approach in enhancing the efficiency and reliability of the inspection of in-service civil infrastructure. Such inspection is required to identify structural damage while robustly differentiating damage patterns from other similar patterns under various lighting conditions.

5 Conclusion
The purpose of this study was to quantify the value of fusing RGB and thermal images to improve a deep learning model for damage detection in large civil infrastructure. This is a novel approach for the automated inspection of such infrastructure that leverages the strengths of each image type; in particular, the features from each image type are fused at each layer of the deep-learning network. The RTFNet framework developed for autonomous vehicles was used as the foundation of this study. Images were collected using a relatively inexpensive combined thermal and RGB camera connected to a mobile device. Thermal-RGB image pairs were properly aligned, and annotations for semantic segmentation were manually created for multiple classes, including cracks, joints, spalling and vegetation. Four scenarios were evaluated: RGB-thermal fusion (RGBT), RGB fused with a blank image (RGBB), RGB encoder only (RGB), and thermal image only (T). Each of the models was trained for over 6000 epochs, using an 80/10/10 split for training, validating and testing. The results showed that the fusion of RGB and thermal spectrum images created a more robust model for the sample dataset, boosting the IOU value for crack detection by approximately 14% over the RGB-only model while providing more reliable class identification. The models trained with the thermal images alone delivered the lowest performance metrics. While the thermal-only model was generally capable of predicting the proper classes, the predictions lacked crispness and were often wider than the actual damage/joints. The predictions on the RGB images alone were not capable of consistently differentiating between the multiple class types, particularly in complex and low-light scenes. This study confirmed the hypothesis that the fusion of RGB and thermal images can outperform the RGB-only and T-only models, demonstrating that the network is able to leverage the additional information provided by thermal images to yield a more robust model for inspection tasks of in-service civil infrastructure.
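The conclusion notes that the thermal-RGB image pairs were aligned before annotation, and the reference list below includes Rao et al. (2014) on normalized cross correlation (NCC) for image registration. As an illustrative aside (an assumption about how such an alignment could be performed, not a record of the authors' procedure), the sketch below estimates the integer pixel offset between two greyscale images by exhaustively maximizing NCC over small shifts.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally sized arrays."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_shift(fixed, moving, max_shift=10):
    """Exhaustively search integer (dy, dx) shifts of `moving` relative to
    `fixed`, returning the shift that maximizes NCC over the overlap."""
    h, w = fixed.shape
    best, best_score = (0, 0), -1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping regions of the two images under this shift.
            f = fixed[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            m = moving[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            score = ncc(f, m)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best, best_score
```

An exhaustive integer search of this kind is adequate only for the small, roughly constant offsets of a rigidly mounted camera pair; larger or non-rigid misalignments would call for subpixel or feature-based registration instead.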
Author contributions
QA conceptualized the study, established the methodology, collected and curated the data set, carried out the study, and drafted the manuscript; VH participated in the design of the study, coordinated the data labeling, and helped review and edit the manuscript; YN participated in the design of the study, assisted in code modification, and helped review and edit the manuscript; AM established the workflow for data curation and assisted in code development; BFS conceived of the study, participated in the design of the study, provided resources, and helped review and edit the manuscript. All authors read and approved the final manuscript.

Funding
Partial financial support was received from the U.S. Army Corps of Engineers Engineer Research and Development Center.

Declarations

Competing interests
The authors have no relevant non-financial interests to disclose. Billie F. Spencer Jr is an editorial board member for AI in Civil Engineering and was not involved in the editorial review, or the decision to publish, this article. All authors declare that there are no competing interests.

Author details
US Army Engineer Research and Development Center, 3909 Halls Ferry Rd, Vicksburg, MS 39180, USA. Department of Civil and Environmental Engineering, University of Houston, 4726 Calhoun Road, Houston, TX 77204, USA. Zhejiang University/University of Illinois at Urbana-Champaign Institute, Room C323, Engineering Building, 718 East Haizhou Road, Haining 314400, Zhejiang, China. Department of Civil and Environmental Engineering, University of Illinois-Urbana Champaign, 205 N Mathews Ave, Urbana, IL 61801, USA.

Received: 30 November 2021  Accepted: 18 March 2022  Published: 18 August 2022
References
Alexander, Q. G., & Lunderman, C. V. (2021). Thermal camera reliability study: Flir One Pro. US Army Engineer Research and Development Center.
Alexander, Q. G., Hoskere, V., Spencer, B. F., Jr., & Smith, D. M. (2019). Towards the application of image based monitoring of USACE large civil infrastructure. International Workshop for Structural Health Monitoring. Palo Alto, CA.
An, Y.-K., Jang, K.-Y., Kim, B., & Cho, S. (2018). Deep learning-based concrete crack detection using hybrid images. Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2018. Denver.
ASCE. (2020). Changing the infrastructure equation: Using asset management to optimize investments. Retrieved March 20, 2021, from http://preprod.asce.org/-/media/asce-images-and-files/advocacy/documents/changing-infrastructure-equation-report.pdf
ASTM International. (2013). D4788-03(2013) Standard test method for detecting delaminations in bridge decks using infrared thermography. West Conshohocken: ASTM International.
Avci, O., Abdeljaber, O., Kiranyaz, S., Hussein, M., Gabbouj, M., & Inman, D. J. (2021). A review of vibration-based damage detection in civil structures: From traditional methods to machine learning and deep learning applications. Mechanical Systems and Signal Processing, 147, 107077.
Avdelidis, N. P., & Moropoulou, A. (2004). Applications of infrared thermography for the investigation of historic structures. Journal of Cultural Heritage, 5(1), 119–127.
Bao, Y., & Li, H. (2020). Machine learning paradigm for structural health monitoring. Structural Health Monitoring, 20, 1353–1372.
Brynjolfsson, E., & McAfee, A. (2014). The second machine age: Work, progress, and prosperity in a time of brilliant technologies. W. W. Norton & Company.
Commons, W. (2015). File:Hendys Law.jpg. Retrieved April 18, 2020, from https://commons.wikimedia.org/wiki/File:Hendys_Law.jpg
Dong, C.-Z., & Catbas, F. (2020). A review of computer vision-based structural health monitoring at local and global levels. Structural Health Monitoring, 20(2), 692–743.
Fluke. (2021). What does infrared mean? Retrieved October 31, 2021, from https://www.fluke.com/en-us/learn/blog/thermal-imaging/how-thermal-cameras-use-infrared-thermography
Hess, M., Vanoni, D., Petrovic, V., & Kuester, F. (2015). High-resolution thermal imaging methodology for non-destructive evaluation of historic structures. Infrared Physics and Technology, 73, 219–225.
Hoskere, V., Fouad, A., Friedel, D., Yang, W., Tang, Y., Narazaki, Y., et al. (2021). InstaDam: Open-source platform for rapid semantic segmentation of structural damage. Applied Sciences, 11(2), 520.
Hoskere, V., Narazaki, Y., Hoang, T. A., & Spencer, B. F., Jr. (2020). MaDnet: Multi-task semantic segmentation of multiple types of structural materials and damage in images of civil infrastructure. Journal of Civil Structural Health Monitoring, 10, 757–773.
Koch, C., Doycheva, K., Kasireddy, V., Akinci, B., & Fieguth, P. (2015). A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure. Advanced Engineering Informatics, 29(2), 196–210.
Liu, J., Zhang, S., Wang, S., & Metaxas, D. N. (2016). Multispectral deep neural networks for pedestrian detection. https://arxiv.org/abs/1611.02644
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3431–3440). Boston.
Lucas, H. C. (2012). The search for survival: Lessons from disruptive technologies. ABC-CLIO LLC.
Paszke, A., Chaurasia, A., Kim, S., & Culurciello, E. (2016). ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147.
Rao, Y., Prathapani, N., & Nagabhooshanam, E. (2014). Application of normalized cross correlation to image registration. International Journal of Research in Engineering and Technology, 3(5), 12–15.
Shivakumar, S. S., Rodrigues, N., & Zhou, A. (2019). PST900: RGB-thermal calibration, dataset and segmentation network. http://arxiv.org/abs/1909.10980
Spencer, B. F., Jr., Hoskere, V., & Narazaki, Y. (2019). Advances in computer vision-based civil infrastructure inspection and monitoring. Engineering, 5(2), 199–222.
Sun, Y., Zuo, W., & Liu, M. (2019). RTFNet: RGB-thermal fusion network for semantic segmentation of urban scenes. IEEE Robotics and Automation Letters, 4(3), 2576–2583.
Washer, G., Fenwick, R., Nelson, S., & Rumbayan, R. (2013). Guidelines for thermographic inspection of concrete bridge components in shaded conditions. Transportation Research Record: Journal of the Transportation Research Board, 2360(1), 13–20.
Yasrab, R., Gu, N., & Zhang, X. (2017). An encoder-decoder based convolution neural network (CNN) for future advanced driver assistance system (ADAS). Applied Sciences, 7(4).
Ye, X. W., Jin, T., & Yun, C. B. (2019). A review on deep learning-based structural health monitoring of civil infrastructures. Smart Structures and Systems, 24(5), 567–575.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
