Integrating Spatial Configuration into Heatmap Regression Based CNNs for Landmark Localization

Abstract

In many medical image analysis applications, only a limited amount of training data is available, which makes training of convolutional neural networks (CNNs) challenging. In this work on anatomical landmark localization, we propose a CNN architecture that learns to split the localization task into two simpler sub-problems, reducing the need for large training datasets. Our fully convolutional SpatialConfiguration-Net (SCN) dedicates one component to locally accurate but ambiguous candidate predictions, while the other component improves robustness to ambiguities by incorporating the spatial configuration of landmarks. In our experimental evaluation, we show that the proposed SCN outperforms related methods in terms of landmark localization error on size-limited datasets.

Keywords: anatomical landmarks, localization, heatmap regression, spatial configuration

1. Introduction

Localization of anatomical landmarks is an important step in medical image analysis, e.g., in segmentation (Beichel et al., 2005) or registration (Johnson and Christensen, 2002). Unfortunately, locally similar structures often make landmark localization ambiguous. To deal with these difficulties, machine learning based approaches often combine local landmark predictions with explicit handcrafted graphical models, aiming to restrict predictions to feasible spatial configurations. Thus, the landmark localization problem is simplified by separating the task into two successive steps. The first step is dedicated to locally accurate but potentially ambiguous predictions, while in the second step graphical models (Cootes et al., 1995; Felzenszwalb and Huttenlocher, 2005) eliminate ambiguities.
Recent advances in computer vision and medical imaging have mainly been driven by convolutional neural networks (CNNs) due to their superior capabilities to automatically learn important image features (LeCun et al., 2015). Unfortunately, CNNs typically need large amounts of training data. Especially in medical imaging, this requirement is hard to fulfill, due to ethical and financial concerns as well as time-consuming expert annotations. In this work, we show that the amount of required training data can be reduced with our proposed two-component SpatialConfiguration-Net (SCN), which follows the idea of handcrafted graphical models to split landmark localization into two successive steps. This extended abstract gives a short overview of the key concepts of our journal paper published in (Payer et al., 2019), while we refer the reader to the full paper for more detailed descriptions and more extensive evaluations on a variety of datasets.

© 2019 C. Payer, D. Stern, H. Bischof & M. Urschler. arXiv:1908.00748v1 [eess.IV] 2 Aug 2019

Figure 1: Landmark localization by regressing heatmaps for each landmark in our end-to-end trained fully convolutional SpatialConfiguration-Net (SCN). (Diagram labels: Input, Local Appearance, Spatial Configuration, Prediction, Target Landmarks; the SCN output is trained with a regression objective.)

2. Method

Our method for landmark localization is based on regressing heatmap images (Tompson et al., 2014), which encode the pseudo-probability of a landmark being located at a certain pixel position. With $N$ being the total number of landmarks, we define the target heatmap image of a landmark $L_i$, $i \in \{1, \dots, N\}$, as the $d$-dimensional Gaussian function $g_i(x) : \mathbb{R}^d \to \mathbb{R}$ centered at the target landmark's groundtruth coordinate $x_i \in \mathbb{R}^d$.

The network is set up to regress $N$ heatmaps simultaneously by minimizing the differences between predicted heatmaps $h_i(x)$ and the corresponding target heatmaps $g_i(x)$ in an end-to-end manner (Ronneberger et al., 2015; Shelhamer et al., 2017). In network inference, we obtain the predicted coordinate $\hat{x}_i \in \mathbb{R}^d$ of each landmark $L_i$ by taking the coordinate where the heatmap has its highest value.

2.1. SpatialConfiguration-Net

The fundamental concept of the SpatialConfiguration-Net (SCN) is the interaction between its two components (see Fig. 1). The first component takes the image as input to generate locally accurate but potentially ambiguous local appearance heatmaps $h_i^{\mathrm{LA}}(x)$. Motivated by handcrafted graphical models for eliminating these potential ambiguities, the second component takes the predicted candidate heatmaps $h_i^{\mathrm{LA}}(x)$ as input to generate inaccurate but unambiguous spatial configuration heatmaps $h_i^{\mathrm{SC}}(x)$.

For $N$ landmarks, the set of predicted heatmaps $H = \{h_i(x) \mid i = 1 \dots N\}$ is obtained by element-wise multiplication of the corresponding heatmap outputs $h_i^{\mathrm{LA}}(x)$ and $h_i^{\mathrm{SC}}(x)$ of the two components:

$$h_i(x) = h_i^{\mathrm{LA}}(x) \odot h_i^{\mathrm{SC}}(x) \qquad (1)$$

This multiplication is crucial for the SCN, as it forces both of its components to generate a response on the location of the target landmark $x_i$, i.e., both $h_i^{\mathrm{LA}}(x)$ and $h_i^{\mathrm{SC}}(x)$ deliver responses for $x$ close to $x_i$, while on all other locations one component may have a response as long as the other one does not have one.

Figure 2: Cumulative distributions of the point-to-point error for 895 radiographs. (a) shows results compared with other state-of-the-art methods. (b) shows results of SCN and localization U-Net for reduced numbers of training images. (Axes in both panels: IPE in mm, 0 to 2, against number of images in %, 0 to 100. Panel (a), all training images: SCN, Localization U-Net, Payer et al. (2016), Urschler et al. (2018), Stern et al. (2016a), Ebner et al. (2014), Lindner et al. (2015). Panel (b), 100/50/10 training images: SCN-100, SCN-50, SCN-10, U-Net-100, U-Net-50, U-Net-10.)

3. Experiments and Results

We evaluate our proposed SCN on a dataset of 895 radiographs of left hands with 37 annotated characteristic landmarks on finger tips and bone joints. We compare our SCN to state-of-the-art random regression forests (Ebner et al., 2014; Lindner et al., 2015; Stern et al., 2016; Urschler et al., 2018), our previous CNN-based method of (Payer et al., 2016), and our implementation of a localization U-Net for heatmap regression.

Results of the image-specific point-to-point errors for three-fold cross validation of the 895 radiographs are shown in Fig. 2. When using all training images, our SCN outperforms all other compared methods. Additionally, when drastically reducing the number of training images to 100, 50, and 10, respectively, our SCN greatly outperforms the localization U-Net. This confirms that splitting the localization task into predicting accurate but potentially ambiguous local appearance heatmaps and inaccurate but unambiguous spatial configuration heatmaps is especially useful when dealing with only limited amounts of training data.

4. Conclusion

In conclusion, we have shown how to combine information of local appearance and spatial configuration into a single end-to-end trained network for landmark localization. Our generic architecture achieves state-of-the-art results in terms of localization error, even when only limited amounts of training images are available. We are currently looking into extending our SCN regarding occluded structures and multi-object localization, and into adapting our SCN for semantic segmentation problems (see (Payer et al., 2018) for preliminary results), where structural constraints may be used in a similar manner.

References

Reinhard Beichel, Horst Bischof, Franz Leberl, and Milan Sonka. Robust Active Appearance Models and Their Application to Medical Image Analysis. IEEE Trans. Med. Imaging, 24(9):1151–1169, sep 2005. doi: 10.1109/TMI.2005.853237.

Tim F. Cootes, Christopher J. Taylor, David H. Cooper, and Jim Graham. Active Shape Models-Their Training and Application. Comput. Vis. Image Underst., 61(1):38–59, jan 1995. doi: 10.1006/cviu.1995.1004.

Thomas Ebner, Darko Stern, René Donner, Horst Bischof, and Martin Urschler. Towards Automatic Bone Age Estimation from MRI: Localization of 3D Anatomical Landmarks. In Proc. Med. Image Comput. Comput. Interv., pages 421–428. Springer, 2014. doi: 10.1007/978-3-319-10470-6_53.

Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Pictorial Structures for Object Recognition. Int. J. Comput. Vis., 61(1):55–79, 2005. doi: 10.1023/B:VISI.0000042934.15159.49.

Hans J. Johnson and Gary E. Christensen. Consistent Landmark and Intensity-Based Image Registration. IEEE Trans. Med. Imaging, 21(5):450–461, 2002. doi: 10.1109/TMI.2002.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep Learning. Nature, 521(7553):436–444, 2015. doi: 10.1038/nature14539.

Claudia Lindner, Paul A. Bromiley, Mircea C. Ionita, and Tim F. Cootes. Robust and Accurate Shape Model Matching Using Random Forest Regression-Voting. IEEE Trans. Pattern Anal. Mach. Intell., 37(9):1862–1874, sep 2015. doi: 10.1109/TPAMI.2014.2382106.

Christian Payer, Darko Stern, Horst Bischof, and Martin Urschler. Regressing Heatmaps for Multiple Landmark Localization Using CNNs. In Proc. Med. Image Comput. Comput. Interv., pages 230–238. Springer, 2016. doi: 10.1007/978-3-319-46723-8_27.

Christian Payer, Darko Stern, Horst Bischof, and Martin Urschler. Multi-label Whole Heart Segmentation Using CNNs and Anatomical Label Configurations. In Stat. Atlases Comput. Model. Hear. ACDC MMWHS Challenges. STACOM 2017., pages 190–198. Springer, 2018. doi: 10.1007/978-3-319-75541-0_20.

Christian Payer, Darko Stern, Horst Bischof, and Martin Urschler. Integrating Spatial Configuration into Heatmap Regression Based CNNs for Landmark Localization. Med. Image Anal., 54:207–219, may 2019. doi: 10.1016/j.media.2019.03.007.

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proc. Med. Image Comput. Comput. Interv., pages 234–241. Springer, 2015. doi: 10.1007/978-3-319-24574-4_28.

Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 39(4):640–651, apr 2017. doi: 10.1109/TPAMI.2016.2572683.

Darko Stern, Thomas Ebner, and Martin Urschler. From Local to Global Random Regression Forests: Exploring Anatomical Landmark Localization. In Proc. Med. Image Comput. Comput. Interv., pages 221–229. Springer, 2016. doi: 10.1007/978-3-319-46723-8_26.

Jonathan Tompson, Arjun Jain, Yann LeCun, and Christoph Bregler. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation. In Adv. Neural Inf. Process. Syst., pages 1799–1807, 2014.

Martin Urschler, Thomas Ebner, and Darko Stern. Integrating Geometric Configuration and Appearance Information into a Unified Framework for Anatomical Landmark Localization. Med. Image Anal., 43:23–36, jan 2018. doi: 10.1016/j.media.2017.09.003.
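Illustrative sketch for Section 2 and Eq. (1): the following NumPy example is not the authors' implementation. It builds a Gaussian target heatmap, combines a local appearance heatmap with a spatial configuration heatmap by element-wise multiplication, and reads off the landmark as the position of the maximum. In the SCN both heatmaps are outputs of CNN components; here they are hand-made stand-ins, and the image size, landmark positions, peak heights, and sigma values are all assumptions chosen only to show how the product can suppress an ambiguous response.

```python
import numpy as np


def gaussian_heatmap(shape, center, sigma):
    """Unnormalized d-dimensional Gaussian g_i(x) centered at `center`."""
    grids = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    sq_dist = sum((g - c) ** 2 for g, c in zip(grids, center))
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def combine(h_la, h_sc):
    """Element-wise multiplication of the two component outputs, as in Eq. (1)."""
    return h_la * h_sc


def predicted_coordinate(heatmap):
    """Coordinate where the heatmap has its highest value."""
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return tuple(int(v) for v in idx)


shape = (64, 64)            # assumed image size
true_landmark = (20, 30)    # assumed groundtruth coordinate x_i
lookalike = (45, 30)        # assumed locally similar structure

# Stand-in for the local appearance output: sharp peaks at the true
# landmark and at the look-alike, the latter slightly stronger.
h_la = (gaussian_heatmap(shape, true_landmark, sigma=2.0)
        + 1.1 * gaussian_heatmap(shape, lookalike, sigma=2.0))

# Stand-in for the spatial configuration output: a single broad blob
# that is only roughly centered on the true landmark.
h_sc = gaussian_heatmap(shape, (22, 28), sigma=8.0)

print("local appearance only:", predicted_coordinate(h_la))
print("combined heatmap:     ", predicted_coordinate(combine(h_la, h_sc)))
```

With these assumed inputs, the local appearance heatmap alone peaks at the look-alike, while the product keeps the sharp peak at the true landmark because the broad spatial configuration response is close to zero at the look-alike.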

ISSN
1361-8415
eISSN
ARCH-3348
DOI
10.1016/j.media.2019.03.007


Journal

Electrical Engineering and Systems Science, arXiv (Cornell University)

Published: Aug 2, 2019
