
ENRICHing medical imaging training sets enables more efficient machine learning

References (27)

Publisher: Oxford University Press
Copyright: © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com
ISSN: 1067-5027
eISSN: 1527-974X
DOI: 10.1093/jamia/ocad055

Abstract

Objective: Deep learning (DL) has been applied in proofs of concept across biomedical imaging, including across modalities and medical specialties. Labeled data are critical to training and testing DL models, but human expert labelers are limited. In addition, DL traditionally requires copious training data, which is computationally expensive to process and iterate over. Consequently, it is useful to prioritize using those images that are most likely to improve a model’s performance, a practice known as instance selection. The challenge is determining how best to prioritize. It is natural to prefer straightforward, robust, quantitative metrics as the basis for prioritization for instance selection. However, in current practice, such metrics are not tailored to, and almost never used for, image datasets.

Materials and Methods: To address this problem, we introduce ENRICH (Eliminate Noise and Redundancy for Imaging Challenges), a customizable method that prioritizes images based on how much diversity each image adds to the training set.

Results: First, we show that medical datasets are special in that in general each image adds less diversity than in nonmedical datasets. Next, we demonstrate that ENRICH achieves nearly maximal performance on classification and segmentation tasks on several medical image datasets using only a fraction of the available images and without up-front data labeling. ENRICH outperforms random image selection, the negative control. Finally, we show that ENRICH can also be used to identify errors and outliers in imaging datasets.

Conclusions: ENRICH is a simple, computationally efficient method for prioritizing images for expert labeling and use in DL.
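The idea of ranking images by how much diversity each one adds can be illustrated with a small sketch. The abstract does not specify the similarity measure or the selection procedure, so everything below is an assumption made for illustration, not the authors' ENRICH implementation: images are stood in for by short feature vectors, similarity is cosine similarity, and the hypothetical `greedy_diverse_order` function uses a generic farthest-first greedy ordering. No labels are used, in keeping with the abstract's point that prioritization happens before labeling.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_diverse_order(features, similarity=cosine_similarity):
    """Order item indices so that each pick is the item least similar
    to everything already picked (hypothetical helper, not from the paper)."""
    n = len(features)
    if n == 0:
        return []
    order = [0]  # arbitrary seed: start from the first item
    remaining = set(range(1, n))
    # For each unpicked item, track its highest similarity to any picked item.
    max_sim = {i: similarity(features[i], features[0]) for i in remaining}
    while remaining:
        # The most "novel" item has the lowest worst-case similarity;
        # ties are broken by index so the ordering is deterministic.
        nxt = min(remaining, key=lambda i: (max_sim[i], i))
        order.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            s = similarity(features[i], features[nxt])
            if s > max_sim[i]:
                max_sim[i] = s
    return order


# Toy stand-ins for image features: items 0 and 1 are near-duplicates.
toy_features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.7, 0.7]]
ranking = greedy_diverse_order(toy_features)
print(ranking)
```

In this toy input the near-duplicate of the seed is ranked last, so a labeling budget that covers only the first few items would skip it. Taking a prefix of the ranking corresponds to training on "only a fraction of the available images"; how ENRICH itself scores redundancy is described in the full paper, not here.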

Journal: Journal of the American Medical Informatics Association (Oxford University Press)

Published: Apr 10, 2023

Keywords: deep learning; medical imaging; information theory; instance selection; data quality; data efficiency
