Low-Rank Gradient Descent for Memory-Efficient Training of Deep In-Memory Arrays

Publisher: Association for Computing Machinery
Copyright: © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISSN: 1550-4832
eISSN: 1550-4840
DOI: 10.1145/3577214

Abstract

The movement of large quantities of data during the training of a deep neural network presents immense challenges for machine learning workloads, especially those based on future functional memories deployed to store network models. As the size of network models begins to vastly outstrip traditional silicon computing resources, functional memories based on flash, resistive switches, magnetic tunnel junctions, and other technologies can store these new ultra-large models. However, new approaches are then needed to minimize hardware overhead, especially for the movement and calculation of gradient information that cannot be efficiently contained in these new memory resources. To address this, we introduce streaming batch principal component analysis (SBPCA) as an update algorithm. SBPCA uses stochastic power iterations to generate a rank-k approximation of the network gradient. We demonstrate that the low-rank updates produced by SBPCA can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to standard mini-batch gradient descent. Our approximation is made in an expanded vector form that can be efficiently applied to the rows and columns of crossbars for array-level updates. These results promise improvements in the design of application-specific integrated circuits built around large vector-matrix multiplier memories.
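The full SBPCA algorithm is not reproduced on this page, but the abstract's core idea, a rank-k gradient estimate built from stochastic power iterations and then applied as row/column outer products, can be illustrated with a short sketch. The NumPy code below is a hypothetical illustration under assumptions, not the authors' implementation; the function names (streaming_rank_k_gradient, apply_outer_product_updates) and all parameters are invented for demonstration.

    import numpy as np

    def streaming_rank_k_gradient(grad_batches, k, power_iters=3, seed=0):
        # Illustrative sketch (assumption, not the paper's algorithm): estimate a
        # rank-k approximation of the summed mini-batch gradient with randomized
        # power iterations. The per-batch gradients are revisited once per pass
        # instead of being averaged into one dense matrix, mimicking a streaming setting.
        rng = np.random.default_rng(seed)
        rows, cols = grad_batches[0].shape

        V, _ = np.linalg.qr(rng.standard_normal((cols, k)))  # random right subspace
        for _ in range(power_iters):
            U = np.zeros((rows, k))
            for G in grad_batches:          # first pass: accumulate G @ V
                U += G @ V
            U, _ = np.linalg.qr(U)
            W = np.zeros((cols, k))
            for G in grad_batches:          # second pass: accumulate G.T @ U
                W += G.T @ U
            V, R = np.linalg.qr(W)
        S = np.abs(np.diag(R))              # rough singular-value estimates
        return U, S, V

    def apply_outer_product_updates(weights, U, S, V, lr):
        # Apply the rank-k update as k outer products of a column vector and a
        # row vector, the vector form a crossbar array can realize through its
        # row and column drive lines.
        for i in range(len(S)):
            weights -= lr * S[i] * np.outer(U[:, i], V[:, i])
        return weights

In this reading, each of the k outer products corresponds to driving one length-rows vector and one length-cols vector onto the array, so the full-rank gradient never has to be stored or moved off the memory, which is the hardware motivation the abstract describes.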

Journal: ACM Journal on Emerging Technologies in Computing Systems (JETC), Association for Computing Machinery

Published: May 18, 2023

Keywords: Deep learning
