The movement of large quantities of data during the training of a deep neural network presents immense challenges for machine learning workloads, especially those based on future functional memories deployed to store network models. As the size of network models begins to vastly outstrip traditional silicon computing resources, functional memories based on flash, resistive switches, magnetic tunnel junctions, and other technologies can store these new ultra-large models. However, new approaches are then needed to minimize hardware overhead, especially for the movement and calculation of gradient information, which cannot be efficiently held in these new memory resources. To address this, we introduce streaming batch principal component analysis (SBPCA) as an update algorithm. SBPCA uses stochastic power iterations to generate a rank-k approximation of the network gradient. We demonstrate that the low-rank updates produced by SBPCA can effectively train convolutional neural networks on a variety of common datasets, with performance comparable to standard mini-batch gradient descent. Our approximation is expressed in an expanded vector form that can be applied efficiently to the rows and columns of crossbars for array-level updates. These results promise improvements in the design of application-specific integrated circuits built around large vector-matrix multiplier memories.
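To make the idea concrete, the following is a minimal NumPy sketch of this general scheme, not the authors' exact algorithm: it tracks the top-k subspace of a stream of gradient matrices with power iterations and returns rank-k factors whose outer products form the array-level update. All names here (streaming_batch_pca, grad_batches, n_power_iters) are hypothetical, and the sketch assumes each gradient matrix has at least k rows.

import numpy as np

def streaming_batch_pca(grad_batches, k, n_power_iters=2, seed=0):
    # Track the top-k left subspace of the streamed gradients while
    # accumulating the running mini-batch gradient estimate.
    rng = np.random.default_rng(seed)
    Q, G_sum = None, None
    for G in grad_batches:                      # G: (rows, cols) gradient sample
        if Q is None:                           # random initial subspace guess
            Q = rng.standard_normal((G.shape[0], k))
            G_sum = np.zeros_like(G, dtype=float)
        G_sum += G                              # running sum of gradients
        for _ in range(n_power_iters):          # stochastic power iteration
            Q, _ = np.linalg.qr(G @ (G.T @ Q))  # re-orthonormalize, (rows, k)
    # k column vectors (Q) and k row vectors (V): the rank-k product Q @ V
    # approximates G_sum and decomposes into k outer products.
    V = Q.T @ G_sum
    return Q, V

A weight matrix W would then be updated as W -= lr * (Q @ V), i.e., as k rank-one updates, each the outer product of a column of Q with a row of V, which matches the row/column update pattern of a crossbar array.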
ACM Journal on Emerging Technologies in Computing Systems (JETC) – Association for Computing Machinery
Published: May 18, 2023
Keywords: Deep learning