Scaling Deep Learning by Leveraging a Distributed Training Method for Huge Datasets
Keywords:
Deep learning, Big data, Scalability, Training Optimization
Abstract
As deep learning transforms a growing number of fields, the challenge of training models effectively on large-scale datasets becomes increasingly important. Conventional training methods often cannot meet the compute and memory demands of such data volumes. This work explores a distributed training approach that uses multiple computing resources to improve scalability and efficiency. By partitioning data and parallelizing model training across a network of machines, training time can be cut substantially while model performance is retained or improved. We examine key techniques such as data and model parallelism, evaluating their advantages and the settings in which each is most useful. We also address challenges around fault tolerance, communication overhead, and synchronization, and propose solutions. Our findings show that distributed training not only accelerates the learning process but also makes it possible to handle datasets that were previously impractical for single-machine learning. These results contribute to the advancement of deep learning and support the development of more sophisticated models capable of solving complex problems across many domains. The approach aims to help practitioners make full use of their data, thereby promoting innovation in disciplines such as computer vision, natural language processing, and beyond.
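To make the data-parallel setting concrete, the sketch below shows one common way such training is organized in practice. It is an illustrative example only, assuming PyTorch's DistributedDataParallel with an NCCL backend and a placeholder model and dataset; it is not the implementation evaluated in this work. Each worker holds a full replica of the model, trains on its own shard of the data, and gradients are averaged across workers by an all-reduce after every backward pass.

    # Minimal data-parallel training sketch (illustrative, not the paper's implementation).
    # Assumes launch via a tool such as `torchrun --nproc_per_node=4 train_ddp.py`
    # (file name hypothetical), which sets the environment variables used below.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    def main():
        # One process per GPU; each handles one shard of the data and one model replica.
        dist.init_process_group(backend="nccl")
        rank = dist.get_rank()
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Placeholder dataset: 10,000 samples with 64 features and binary labels.
        x = torch.randn(10_000, 64)
        y = torch.randint(0, 2, (10_000,))
        dataset = TensorDataset(x, y)
        # DistributedSampler partitions the dataset so each rank sees a disjoint shard.
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=128, sampler=sampler)

        # Placeholder model; any nn.Module would work here.
        model = torch.nn.Sequential(
            torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 2)
        ).cuda(local_rank)
        # DDP synchronizes gradients across ranks with an all-reduce after each backward pass.
        model = DDP(model, device_ids=[local_rank])
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = torch.nn.CrossEntropyLoss()

        for epoch in range(3):
            sampler.set_epoch(epoch)  # reshuffle shard assignment each epoch
            for xb, yb in loader:
                xb, yb = xb.cuda(local_rank), yb.cuda(local_rank)
                optimizer.zero_grad()
                loss = loss_fn(model(xb), yb)
                loss.backward()   # gradients are averaged across all workers here
                optimizer.step()
            if rank == 0:
                print(f"epoch {epoch}: loss {loss.item():.4f}")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

The key design point illustrated is that scaling out is achieved by sharding the data (the sampler) and synchronizing only gradients (inside DDP), so per-worker memory stays roughly constant while effective batch size and throughput grow with the number of workers; this trade between computation and communication is exactly where the communication-overhead and synchronization issues discussed above arise.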