Apache Spark: Paving The Way For A Future Beyond MapReduce

Naresh Dulam

Authors

Naresh Dulam, Vice President, Sr Lead Software Engineer, JP Morgan Chase, USA

Keywords:

Apache Spark, Big Data Processing, MapReduce, Hadoop

Abstract

Apache Spark, a faster and more flexible replacement for MapReduce, has changed large-scale data processing. Its in-memory computing speeds up data access and processing, which makes real-time analytics practical. APIs in Python, Java, and Scala let developers build sophisticated systems quickly. Spark integrates well with Hadoop, so existing data architectures can be changed without additional cost. Beyond core data processing, it offers SQL, graph, and machine learning capabilities for deeper analysis. Applied to big data workloads, Spark's performance, simplicity, and support for innovation help companies make faster, better decisions.
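The in-memory claim can be illustrated with a toy model in plain Python. This is only a sketch and not Spark code: the classes `DiskDataset` and `CachedDataset` and the function `iterative_mean` are hypothetical names invented for this example. It assumes an iterative job that makes several passes over the same input, and counts how often the input file is read with and without an in-memory cache.

```python
import json
import os
import tempfile


class DiskDataset:
    """Hypothetical stand-in for disk-based input: every pass re-reads the file."""

    def __init__(self, path):
        self.path = path
        self.reads = 0  # number of times the file has been read

    def records(self):
        self.reads += 1
        with open(self.path) as f:
            return [json.loads(line) for line in f]


class CachedDataset:
    """Hypothetical wrapper that keeps records in memory after the first read."""

    def __init__(self, source):
        self.source = source
        self._cache = None

    def records(self):
        if self._cache is None:
            self._cache = self.source.records()
        return self._cache


def iterative_mean(dataset, passes):
    """Make several passes over the data, as an iterative algorithm would."""
    estimate = 0.0
    for _ in range(passes):
        values = dataset.records()
        estimate = 0.5 * estimate + 0.5 * (sum(values) / len(values))
    return estimate


def demo(passes=5):
    """Return (file reads without cache, file reads with cache)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.jsonl")
        with open(path, "w") as f:
            for v in range(1, 101):
                f.write(json.dumps(v) + "\n")

        uncached = DiskDataset(path)
        iterative_mean(uncached, passes)

        backing = DiskDataset(path)
        iterative_mean(CachedDataset(backing), passes)

        return uncached.reads, backing.reads


if __name__ == "__main__":
    print(demo())
```

In this toy model the uncached dataset is read once per pass, while the cached one is read a single time regardless of the number of passes. Real Spark workloads involve partitioning, serialization, and memory limits that this sketch does not attempt to model.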

