An in-depth examination of frameworks such as Apache Kafka and Apache Pulsar for data processing in real time

Authors

  • Muneer Ahmed Salamkar, Senior Associate at JP Morgan Chase, USA

Keywords:

Real-time data processing, Apache Kafka, analytics, data architecture

Abstract

Real-time data processing is transforming business intelligence (BI) by enabling firms to act on insights as they are created. At the vanguard of this change are tools such as Apache Kafka and Apache Pulsar, which make it possible to stream data from many sources with high throughput and low latency. These frameworks enable companies to monitor patterns, identify irregularities, and react quickly to operational events. Because of its robustness, scalability, and extensive ecosystem of connectors, Apache Kafka is often used to reliably handle large volumes of event data, which makes it well suited to adding real-time analytics to BI capabilities. Apache Pulsar is a strong option for businesses with diverse requirements because its multi-tenancy and geo-replication capabilities enable scalable, globally distributed data streaming. By turning batch-based insights into continuous, actionable information, real-time processing improves competitiveness, customer experience, and decision-making. This approach is especially important for tracking transactions, user behavior, and other indicators that need real-time feedback in sectors such as banking, e-commerce, and healthcare. By contrasting the designs and advantages of Kafka and Pulsar, this paper illustrates how both platforms support dynamic BI environments in which fast, accurate data leads to better decisions. Although real-time BI offers many benefits, maintaining agility in a data-driven environment requires selecting the appropriate platform.
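
The claim that stream processors let companies "identify irregularities" as events arrive can be illustrated with a small, framework-independent sketch. This is an illustration under stated assumptions, not the author's method and not part of the Kafka or Pulsar APIs: it applies a deliberately simple rolling z-score test (a swapped-in technique) to a sequence of numeric event values such as transaction amounts. The function name `flag_irregularities` and its `window` and `threshold` parameters are hypothetical; in a real deployment the values would be decoded from records read by a Kafka or Pulsar consumer rather than taken from an in-memory list.

```python
from collections import deque
from math import sqrt
from typing import Deque, Iterable, Iterator, Tuple


def flag_irregularities(
    values: Iterable[float], window: int = 20, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for events far from the mean of the recent window.

    Hypothetical helper: `values` stands in for payloads that a Kafka or
    Pulsar consumer would deliver one at a time.
    """
    recent: Deque[float] = deque(maxlen=window)  # bounded memory per stream
    for i, x in enumerate(values):
        # Only test once a full window of history is available.
        if len(recent) == window:
            mean = sum(recent) / window
            std = sqrt(sum((v - mean) ** 2 for v in recent) / window)
            # Skip flat windows (std == 0) to avoid division by zero.
            if std > 0 and abs(x - mean) / std > threshold:
                yield i, x
        recent.append(x)


if __name__ == "__main__":
    # Illustrative data only: a steady pattern followed by one spike.
    amounts = [100.0 + (i % 5) for i in range(50)] + [500.0] + [101.0] * 5
    print(list(flag_irregularities(amounts, window=20, threshold=3.0)))
    # → [(50, 500.0)]
```

Keeping only a bounded window in a `deque` means memory use stays constant however long the stream runs, which is the property that matters for unbounded event streams. The three-standard-deviation threshold is an illustrative default, not a recommendation from the paper.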

Published

04-07-2019

How to Cite

[1]
Muneer Ahmed Salamkar, “An in-depth examination of frameworks such as Apache Kafka and Apache Pulsar for data processing in real time”, Distrib. Learn. Broad Appl. Sci. Res., vol. 5, pp. 1037–1058, Jul. 2019, Accessed: Mar. 14, 2025. [Online]. Available: https://dlbasr.org/index.php/publication/article/view/37