Federated AI on Kubernetes: Orchestrating Secure and Scalable Machine Learning Pipelines

Authors

  • Sai Prasad Veluru, Software Engineer at Apple, USA
  • Mohan Krishna Manchala, ML Engineer at Meta, USA

Keywords:

Federated Learning, Kubernetes, AI Pipelines, Data Privacy

Abstract

Federated learning (FL) has emerged as a strong framework for training machine learning models without centralizing sensitive data, addressing concerns around data privacy, regulatory compliance, and distributed data sources. This paper investigates how federated AI can be orchestrated with Kubernetes to produce scalable, secure, and efficient machine learning pipelines in distributed environments. It illustrates how Kubernetes, with its container orchestration, resource abstraction, and automation tools, matches the complex needs of federated learning systems. Kubernetes accommodates heterogeneous hardware across edge nodes, dynamically scales computational workloads, and enforces strict security requirements, which simplifies the deployment and maintenance of federated learning systems across a distributed infrastructure. The approach has practical relevance in sectors such as healthcare, banking, and IoT, where sensitive data is difficult to move or share. Key contributions include practical insights on integrating federated learning frameworks with Kubernetes-native tools (such as Kubeflow, KubeEdge, and Helm), establishing secure communication among nodes, and developing resilient pipelines that run smoothly across many contexts. Readers will learn to design federated learning systems that respect data sovereignty, scale well, reduce operating overhead, and follow modern DevOps practices. Aimed at solution architects, DevOps practitioners, and machine learning engineers, this article offers a practical study of building robust federated learning systems for production use with Kubernetes.

Published

09-03-2021

How to Cite

[1] S. P. Veluru and M. K. Manchala, “Federated AI on Kubernetes: Orchestrating Secure and Scalable Machine Learning Pipelines”, Essex Journal of AI Ethics and Responsible Innovation, vol. 1, pp. 288–312, Mar. 2021, Accessed: Apr. 16, 2026. [Online]. Available: https://ejaeai.org/index.php/publication/article/view/59