Our projects: Cortex
Pathway to reliable IT Operations
About the project and product
Cortex is an infrastructure-agnostic container runtime environment that decouples applications from the underlying infrastructure. It is powered by Kubernetes OKD 4, extended and customized for the RBI Group's ecosystems. The core contributor team designed it to the highest security standards, following GitOps and ZeroOps principles. From the very beginning, Cortex followed the inner source concept, and its community, contributors, and users grew substantially in 2022. Since January 2024, the Cortex Platform team has operated the Cortex clusters through a service offering called “Cortex as a Service”.
Time & tools
RBI Group runs several Cortex cluster hubs. Three of them are considered the main hubs because they host significant applications, and each consists of multiple clusters spanning multiple environments. Over time, it became hard for the tribes to create, maintain, and operate all these clusters themselves. We therefore created a site reliability engineering (SRE) team to handle the challenges of this rapid growth while delivering top-notch IT operations.
Our approach - how we created the product
We are professionalizing how we perform operations, how we automate them, how we interact with other teams, how we run the whole feedback cycle, how we handle incidents, and how we fulfil requests. We needed to mature our operational approach, and we are developing it collaboratively with the community. In 2024, a few sensitive and highly important applications are expected to be hosted on Cortex.
Team collaboration
Our SRE team has nine members, four of them based in Romania. The team works with two counterparts: the platform team (also called the Core Incubator team) and the application teams.
The SRE team's main purpose is to operate and manage the Cortex clusters, to automate operations, and to act as an internal consultant for application developers, giving them feedback on how to design and improve their applications so they are easier to operate.
Measurable outcomes for customers
- The SRE team provides reliable cluster operations to our customers on a 24/7 basis.
- Upgrades are performed, and learnings and best practices are applied, across all cluster hubs.
- Application developers and their DevOps engineers can focus on application scope without having to dive deep into runtime details.
- Security and compliance measures are established at the cluster level and kept up to date.