Senior Site Reliability Engineer (SRE)
Posted on 6/1/2026 11:17:58 AM
At Core42, we’re building next-generation sovereign AI and cloud infrastructure platforms that power large-scale AI, machine learning, and HPC workloads globally. We’re looking for a Senior Site Reliability Engineer to help design, automate, and operate highly scalable, secure, and resilient infrastructure supporting mission-critical AI platforms.

What you’ll work on:
- Kubernetes operations and platform reliability
- CI/CD pipelines using GitLab CI, Azure DevOps, and Jenkins
- Infrastructure as Code with Terraform, Helm, and Ansible
- Observability and monitoring with Prometheus, Grafana, VictoriaMetrics, and ELK/EFK
- Automation, incident management, and reliability engineering
- Secure cloud-native platform operations at scale
- AI/ML and HPC infrastructure environments

We’re looking for engineers with:
- Strong Kubernetes production experience (AKS, EKS, GKE, or self-managed)
- Deep understanding of SRE, DevOps, and platform engineering principles
- Expertise in CI/CD, automation, and Infrastructure as Code
- Experience with Docker, container orchestration, and Linux systems
- Strong scripting/programming skills (Python, Bash, Go)
- Hands-on experience with observability and monitoring stacks
- Passion for scalability, reliability, and automation

Bonus points:
- AI/ML or HPC production environment experience
- GPU infrastructure and workload optimization knowledge
- Experience supporting distributed systems at scale

Why join Core42?
- Full relocation support for international candidates
- Competitive salary and benefits
- Premium family medical coverage
- Opportunity to work on cutting-edge AI and sovereign cloud infrastructure

If you’re passionate about SRE, Kubernetes, AI infrastructure, and building resilient platforms at scale, we’d love to connect. Apply now or reach out directly to learn more.

Abu Dhabi, UAE