Location
Mexico City
Posted
June 07, 2026
Commute
Local Area
Job Description
What You’ll Do
Reliability & Operations
- Own availability, latency, and scalability across SaaS and AI systems
- Define and enforce SLOs, SLIs, and error budgets
- Participate in a global on-call rotation (~1 week every 4 weeks)
- Lead incident response and drive blameless postmortems with systemic fixes
Platform & Infrastructure
- Architect and operate on-premise and multi-region, multi-cloud environments
- Manage large-scale Kubernetes workloads
- Build and evolve infrastructure using Terraform and Ansible
- Improve system resilience, fault isolation, and capacity planning
AI/ML & Automation
- Build and scale agentic AI systems for triage, anomaly detection, and self-healing
- Ensure reliability of model serving infrastructure
- Operate, optimize and scale distributed systems
What You Bring
- 5+ years in SRE, Production Engineering, or Platform Engineering
- Strong experience with cloud providers (AWS/GCP/OCI), Kubernetes, and IaC (Terraform/Ansible)
- Proficiency in Pyt...