Job Description
Site Reliability Engineers are responsible for ensuring the availability, reliability, scalability, and performance of the firm's most critical, customer-facing microservices that power all eCommerce channels. This role applies Google-inspired SRE principles to balance feature velocity and system reliability using Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
The role combines software engineering, cloud engineering, automation, and production operations, with a strong emphasis on building systems that are observable, resilient, and operable by default.
Primary Responsibilities
Define, implement, and own SLIs, SLOs, and error budgets for critical microservices in collaboration with product and engineering teams.
Use error budgets to influence release decisions, prioritize reliability work, and manage operational risk.
Design and maintain observability platforms including metrics, logs, traces,...