Site Reliability Engineering (Leadership Level)
Ownership of platform reliability, resiliency, and performance
Definition and governance of:
o SLIs, SLOs, SLAs
o Error budgets and reliability metrics
Advanced observability strategy design and implementation:
o Metrics, logs, traces, alerts, and dashboards using Dynatrace
Incident response leadership, RCA facilitation, and long-term remediation planning
Experience operating systems with 99.9% to 99.99% availability
Containers, APIs & Integration
Leadership-level experience with AKS-based platforms, ingress, and scaling strategies
Understanding of microservices, API-led, and event-driven architectures
Familiarity with Azure Integration Services (Service Bus, Event Hubs, API Management)
Security, Compliance & Cost
Secure cloud design using Key Vault, managed identities, and RBAC
Cost optimization (FinOps mindset) across cloud infrastructure
Roles & Responsibilities
Act as Lead SRE for the client's Retail platforms, owning reliability and stability outcomes
Define and enforce SRE standards, best practices, and operating models
Architect and govern highly available, scalable cloud platforms
Lead the design and implementation of CI/CD and IaC strategies
Establish proactive monitoring, alerting, and incident prevention mechanisms
Own major incident leadership, RCA execution, and corrective action tracking
Partner with application, security, and architecture teams to build reliability by design
Drive automation to reduce toil and improve operational efficiency
Mentor and coach SRE and DevOps engineers across teams
Influence roadmap decisions with a reliability, scalability, and cost lens
Desired Skills
Azure DevOps
Kind Regards
Yogesh Kumar
Sr. IT Recruiter Work#:
Mailto:
I hope you're doing well. Please review the JD below and let me know if you would be interested in exploring the opportunity.
Job Description :-
Role:- DevOps & Site Reliability Lead
Job Type:- Full-Time with TCS
Salary :- $120K to $160K Plus Client Benefits
Job Location:- Deerfield, IL 60015 (Onsite)
Job Description
Must Have Technical/Functional Skills
Cloud & Platform Engineering (Expert Level)
Deep expertise in Microsoft Azure including:
o Compute (VMs, App Services, Azure Container Apps)
o Containers & Orchestration (AKS, Docker)
o Networking (VNets, Private Endpoints, Application Gateway, Load Balancers)
o Storage, Azure Key Vault, Azure Monitor, Log Analytics
Proven experience designing enterprise-grade, highly available cloud platforms
Strong understanding of hybrid and multi-cloud architectures (AWS / GCP exposure preferred)
DevOps & Engineering Excellence
Advanced experience with Azure DevOps and CI/CD pipeline architecture
Infrastructure automation using Terraform (modules, state management, governance)