Service Reliability Engineer, G&A Solutions Engineering (GSE)

Job Location:

Austin, TX - USA

Monthly Salary: Not Disclosed

Posted on: Yesterday

Vacancies: 1 Vacancy

Job Summary

Do you have a passion for ensuring the reliability, scalability, and performance of critical services? Are you a highly motivated and expert engineer with a strong understanding of Site Reliability Engineering (SRE) principles and a desire to automate and improve processes? Join Apple's General and Administrative (G&A) Solutions Engineering team as a Service Reliability Engineer and play a vital role in supporting our global critical production systems.

As a Service Reliability Engineer, you'll be at the forefront of maintaining the health, stability, and efficiency of our services, working with a diverse range of technologies and platforms. You will collaborate with Engineers, Data Engineers, DBAs, and network specialists to proactively identify and resolve potential issues, automate repetitive tasks, and drive continuous improvement initiatives. Your expertise will directly impact the reliability of our systems, enabling Apple to deliver innovative products and services to our customers.

Proactively monitor service performance, identify potential bottlenecks, and implement solutions to optimize efficiency and resilience

Lead incident response efforts, driving rapid resolution and conducting thorough root cause analysis (RCA)

Develop and implement automation strategies to streamline operational tasks, improve service resilience, and reduce manual intervention

Apply SRE principles to maintain highly reliable and scalable service infrastructure

Collaborate closely with development teams to ensure that new services are designed for operational excellence, incorporating best practices for monitoring, alerting, and scalability

Contribute to the creation and maintenance of comprehensive documentation, including run-books and service level objectives (SLOs)

Participate in on-call rotations, providing 24/7 support for critical services and responding to incidents with a sense of urgency

Identify opportunities for process improvement and drive initiatives to enhance the efficiency and effectiveness of the service reliability team

Champion a culture of continuous learning and knowledge sharing within the team

Define and track key service level indicators (SLIs) and service level objectives (SLOs) to measure and improve service reliability

3 years of experience in a Site Reliability Engineering, DevOps, or related role supporting large-scale, enterprise-level systems

Proficiency in at least one programming language (e.g. Python, Java, Go) and scripting languages (e.g. Bash, PowerShell)

Experience with cloud platforms (e.g. AWS, Azure, GCP) and cloud-native technologies (e.g. Kubernetes, Docker)

Hands-on experience with monitoring and alerting tools (e.g. Prometheus, Grafana, Splunk, Datadog)

Experience in RCA of technical issues

Bachelor's degree in Computer Science or related work experience

Proven ability to troubleshoot complex issues in distributed systems

Familiarity with CI/CD pipelines and DevOps practices

Experience with database technologies (e.g. MySQL, PostgreSQL, NoSQL databases)

Knowledge of ITIL frameworks and incident management processes

Understanding of Linux/Unix system administration

Experience with configuration management tools (Ansible, Chef, Puppet)

Required Experience:


About Company

Apple

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...
