Cloud Network Reliability Engineer

Job Location:

Sunnyvale, CA - USA

Monthly Salary: Not Disclosed

Posted on: 13 hours ago

Vacancies: 1 Vacancy

Job Summary

The Apple Cloud Networking team builds and operates large-scale software-defined networking platforms that enable secure, resilient, and highly available multi-cloud connectivity with a global footprint. Our infrastructure powers critical Apple services such as iCloud, iTunes, and Siri. We are seeking an experienced and visionary Cloud Network Reliability Engineer to drive the technical strategy and execution for ensuring the availability, performance, scalability, and resiliency of Apple's global network. In this role, you will work as a technical leader solving complex networking challenges at massive scale, partnering with engineering, infrastructure, and operations teams across Apple to deliver reliable, fault-tolerant systems.

As a technical leader within the Cloud Networking organization, you will define and drive the reliability and resiliency architecture for Apple's network platform services. You will be responsible for establishing SRE and SWE best practices, architecting fault-tolerant network control and data planes, and championing data-driven decision-making through observability. You will drive resilient cloud networking solutions that operate reliably across multiple cloud providers and global regions, handling failures gracefully and maintaining service availability. Your technical leadership will ensure Apple's network services meet demanding availability, latency, resilience, and security requirements while continuously improving operations. We are looking for a technical expert who deeply understands cloud networking at scale and is passionate about operating mission-critical, globally distributed infrastructure, preventing outages through proactive engineering, and driving long-term reliability improvements through architectural excellence.

- Define and drive the long-term technical vision, architecture, and reliability strategy for large-scale cloud networking platforms spanning the control plane and data plane.
- Design and evolve fault-tolerant, highly available network services, ensuring graceful degradation and consistent performance under partial and systemic failures.
- Establish platform-wide resiliency patterns, including service discovery, health checking, automated failover, rate limiting, circuit breaking, and traffic management, across multi-region and multi-cloud environments.
- Lead the design of network configuration management, routing state distribution, traffic engineering, and capacity planning systems, balancing scalability, correctness, and operability.
- Act as a senior technical authority and architectural reviewer, influencing critical design decisions across multiple teams and ensuring network failure modes are explicitly addressed.
- Design and champion automation-first reliability solutions, including topology discovery, deployment safety mechanisms, self-healing systems, and operational tooling that reduce toil.
- Define and own reliability metrics and observability standards (SLIs, SLOs, error budgets), using data to drive engineering trade-offs, reliability investments, and incident response.
- Scale impact through cross-team technical leadership: embedding reliability early in design, mentoring engineers, and sharing deep technical knowledge through documentation and technical talks.

- Extensive experience in software engineering, systems engineering, or infrastructure.
- Background in designing, operating, and supporting highly available, fault-tolerant distributed systems at hyperscale.
- Systems programming skills, including multi-threading, concurrency, caching, and batching.
- Solid understanding of network infrastructure and software-defined networking (SDN).
- Ability to lead cross-functional collaboration and influence technical decisions across teams.

- Expert knowledge of API design and interface technologies (JSON, ProtoBuf, REST, RPC, XML, etc.).
- In-depth knowledge of K8s, OpenStack, system virtualization, build systems, and infrastructure as code.
- Strong knowledge of observability systems (metrics, logging, tracing) and qualification.
- Knowledge of networking solutions across OSI layers 3 through
- Written and verbal communication skills, with the ability to clearly articulate risk as well as reliability and operational trade-offs.
- Ability to manage competing priorities, drive initiatives to completion, and deliver results in fast-paced environments.

About Company

Apple

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...