Mgr, Engineering Program Management, AI Platforms & Infrastructure

Job Location:

Santa Clara County, CA - USA

Monthly Salary: Not Disclosed

Posted on: Yesterday

Vacancies: 1 Vacancy

Job Summary

Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Do you love taking on challenges that create a positive impact? Are you passionate about empowering the creation of groundbreaking intelligent experiences? The Apple Services Engineering org is building groundbreaking technology... and we are looking for people like you! Apple offers a collaborative work environment that fosters creativity and innovation. Every new product, service, or feature we invent is the result of people working together to make each other's ideas stronger. That happens here because every one of us strives toward a common goal: crafting the best customer experiences.

We are looking for an experienced Engineering Program Manager (EPM) Manager to lead strategy, execution, and delivery across our AI/ML platform and infrastructure. In this role, you will drive cross-functional initiatives spanning Apple's massive-scale GPU/TPU compute infrastructure, Foundation Model inference platforms, and hybrid-cloud AI systems. You will partner closely with engineering and operations leaders to translate complex technical requirements into actionable roadmaps. Crucially, you will be responsible for growing and scaling a high-performing EPM team to meet the rapidly expanding demands of Apple's generative AI and machine learning platforms.

- Build, scale, and mentor a high-performing team of Engineering Program Managers, fostering a culture of ownership, accountability, and execution rigor during a period of significant organizational growth.
- Lead strategy, roadmap planning, and end-to-end execution for large-scale AI/ML infrastructure programs, heavily focused on Foundation Model inference and training.
- Drive cross-functional alignment across engineering, product, and operations teams to deliver scalable, low-latency compute infrastructure built on GPUs and TPUs at massive scale.
- Serve as the strategic engineering interface with tier-1 third-party cloud vendors, negotiating upfront technical constraints, capacity plans, and SLAs, while partnering seamlessly with Operations teams to ensure vendor capabilities meet our Foundation Model requirements.
- Lead cost efficiency and operational excellence programs through smarter resource allocation, compute capacity forecasting, and global workload management.
- Work with partner Operations teams to align engineering roadmaps with infrastructure execution, covering capacity forecasting, performance tuning, and disaster recovery in multi-region hybrid cloud environments.
- Drive qualification and rollout plans for new infrastructure build-outs, ensuring reliability and performance benchmarks are met before production.
- Partner with engineering leadership to translate product requirements into long-term infrastructure strategies, optimizing for efficiency and global scale.

- 10 years of experience in product or program management, with at least 3 years in a people management or lead EPM role.
- Experience building and scaling teams, with the organizational savvy to expand team scope and influence across a highly matrixed organization.
- Experience managing strategic relationships with top-tier cloud vendors and external partners, including infrastructure planning, contract alignment, and SLA management.
- Strategic thinking, with the ability to balance long-term platform roadmap priorities against near-term inference and training execution.
- Track record of delivering massive-scale cost optimization and operational efficiency programs in hybrid-cloud environments.
- Communication and stakeholder management skills, with the ability to translate complex technical infrastructure concepts for both deep engineering teams and executives.
- Experience in multi-tenant, high-performance compute environments running large-scale Foundation Models or similar ML workloads.
- Degree in EE/CS/CE or equivalent.

- Deep technical background in AI/ML infrastructure, cloud operations, or distributed compute platforms, with direct experience in GPU/TPU capacity management.
- Familiarity with large-scale distributed training frameworks (e.g., PyTorch, Megatron-LM, JAX) and their infrastructure implications at scale.
- Experience with FinOps practices in large-scale GPU/TPU environments.
- Experience navigating large-scale organizational change and team restructuring.

Required Experience:

Manager

About Company

Apple

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...