Staff Machine Learning Platform Engineer, AI Evaluation

Apple


Job Location:

Seattle, WA - USA

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary

Join Apple Services Engineering to build the next generation of AI evaluation systems. We are seeking a staff machine learning platform engineer to lead the architectural design and development of the high-availability services and internal tools powering self-service evaluation at scale. You will partner with researchers to operationalize their innovations, transforming complex workflows into intuitive, developer-first platforms. We are looking for builders who thrive in the ambiguity of new initiatives and are passionate about creating scalable infrastructure.

We're building the evaluation platform that will serve all of Apple's generative AI and agent systems. This is early-stage work: some scrappy components exist, much is greenfield, and we need a staff engineer who can take it from here to org-wide self-service. This is not a "maintain the infra" role. You'll make consequential decisions about what to build, what to integrate, and what to say no to, then ship it in Python with a small team.

Platform architecture & delivery: Own the technical direction for our evaluation platform. Design and build the APIs, SDKs, and orchestration services that turn research-grade evaluation methodology into self-service building blocks other teams ship on top of. You'll start scrappy and intentionally, with line of sight to the scaled [...]

[...] ML research: Partner directly with research engineers to assess their code and determine what can be rewritten into clean Python services vs. what requires infrastructure changes (Ray, GPU compute, distributed scheduling). Build the reusable abstractions that make the next research handoff faster than the [...]

[...] decision-making: You will balance complex, competing priorities from partner engineering teams, PMs, and leadership. Your job is to distinguish signal from noise, identifying the platform-level decisions that serve the org vs. one-off requests that don't scale. You'll advocate for these decisions clearly in documentation and in rooms with senior [...]

Org-level evaluation strategy: Work with your technical manager to assess workload across engineers, set priorities, and define how self-service evaluation reaches every team at Apple. You're a force multiplier, not just through code but through the clarity of your technical [...]

[...] experience: You own the experience end-to-end. Today that means supporting existing evaluation patterns (trace-based, metric-based). Tomorrow it means enabling breakthrough approaches: surfacing where models fail in non-obvious ways, evaluating multi-turn agent trajectories, and scoring complex tool-use [...]

[...] rigor: Define the team's posture on testing, CI/CD, monitoring, and reliability. You don't need to be an SRE, but you ship with instrumentation and you set the standard others follow.

8 years of software engineering experience with a track record of owning platform-level technical direction.

0-to-1 builder who designs for scale. You've taken something from nothing to production, made deliberate tradeoffs about what to build now vs. later, and can articulate [...]

[...] depth: You're not building the models, but you can read research code and assess: is this a software problem or an infrastructure problem? Do we need a rewrite or do we need GPUs? You speak the language of research engineers [...]

[...] evaluation experience that goes beyond traces. You understand the hard problems: non-deterministic outputs, multi-step agent reasoning, judge model reliability, scoring drift. You've built or operated systems that handle [...]

[...] under ambiguity. You know when to build a rapid prototype for quick validation and when to be disciplined (design doc, review, test). You can tell the difference in real time, not just in [...]

[...] as a core skill. You write clearly: design docs, decision records, platform roadmaps. You speak clearly in meetings with researchers and in rooms with engineering leaders, balance the needs and priorities of partner teams, and contribute to the sequencing of [...]

[...] as primary language. Strong with FastAPI, Pydantic, and the ecosystem. Experience with job orchestration frameworks ([...] or similar). Bonus: Go or Rust for compute-hot [...]

[...] ownership. You've owned CI/CD, containerization (Docker/K8s), and monitoring for production services. You don't just ship, you keep things running.

Experience with distributed compute frameworks (Ray, Dask)

Background in startup or early-stage environments where you wore multiple hats

Familiarity with LLM token economics, rate limiting, and cost management at scale

Required Experience:

Staff IC

About Company

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...
