Imagine what you could do here. At Apple, great new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job, and there's no telling what you could accomplish! Are you passionate about music, movies, and the world of Artificial Intelligence and Machine Learning? So are we! Join our Human-Centered AI team for Apple Media. In this role, you'll represent the user perspective on new features, review and analyze data, and evaluate AI models powering everything from search and recommendations to other innovative features. Collaborate with Data Scientists, Researchers, and Engineers to drive improvements across our platforms.
We are looking for a Machine Learning Engineer focused on Evaluation & Insights for the Human-Centered AI team. In this role, you will bridge the gap between human perception and algorithmic performance, helping evaluate and optimize Foundation Models and generative AI systems. You will architect robust evaluation frameworks, design scalable MLOps pipelines for model assessment, and translate qualitative failure modes into programmatic guardrails and training signals (e.g., SFT, RLHF/DPO). This role blends deep ML engineering expertise with strong analytical judgment to assess, interpret, and improve the behavior of advanced AI models. You will work cross-functionally with Software Engineering, Product, Research, and Responsible AI teams at Apple to ensure that our AI experiences are reliable, safe, and aligned with human expectations.
Lead Rigorous Model Evaluations: Architect and execute comprehensive evaluation suites for LLMs and multimodal models, identifying edge cases in multi-step reasoning, factuality, adversarial robustness, safety, and ...
Scoring Frameworks: Develop deterministic, heuristic, and LLM-assisted evaluation frameworks (e.g., LLM-as-a-judge, reward modeling) to quantify human-perceived quality metrics (e.g., helpfulness, hallucination rates).
Actionable Signal Extraction: Translate qualitative failure modes into quantifiable loss patterns, programmatic guardrails, and actionable data-mixture adjustments for model training and ...
Performance: Partner with engineering teams to refine model behavior, leveraging evaluation telemetry to inform prompt engineering, Retrieval-Augmented Generation (RAG) strategies, and model ...
Pattern Recognition: Apply advanced ML techniques (e.g., embedding-based clustering, representation learning, perturbation analysis) to systematically map error taxonomies and latent failure manifolds in model ...
... & Automation: Develop robust MLOps workflows to codify evaluation metrics, automate regression testing across model checkpoints, and integrate human-centric assessments into ML CI/CD ...
Evaluation Pipelines: Architect scalable distributed inference and processing pipelines (e.g., Ray, vLLM) for high-throughput model evaluation, automated annotation, and output analysis at ...
Human-Centric Metrics: Define quantitative evaluation frameworks that capture nuanced human factors, including trust calibration, conversational state tracking, and ...
...-Evaluator Systems: Build automated evaluation pipelines utilizing LLMs to assess outputs at scale, optimizing for high correlation with human baseline ...
Cross-Functional Partnership: Collaborate with ML researchers, software developers, and product managers across Apple to translate product requirements into scalable, reliable, and efficient model evaluation infrastructure.
Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, Cognitive Science, or a related technical field, with relevant industry experience in ML Engineering or Applied ...
Proficiency in Python and modern deep learning ecosystems (PyTorch, JAX, Hugging Face).
Strong ability to interpret unstructured model outputs (text, transcripts, embedding spaces) and synthesize qualitative findings into actionable engineering guidance and training ...
Hands-on experience developing, fine-tuning, or evaluating LLMs, multimodal models, and NLP ...
Familiarity with AI quality metrics, hallucination detection techniques (e.g., SelfCheckGPT), model alignment (RLHF/DPO), and LLM-as-a-judge frameworks (e.g., G-Eval, DeepEval).
Knowledge of human factors, HCI, or cognitive science methodologies as applied to AI system ...
Experience building scalable ML inference pipelines, model-evaluation workflows, and structured rating frameworks for large-scale AI ...
Building internal tools or automated pipelines for ML workflows using tools like MLflow, Weights & Biases, or similar ...
Familiarity with advanced prompt engineering, RAG architectures (vector databases, semantic search), and Fine-Tuning.
Required Experience:
IC
Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...