Evaluation & Insights Machine Learning Engineer

Job Location:

Seattle, WA - USA

Monthly Salary: Not Disclosed

Posted on: 2 days ago

Vacancies: 1 Vacancy

Job Summary

Imagine what you could do here. At Apple, great new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish! Are you passionate about music, movies, and the world of Artificial Intelligence and Machine Learning? So are we! Join our Human-Centered AI team at Apple. In this role, you'll represent the user perspective on new features, review and analyze data, and evaluate AI models powering everything from search and recommendations to other innovative features. Collaborate with Data Scientists, Researchers, and Engineers to drive improvements across our platforms.

We are looking for an Evaluation & Insights Engineer for the Human-Centered AI team to help evaluate and improve AI systems by combining data science, model behavior analysis, and qualitative analysis. In this role, you will analyze AI outputs, develop evaluation frameworks, design qualitative studies, and translate findings into actionable improvements for product and engineering teams. This role blends deep technical expertise with strong analytical judgment to assess, interpret, and improve the behavior of advanced AI models. You will work cross-functionally with Engineering and Project Managers and with Product and Research teams to ensure that the AI experience is reliable, safe, and aligned with human expectations.

Lead Rigorous Model Evaluations: Architect and execute comprehensive evaluation suites for LLMs and multimodal models, identifying edge cases in multi-step reasoning, factuality, adversarial robustness, and safety.

Scoring Frameworks: Develop deterministic, heuristic, and LLM-assisted evaluation frameworks (e.g., LLM-as-a-judge, reward modeling) to quantify human-perceived quality metrics (e.g., helpfulness, hallucination rates).

Actionable Signal Extraction: Translate qualitative failure modes into quantifiable loss patterns, programmatic guardrails, and actionable data-mixture adjustments for model training.

Performance: Partner with engineering teams to refine model behavior, leveraging evaluation telemetry to inform prompt engineering and Retrieval-Augmented Generation (RAG) strategies.

Pattern Recognition: Apply advanced ML techniques (e.g., embedding-based clustering, representation learning, perturbation analysis) to systematically map error taxonomies and latent failure manifolds in model behavior.

Automation: Develop robust MLOps workflows to codify evaluation metrics, automate regression testing across model checkpoints, and integrate human-centric assessments into ML CI/CD pipelines.

Evaluation Pipelines: Architect scalable distributed inference and processing pipelines (e.g., Ray, vLLM) for high-throughput model evaluation, automated annotation, and output analysis at scale.

Human-Centric Metrics: Define quantitative evaluation frameworks that capture nuanced human factors, including trust calibration and conversational state tracking.

Evaluator Systems: Build automated evaluation pipelines utilizing LLMs to assess outputs at scale, optimizing for high correlation with human baselines.

Cross-Functional Partnership: Collaborate with ML researchers, software developers, and product managers across Apple to translate product requirements into scalable, reliable, and efficient model evaluation infrastructure.

Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, Cognitive Science, or a related technical field, with 5 years of relevant industry experience in ML Engineering or a related applied role.

Proficiency in Python and modern deep learning ecosystems (PyTorch, JAX, Hugging Face).

Proven experience building scalable ML inference pipelines, model-evaluation workflows, and structured rating frameworks for large-scale AI systems.

Ability to interpret unstructured model outputs (text transcripts, embedding spaces) and synthesize qualitative findings into actionable engineering and training guidance.

Hands-on experience developing, fine-tuning, or evaluating LLMs, multimodal models, and NLP systems.

Familiarity with AI quality metrics, hallucination detection techniques (e.g., SelfCheckGPT), model alignment (RLHF/DPO), and LLM-as-a-judge frameworks (e.g., G-Eval, DeepEval).

Experience building internal tools or automated pipelines for ML workflows using tools like MLflow, Weights & Biases, or similar.

Familiarity with advanced prompt engineering, RAG architectures (vector databases, semantic search), and fine-tuning.

Knowledge of human factors HCI or cognitive science methodologies as applied to AI system design.

About Company

Apple

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...