Staff Software Engineer, Agentic Platform
Seattle, WA - USA
Department:
Job Summary
Docker is one of the most loved brands in developer tooling, trusted by more than 20 million monthly users and with over 20 billion container image pulls. From solo founders to the world's largest companies, developers rely on Docker to build, share, and run their applications across our suite of products, including Docker Desktop, Docker Hub, and Docker Scout.
We are a globally distributed, remote-first team building the tools that define how software gets built and delivered. As AI agents redefine software development, Docker is at the center of that shift, providing the sandboxed environments, verified images, and secure infrastructure that make autonomous workflows trustworthy by default.
Join Docker's Agentic Platform team to build the foundational infrastructure powering the next generation of AI-driven workflows. Intelligent agents are rapidly becoming the primary interface between developers and complex systems, and we're building the platform that makes them reliable, scalable, and observable at production scale.
You'll be working on the core agent execution runtime, orchestration primitives, and the cloud infrastructure that keeps the Agentic Platform running 24/7. This is a high-ownership role: you won't just build systems; you'll run them, respond when they fail, and drive continuous improvement across the stack.
This is a greenfield opportunity to shape how agents are built and operated at scale. You'll work alongside seasoned engineers, collaborating with partner teams across AI infrastructure, developer experience, and platform reliability.
Please note: for this role, we are prioritizing candidates who currently live in the Seattle, WA metro area.
Responsibilities/What you'll work on:
Agent Workflow & Orchestration
Design and operate the core agent execution runtime responsible for scheduling, state management, and lifecycle management of long-running agentic workflows
Build robust multi-agent coordination patterns: task handoff, agent memory (short-term and long-term), tool use, and workflow branching at scale
Develop context window management strategies and session persistence layers for stateful agent interactions
Build tooling for prompt engineering as a first-class engineering discipline: versioning, testing, and evaluation of prompts at scale
Build platform capabilities that support developers working in AI-assisted coding workflows, including IDE integrations, local-first development environments, and fast iteration loops
Cloud Infrastructure & Service Ownership
Own and operate Agentic Platform services in AWS or OCI: infrastructure provisioning, scaling, cost management, and reliability
Provision and manage cloud infrastructure using Terraform; manage Kubernetes application packaging and deployment with Helm
This role may require participation in a 24/7 on-call rotation for the Agentic Platform; carry genuine pager responsibility for the services you build and operate
Define and uphold SLOs; lead incident response and blameless post-mortems, and drive continuous reliability improvements
Instrument systems for observability: distributed tracing, structured logging, metrics, dashboards, and alerting
Technical Leadership
As a Staff Engineer, partner with engineering leadership to set technical direction, and serve as a guide and mentor as the team grows
Drive architectural decisions that balance velocity with long-term maintainability across a distributed, cloud-native stack
Collaborate cross-functionally with product managers, designers, and partner engineering teams to integrate agentic capabilities into the broader developer platform
Contribute to a culture of engineering excellence through design reviews, RFC processes, and mentorship
Qualifications for this role
Required:
12 years of professional, hands-on, full-time software engineering experience in backend, infrastructure, or platform engineering.
Cloud Platform Expertise (AWS/OCI/Azure/GCP): Proven hands-on experience operating production services in AWS or Oracle Cloud Infrastructure: compute, networking, managed services, IAM, and cost management. This is a must-have; the Agentic Platform is a cloud-native service running 24/7.
Service Ownership in a Cloud Setting: You have owned production services end-to-end: on-call, incident response, SLO definition, and post-mortems. You don't just build; you run what you build.
Distributed Systems Design: Deep understanding of fault tolerance, consistency, observability, and scalability in cloud-native environments
Backend Engineering Proficiency: Strong proficiency in at least one backend language used for systems work: Go, Python, Rust, or Java
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
Strongly Preferred:
Go: Professional proficiency in Go, Docker's primary language for backend systems
Infrastructure as Code: Experience with Terraform for cloud infrastructure provisioning and Helm for Kubernetes application packaging and deployment
Data Infrastructure: Experience with PostgreSQL and Redis / Pub-Sub patterns for state management, caching, and event-driven agent workflows
MCP & Agent Tooling: Experience with MCP (Model Context Protocol) server design and integration
Container & Orchestration: Docker, Kubernetes, or equivalent, especially in the context of agent sandboxing and secure code execution environments
AI-assisted development tools: Familiarity with Cursor, Claude Code, Copilot, Windsurf, etc., and the developer personas using them
Agent Evaluation: Experience with LLM-as-judge frameworks, behavioral regression testing, and golden dataset management
Agent Systems Experience: Hands-on experience building or operating AI agent systems, including multi-agent orchestration, tool use, memory systems, or agent evaluation frameworks
Open Source: Contributions or community engagement on relevant open source projects
Docker considers visa sponsorship on a case-by-case basis based on business needs.
Perks
Freedom & flexibility; fit your work around your life
Designated quarterly Whaleness Days plus an end-of-year Whaleness break
Home office setup; we want you comfortable while you work
16 weeks of paid parental leave (after 6 months of employment)
Technology stipend equivalent to $100 USD net/month
PTO plan that encourages you to take time to do the things you enjoy
Training stipend for conferences, courses, and classes
Equity; we are a growing start-up and want all employees to have a share in the success of the company
Docker Swag
Medical benefits, retirement, and holidays vary by country
Remote-first culture with offices in Seattle and Paris
Docker embraces diversity and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better our company will be.
Required Experience:
Staff IC
About Company
Docker is a platform designed to help developers build, share, and run container applications. We handle the tedious setup, so you can focus on the code.