DevOps Engineer
Posted on: 21 days ago
Vacancies: 1
Job Summary
DevOps Engineer, EMEIA Infrastructure
ROLE OVERVIEW
We are looking for a skilled and pragmatic DevOps Engineer to own and evolve our infrastructure across the EMEIA region. This is a dual-horizon role: you will keep our existing VM-based systems healthy while leading a greenfield effort to design and build the managed environment that those solutions will migrate onto.
A significant proportion of what we build is produced rapidly using AI-assisted structured development. That means our solutions can move from idea to deployment faster than ever, and our infrastructure needs to keep pace. We need someone who thrives in a fast-moving, ambiguous environment, can absorb change quickly and treats adaptability as a core part of the job rather than an occasional demand.
The new managed environment is most likely to be based on Kube, Apple's internal Kubernetes (EKS) deployment, though the final architecture will be a team decision, and a more directly controlled alternative remains an option for workloads requiring greater control. You will help inform that decision and then own the build-out, regardless of which direction is chosen.
You will work closely with data engineers, developers and analysts, acting as the infrastructure backbone for a team that moves quickly and expects you to move with it. The role also involves working directly with third-party vendors who support some of the tools being deployed, and collaborating with teams outside of EMEIA, including WorldWide, to align on standards, share solutions and resolve cross-regional dependencies.
KEY RESPONSIBILITIES
Platform Migration & Environment Design
- Lead the design and build-out of a new managed container environment to replace existing VM-based infrastructure; the most likely candidate is Kube (Apple's internal Kubernetes/EKS cluster), but the final decision will be made collaboratively as a team
- Contribute meaningfully to the environment selection decision: weigh trade-offs between managed solutions (Kube) and more directly controlled alternatives, considering maintenance overhead, operational control and team capability
- Own the migration of existing VM-based workloads onto the new platform, managing sequencing, risk and continuity of service throughout
- Establish and maintain the standard workflow for deploying solutions: build locally, containerise, publish to Kube, configure connectivity to Apple internal system dependencies
Apple Internal Networking & Connectivity
- Configure and maintain networking between Kube and Apple's internal systems, including Shield, Snowflake, Appleconnect, Floodgate and any other platform dependencies the team relies on
- Own namespace and compute provisioning on the shared Kube cluster, ensuring workloads are appropriately isolated and correctly configured
- Manage credentials, service accounts and access controls across the full connectivity chain, from container to downstream service
- Act as the go-to expert on how things connect within Apple's internal network topology
Infrastructure Management
- Own and manage cloud infrastructure across EMEIA using internal cloud tooling (and connected systems, including Shield)
- Manage certificates, firewalls, resource pools, networking and access controls
- Ensure infrastructure is appropriately sized, resilient and cost-efficient
- Maintain accurate documentation of infrastructure topology and configuration
VM Provisioning & Automation (Existing Estate)
- Maintain and operate existing virtual machines, primarily on RHEL, while migration to the new environment is in progress
- Build and maintain standardised, repeatable provisioning processes (e.g. via Ansible, Terraform or equivalent IaC tooling)
- Manage package deployment, software repositories, databases and web servers
- Own the patching and update lifecycle for managed systems
Monitoring & Reliability
- Implement and maintain monitoring, alerting and observability across both the existing VM estate and the new container environment
- Proactively identify risks, bottlenecks and failure patterns before they impact users
- Define and track appropriate SLIs/SLOs for critical services
- Conduct post-incident reviews and drive lasting improvements
Supporting AI-Augmented Development
- A large proportion of the solutions you will support are built rapidly using structured AI-assisted development; you must be comfortable working with codebases and configurations that evolve quickly, may not have deep documentation histories and may have been substantially generated with AI tooling
- Provide the infrastructure scaffold that allows AI-assisted solutions to move from local development to production reliably and safely
- Be a pragmatic partner to developers: unblock deployment quickly, catch infrastructure-level risks early and help establish patterns that make rapid iteration safe at scale
- Actively use AI tools (e.g. Claude, Copilot or similar) to accelerate your own work: writing scripts, diagnosing issues, generating runbooks, reviewing configurations
Diagnosis & Incident Response
- Take ownership of vague or ambiguous production issues (e.g. "it's running slow", "the server keeps falling over") and drive them through to resolution
- Deliver short-term fixes rapidly to restore service while tracking and delivering long-term root cause resolutions
- Maintain a pragmatic balance between speed-of-recovery and quality-of-fix
SKILLS & EXPERIENCE
Essential
- Proven experience in a DevOps, infrastructure or platform engineering role
- Hands-on experience with Kubernetes: deploying, configuring and operating workloads in a shared or managed cluster environment
- Experience containerising applications: writing Dockerfiles, managing images, publishing to a registry and debugging container-level issues
- Strong networking fundamentals: DNS, TLS/SSL certificates, firewall rules, load balancing, VPNs and service-to-service connectivity
- Comfort operating in environments where the architecture is still being defined: able to contribute to the decision, then execute once direction is set
- Hands-on experience with RHEL (or equivalent enterprise Linux): provisioning, hardening, package management (yum/dnf), systemd services
- Experience managing cloud infrastructure, ideally in an enterprise private/hybrid cloud environment
- Experience with infrastructure-as-code or configuration management tooling (e.g. Terraform, Ansible, Puppet or similar)
- Solid scripting ability in Bash and at least one higher-level language (Python preferred)
- Experience with monitoring and observability tooling (e.g. Prometheus, Grafana, Datadog or similar)
- Strong incident diagnosis skills: able to work from vague symptoms to root cause using logs, metrics and reasoning
- Comfortable working with AI-generated or AI-assisted codebases: reading, extending and debugging solutions without a full traditional authorship history
- Clear written and verbal communication: able to translate infrastructure complexity for non-technical stakeholders
Desirable
- Experience with AWS, particularly EKS
- Familiarity with Apple's internal platform tooling: Kube, Shield, Appleconnect, Floodgate or similar
- Experience integrating with Snowflake, including managing drivers, credentials and network access
- Experience with CI/CD pipelines (GitLab CI, Jenkins, GitHub Actions or similar)
- Exposure to security tooling, vulnerability scanning or compliance frameworks (e.g. CIS Benchmarks)
- Familiarity with secrets management tooling (Vault, CyberArk or similar)
- Experience working in a regulated or enterprise environment with change management processes
WAYS OF WORKING
- You are comfortable with genuine ambiguity, including at the architectural level, and can make progress and contribute to decisions without waiting for everything to be resolved
- You default to automation: if you do something twice, you script it; if you do it three times, you build a process
- You adapt quickly: the tools, environments and solutions you support can change fast, and you treat that as normal rather than exceptional
- You are pragmatic under pressure: you know when to stop the bleeding first and fix it properly later
- You are self-directed and comfortable owning problems end-to-end with minimal hand-holding
- You are a willing partner to developers who move fast: you keep up, add guardrails where they matter and don't become a bottleneck
WHAT SUCCESS LOOKS LIKE
- A new managed container environment is designed, built and running, with existing VM-based workloads migrated onto it in a controlled, sequenced way
- The standard deployment path (build, containerise, publish, connect) is well-established, documented and easy for the team to use
- Connectivity from the new environment to Apple internal systems (Snowflake, Appleconnect, Shield, Floodgate, etc.) is reliable, well-understood and correctly secured
- Teams are unblocked quickly when they need new integrations, access or capabilities, even when the solutions they are deploying have been built at speed
- Production issues are resolved rapidly, with lasting fixes following close behind
- Monitoring catches issues before users do
- The infrastructure estate, both old and new, is well-documented, well-understood and in a known-good state
Required Skills: infrastructure