Senior DevOps Engineer (AI)

Bilue


Job Location:

Sydney - Australia

Monthly Salary: Not Disclosed
Posted on: 28 days ago
Vacancies: 1 Vacancy

Department:

Engineering

Job Summary

We're looking for a Senior DevOps Engineer to join our Applied AI practice and work at the intersection of platform engineering and AI delivery. This is a hands-on role where you'll lead the optimisation and evolution of our cloud infrastructure, deployment pipelines, and operational practices to ensure we consistently deliver high-quality outcomes for our clients.

This isn't a standard DevOps role. You'll be building and operating the infrastructure that production AI systems actually run on (agentic pipelines, LLM integrations, retrieval systems) in enterprise environments across financial services, government, insurance, and retail. That means bringing the same rigour you'd apply to any critical system and then going further: LLMOps, inference cost engineering, evaluation harnesses, and resilience patterns purpose-built for non-deterministic APIs.

You'll work closely with AI Engineers, delivery teams, and client stakeholders to uplift platform capability, improve delivery velocity, and embed quality through automation, observability, and strong engineering standards.

Core DevOps

  • Architect, build, and continuously enhance CI/CD pipelines to automate and accelerate software delivery across the team.

  • Lead the management and optimisation of cloud infrastructure (AWS), ensuring scalability, security, and reliability while championing best practices.

  • Design, implement, and maintain Infrastructure as Code (IaC) with tools such as Terraform and CloudFormation, enabling the team to deploy with confidence and agility.

  • Proactively monitor, troubleshoot, and enhance system performance, availability, and security, ensuring operational excellence across client environments.

  • Drive the adoption of containerisation and orchestration technologies like Docker and Kubernetes to enable scalable, high-performance solutions.

  • Improve system observability by implementing advanced logging, monitoring, and alerting with tools such as Prometheus, Grafana, Datadog, CloudWatch, and the ELK stack.

  • Lead the implementation of security best practices, including IAM, secrets management, and vulnerability assessments.

  • Collaborate closely with developers to continuously optimise build, deployment, and scaling strategies for seamless integration and continuous delivery.

  • Automate key operational tasks and apply SRE principles to enhance system reliability, uptime, and overall performance.

  • Take ownership of incident response and lead root cause analysis for production issues, ensuring swift resolution and ongoing improvement.

 

AI-Specific Responsibilities

  • Practise LLMOps: implement prompt versioning, model evaluation pipelines, and controlled promotion gates before anything reaches production.

  • Instrument beyond standard metrics: design observability for token costs, inference latency, retrieval quality, and model drift detection.

  • Build agentic resilience: implement rate limiting, circuit breakers, and graceful fallbacks for non-deterministic LLM APIs.

  • Own inference cost engineering: design throughput management, caching strategy, and cost-per-query alerting to keep AI systems economically viable at scale.

  • Design AI-native CI/CD pipelines with evaluation harnesses and golden dataset regression tests baked in before any model or prompt change reaches production.


Qualifications:

  • 5 years of hands-on experience in DevOps, SRE, or Cloud Engineering.

  • Extensive expertise in AWS cloud platforms and services.

  • Practical experience with Kubernetes and containerisation technologies.

  • Strong scripting and automation skills with Bash, Python, or Go.

  • In-depth knowledge of CI/CD tools, including Jenkins, GitHub Actions, GitLab CI/CD, and ArgoCD.

  • Solid experience with Infrastructure as Code tools including Terraform and CloudFormation.

  • Comprehensive understanding of Linux administration and networking fundamentals.

  • Experience implementing security best practices, including IAM, SSL/TLS, and compliance frameworks such as SOC2, ISO 27001, and GDPR.

  • Proficiency in monitoring and logging tools, including the ELK Stack, Prometheus, Grafana, or Datadog.

  • Exceptional problem-solving skills and the ability to operate in a fast-moving, ambiguous environment.

  • Strong communication and collaboration skills to work effectively across cross-functional teams, including client stakeholders.

Nice to Have

  • Familiarity with serverless architectures such as AWS Lambda.

  • Experience with database performance tuning and scaling techniques.

  • Relevant certifications in AWS, Azure, or GCP DevOps.

  • Prior experience supporting AI or ML workloads in production environments.

  • Familiarity with LLM observability tooling such as LangSmith, Weave, or similar.


Additional Information:

Life at Bilue

People-first focus: We're committed to delivering exceptional outcomes for our clients, but we know it starts with our people. You'll join a values-led team that's collaborative, curious, and genuinely cares about doing great work together.

Connection that counts: From monthly anchor days and team lunches to our annual offsite, we create intentional moments to connect, collaborate, and celebrate. These aren't just fun perks; they're part of how we work and grow together.

Flexibility that works: We offer hybrid working with a minimum of 3 days per week in the office. It's a balance that gives you the space to do your best work while still creating time to connect and build strong relationships in person.

Strong internal communities: We actively foster internal communities across tech, design, delivery, and beyond, giving you plenty of chances to connect, share knowledge, and learn from your peers.

Opportunities to grow: We invest in your development with unlimited access to Go1's learning library and support from our internal performance coach. Whether you want to deepen your technical skills or grow your leadership potential, we'll back you.

Flat structure, real impact: At Bilue, everyone's voice matters. Our leadership team is hands-on and approachable, and we operate without unnecessary layers. We keep things open and transparent, and your ideas will be heard no matter your title.

Bilue: Big Blue Ocean. Are you ready to set sail? Apply now!

NB: This is a full-time position based in Sydney. To be considered, candidates must have unrestricted working rights in Australia.


Remote Work:

No


Employment Type:

Full-time

About Company

Hello, we’re Bilue - a leading design and development agency specialising in mobile, cloud, web, and emerging technologies. Bilue was founded by Cameron Barrie, an app developer with a vision to help Australian companies design and deliver cutting-edge digital experiences. What began ...
