CV/ML Platform Engineer
Austin, TX - USA
Job Summary
Company Overview
Allen Control Systems (ACS) is a cutting-edge defense startup founded by two former Navy electrical engineers with a proven track record in robotics and software. We are developing a small autonomous gun turret that uses advanced computer vision and control systems to precisely target and neutralize small drones and loitering munitions. Our approach requires overcoming significant technical challenges, making this an exciting and dynamic environment for experienced engineers.
With an engineering-first culture, ACS values technical excellence and innovation. Backed by our founders' successful exits from two previous ventures, acquired for a combined $180M in 2022, we are committed to ensuring that the groundbreaking technologies we develop have a real-world impact.
Position Overview
We are seeking an experienced CV/ML Platform Engineer, specializing in Computer Vision and Machine Learning (CV/ML), to design, build, and own the data, model, and compute infrastructure powering the ACS CV/ML team. You will help manage a 130-GPU bare-metal Kubernetes cluster, own the CV/ML CI/CD pipelines, and ensure that ML model training proceeds at high volume with low friction.
What You'll Do:
- Deploy and operate Kubernetes clusters on bare-metal infrastructure hosting 130 NVIDIA GPUs, with hybrid burst capability to AWS for scalable compute and storage workloads.
- Manage NVIDIA GPU clusters for ML training.
- Own the ACS CV/ML CI/CD pipeline.
- Improve and maintain core ML infrastructure such as model registration and versioning, experiment tracking, and model and data provenance tracking.
- Improve and maintain ML model testing, performance analysis, and reporting tools.
- Automate repetitive model training and testing tasks to increase developer velocity.
- Work with Software Team Platform Engineers to ensure efficient coordination and minimal duplication between CV/ML infrastructure and wider Software infrastructure.
- Collaborate with the Software Team to automate the optimization of models (TensorRT/quantization) for deployment on NVIDIA Jetson and other edge hardware.
Required Technical Skills:
- 2 years of experience in Platform Engineering or DevOps/MLOps.
- Strong programming skills for automating ML lifecycles and building custom CLI tools for CV engineers.
- Hands-on experience with NVIDIA GPU infrastructure, including managing CUDA libraries and development environments, the GPU Operator, device plugins, and scheduling (MIG, Volcano, or fractional GPU sharing).
- Experience implementing and maintaining MLOps platforms such as Kubeflow, MLflow, Weights & Biases (W&B), or DVC for experiment tracking and model versioning.
- Familiarity with high-performance storage solutions (such as WEKA or Ceph) and data orchestration tools capable of handling terabytes of video and image data.
- Proven track record building CI/CD pipelines that include automated model validation, performance benchmarking, and artifact management for both cloud and edge targets.
- Experience with model optimization toolchains, including TensorRT, ONNX, and quantization techniques, specifically for cross-compilation to ARM targets such as NVIDIA Jetson.
- Proficiency with observability stacks (ELK, Prometheus/Grafana) adapted for ML, including monitoring GPU health, training throughput, and model inference metrics.
- Strong Linux systems knowledge (Debian/Ubuntu), including networking for high-throughput data storage and security hardening for defense-grade production environments.
What We Offer
- Competitive salary
- Health, Dental, and Vision Insurance
- Paid Time Off
Allen Control Systems is an Equal Opportunity Employer, providing equal employment opportunities to all employees and applicants for employment. Allen Control Systems prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.
Required Experience:
IC
About Company
Allen Control Systems is a robotics defense company purpose-built to deliver advanced robotic capabilities to fill modernization demands across the defense industry and national security communities.