Staff Software Engineer, Multi-Cloud Efficiency
Job Summary
P-1563
At Databricks, we are passionate about enabling data teams to solve the world's toughest problems, from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform so our customers can use deep data insights to improve their business. Founded by engineers and customer obsessed, we leap at every opportunity to solve technical challenges, from designing next-gen UI/UX for interfacing with data to scaling our services and infrastructure across millions of virtual machines. And we're only getting started in Bengaluru, India! As a software engineer with a backend focus, you will work with your team to build infrastructure for the Databricks platform at scale.
About the Team
We manage the infrastructure efficiency for one of the world's largest data and AI platforms. Operating at massive scale across AWS, Azure, and GCP, our mission is to ensure that every dollar spent on cloud infrastructure is optimized. The MCE team builds the critical tools, frameworks, and automated systems that provide deep cost visibility and ensure maximum resource utilization. We aren't just cutting costs; we are engineering systems to scale efficiently as our cloud footprint expands into the hundreds of millions of dollars.
The Role
As a Staff Software Engineer, you will serve as the technical lead for the Bengaluru MCE chapter. You will be responsible for defining the technical strategy for efficiency projects where the solutions are not yet known. You will lead cross-functional efforts, working with the finance and cost teams to tackle massive-scale engineering challenges, ensuring our systems can automatically drive cost efficiency and handle the complexity of global multi-cloud data processing without compromising on performance or stability. As one of the founding senior engineers of the Bengaluru MCE chapter, a team that has already delivered millions of dollars in verified annual cloud savings within two quarters of formation, you will shape its technical direction and help scale it toward greater global impact.
The impact you will have
- Architect at Scale: Lead the design and implementation of systems to optimize cloud spend, driving impact on the order of hundreds of millions of dollars. You will architect solutions for complex problems like reserved-instance automation, tag-based cost attribution at scale, automated waste-cleanup policy engines, and cross-cloud resource governance.
- Drive Technical Strategy: Identify and deliver high-impact, well-adopted projects that solve complex efficiency problems. You will define the roadmap for our efficiency pillar, conducting thorough design reviews and risk assessments for systems supporting the company's core infrastructure.
- Engineer for Efficiency: Design scalable systems for regression detection and automated resource rightsizing. You will build the frameworks that allow product teams to measure their efficiency footprint (the cost and resource impact of their services) and hold them accountable through automated release gates that block changes that regress agreed efficiency targets (SLOs).
- Operational Excellence: Serve as a technical authority for incident management, addressing complex system-wide performance issues. You will improve the stability and reliability of our internal platform tools, ensuring they are robust enough to manage our growing global scale.
- Technical Leadership: Act as a force multiplier by mentoring junior and senior engineers, raising the bar for engineering quality, and driving engineering culture across the organization.
- Build Self-Driving Efficiency Systems: Design the automation and agentic layers (AI-driven detection, decisioning, and autonomous remediation) that let cost governance run continuously at scale, shifting the team from manual cleanup to systems that keep themselves efficient.
What We Look For
- 9 years of experience building production-grade distributed systems in Java, Scala, C, or Go.
- Proven track record of architecting solutions for large-scale infrastructure, cloud computing (AWS/Azure/GCP), and container orchestration (Kubernetes/Docker).
- Deep expertise in identifying and solving performance bottlenecks in high-throughput distributed environments.
- Demonstrated ability to drive technical requirements for ambiguous problems where the path forward is not obvious.
- T-shaped engineering skills: depth in distributed systems and breadth across cloud infrastructure, monitoring, and software lifecycles.
Required Experience:
Staff IC
About Company
The Databricks Platform is the world’s first data intelligence platform powered by generative AI. Infuse AI into every facet of your business.