Data Architect
Job Summary
Data Architect, Databricks
Data Engineering & Pipelines | Mid-Level | Full-Time
Experience | 5-8 Years |
Level | Mid-Level |
Employment Type | Full-Time |
Location | Pune - Hybrid |
Primary Stack | Databricks, Apache Spark, Delta Lake, SQL |
Domain | Data Engineering & Pipelines |
About the Role
We are looking for a hands-on Data Architect with deep expertise in Databricks to design, build, and optimise enterprise-scale data platforms. You will own the end-to-end data engineering lifecycle, from ingestion and transformation to serving, while ensuring reliability, scalability, and governance across our lakehouse architecture.
You will collaborate closely with data engineers, analytics engineers, and product teams to translate business requirements into robust, reusable data solutions on the Databricks Lakehouse Platform.
Key Responsibilities
Data Architecture & Design
Design and maintain the organisation's lakehouse architecture using Databricks and Delta Lake.
Define data modelling standards (dimensional, Data Vault 2.0, or medallion architecture) across Bronze, Silver, and Gold layers.
Architect scalable ingestion frameworks for structured and unstructured data sources (Kafka, JDBC, REST APIs, cloud storage).
Own schema evolution strategy and ensure backward compatibility across data assets.
Pipeline Development & Optimisation
Build and maintain production-grade ETL/ELT pipelines using PySpark, Spark SQL, and Databricks Workflows.
Implement Delta Live Tables (DLT) for declarative, auto-scaling pipeline development.
Optimise Spark jobs for performance through partitioning, Z-ordering, caching, and cluster right-sizing.
Establish CI/CD practices for data pipelines using tools such as GitHub Actions, Azure DevOps, or Databricks Asset Bundles.
Data Governance & Quality
Implement Unity Catalog for data discovery, lineage tracking, fine-grained access control, and compliance.
Define and enforce data quality rules using Great Expectations, DLT expectations, or equivalent frameworks.
Work with data governance teams to document metadata, the business glossary, and data contracts.
Platform & Infrastructure
Manage Databricks workspace configuration: clusters, pools, secrets, and access policies.
Collaborate with cloud and DevOps teams on infrastructure-as-code (Terraform) for Databricks on Azure / AWS / GCP.
Monitor platform health, SLAs, and cost using Databricks system tables and cloud-native monitoring tools.
Collaboration & Mentorship
Partner with data consumers (analysts, data scientists, ML engineers) to define SLAs and publish clean, well-documented data products.
Review code and provide architectural guidance to junior engineers.
Contribute to and champion internal data engineering best practices, runbooks, and documentation.
Required Skills & Experience
Core Databricks & Spark
4 years of hands-on experience with Databricks (Unified Data Analytics Platform).
Strong proficiency in PySpark and Spark SQL for large-scale data transformation.
Deep knowledge of Delta Lake: ACID transactions, time travel, OPTIMIZE, and VACUUM.
Experience with Databricks Workflows, Jobs, and Delta Live Tables (DLT).
Familiarity with Unity Catalog and Databricks governance features.
Data Engineering Fundamentals
Solid understanding of data modelling paradigms: dimensional modelling, Data Vault, or medallion architecture.
Experience designing and operating streaming pipelines (Structured Streaming, Kafka, Event Hubs, or Kinesis).
Proficiency in SQL; experience with dbt is a strong plus.
Hands-on experience with cloud platforms: Azure (ADLS, ADF), AWS (S3, Glue), or GCP (BigQuery, GCS).
Software Engineering Practices
Version control with Git; experience with branching strategies and code review workflows.
Ability to write testable, modular pipeline code with unit and integration tests.
Familiarity with CI/CD pipelines and infrastructure-as-code (Terraform preferred).
Nice to Have
Databricks Certified Data Engineer Associate or Professional certification.
Experience with data mesh or data product frameworks.
Exposure to ML pipelines, MLflow, or Feature Store on Databricks.
Knowledge of data cataloguing tools (Alation, Collibra, or Databricks Unity Catalog).
Experience with Apache Iceberg or Apache Hudi as alternative table formats.
Familiarity with real-time analytics or OLAP systems (Druid, ClickHouse, Redshift).
What We Offer
Competitive salary with performance-linked bonus.
Flexible / hybrid working arrangements.
Access to Databricks training and certification budget.
Collaborative, engineering-first data culture with modern tooling.
Clear career progression path to Senior Data Architect or Data Platform Lead.
Comprehensive health, wellness, and retirement benefits.