Data Architect


Job Location:

Pune - India

Monthly Salary: Not Disclosed
Experience Required: 5-6 years
Posted on: 21 days ago
Vacancies: 1

Job Summary

Data Architect (Databricks)

Data Engineering & Pipelines, Mid-Level, Full-Time

Experience

5-8 Years

Level

Mid-Level

Employment Type

Full-Time

Location

Pune - Hybrid

Primary Stack

Databricks, Apache Spark, Delta Lake, SQL

Domain

Data Engineering & Pipelines

About the Role

We are looking for a hands-on Data Architect with deep expertise in Databricks to design, build and optimise enterprise-scale data platforms. You will own the end-to-end data engineering lifecycle, from ingestion and transformation to serving, while ensuring reliability, scalability and governance across our lakehouse architecture.

You will collaborate closely with data engineers, analytics engineers and product teams to translate business requirements into robust, reusable data solutions on the Databricks Lakehouse Platform.

Key Responsibilities

Data Architecture & Design

Design and maintain the organisation's lakehouse architecture using Databricks and Delta Lake.

Define data modelling standards (dimensional, Data Vault 2.0 or medallion architecture) across Bronze, Silver and Gold layers.

Architect scalable ingestion frameworks for structured and unstructured data sources (Kafka, JDBC, REST APIs, cloud storage).

Own schema evolution strategy and ensure backward compatibility across data assets.

Pipeline Development & Optimisation

Build and maintain production-grade ETL/ELT pipelines using PySpark, Spark SQL and Databricks Workflows.

Implement Delta Live Tables (DLT) for declarative, auto-scaling pipeline development.

Optimise Spark jobs for performance: partitioning, Z-ordering, caching and cluster right-sizing.

Establish CI/CD practices for data pipelines using tools such as GitHub Actions, Azure DevOps or Databricks Asset Bundles.

Data Governance & Quality

Implement Unity Catalog for data discovery, lineage tracking, fine-grained access control and compliance.

Define and enforce data quality rules using Great Expectations, DLT expectations or equivalent frameworks.

Work with data governance teams to document metadata, the business glossary and data contracts.

Platform & Infrastructure

Manage Databricks workspace configuration: clusters, pools, secrets and access policies.

Collaborate with cloud and DevOps teams on infrastructure-as-code (Terraform) for Databricks on Azure / AWS / GCP.

Monitor platform health, SLAs and cost using Databricks system tables and cloud-native monitoring tools.

Collaboration & Mentorship

Partner with data consumers (analysts, data scientists, ML engineers) to define SLAs and publish clean, well-documented data products.

Review code and provide architectural guidance to junior engineers.

Contribute to and champion internal data engineering best practices, runbooks and documentation.

Required Skills & Experience

Core Databricks & Spark

4 years of hands-on experience with Databricks (Unified Data Analytics Platform).

Strong proficiency in PySpark and Spark SQL for large-scale data transformation.

Deep knowledge of Delta Lake: ACID transactions, time travel, OPTIMIZE, VACUUM.

Experience with Databricks Workflows, Jobs and Delta Live Tables (DLT).

Familiarity with Unity Catalog and Databricks governance features.

Data Engineering Fundamentals

Solid understanding of data modelling paradigms: dimensional modelling, Data Vault or medallion architecture.

Experience designing and operating streaming pipelines (Structured Streaming, Kafka, Event Hubs or Kinesis).

Proficiency in SQL; experience with dbt is a strong plus.

Hands-on experience with cloud platforms: Azure (ADLS, ADF), AWS (S3, Glue) or GCP (BigQuery, GCS).

Software Engineering Practices

Version control with Git; experience with branching strategies and code review workflows.

Ability to write testable, modular pipeline code with unit and integration tests.

Familiarity with CI/CD pipelines and infrastructure-as-code (Terraform preferred).

Nice to Have

Databricks Certified Data Engineer Associate or Professional certification.

Experience with data mesh or data product frameworks.

Exposure to ML pipelines, MLflow or Feature Store on Databricks.

Knowledge of data cataloguing tools (Alation, Collibra or Databricks Unity Catalog).

Experience with Apache Iceberg or Apache Hudi as alternative table formats.

Familiarity with real-time analytics or OLAP systems (Druid, ClickHouse, Redshift).

What We Offer

Competitive salary with performance-linked bonus.

Flexible / hybrid working arrangements.

Access to Databricks training and certification budget.

Collaborative, engineering-first data culture with modern tooling.

Clear career progression path to Senior Data Architect or Data Platform Lead.

Comprehensive health, wellness and retirement benefits.





