Lead Data Engineer

Programmers.io

Job Location:

Sunnyvale, CA - USA

Monthly Salary: Not Disclosed

Posted on: 5 hours ago

Vacancies: 1 Vacancy

Job Summary

Job Description:

Role Overview

We are seeking a highly skilled and strategic Lead Data Engineer with strong expertise in PySpark, Apache Iceberg, Trino, and modern Data Lakehouse architectures. The ideal candidate will be responsible for designing and driving enterprise-scale data platforms that enable analytics, AI/ML, and business intelligence across global organizations.

This is a strategic, customer-facing role requiring strong technical leadership, architecture expertise, stakeholder management, and the ability to influence data transformation initiatives.

Key Responsibilities

Lead the architecture, design, and implementation of large-scale distributed data platforms.

Build and optimize high-performance data pipelines using PySpark and distributed computing frameworks.

Design and manage Data Lakehouse solutions using Apache Iceberg for schema evolution, time travel, partition optimization, and data governance.

Architect and optimize federated query solutions using Trino across multiple data sources.

Drive enterprise data migration and modernization initiatives from traditional warehouses to Lakehouse architectures.

Partner with business stakeholders, product owners, architects, and customer teams to translate business requirements into scalable technical solutions.

Establish best practices for data modelling, performance tuning, security, governance, and observability.

Mentor a team of data engineers and provide technical leadership across delivery streams.

Evaluate and incorporate emerging technologies in Data Engineering, Analytics, and AI.

Support pre-sales discussions, solution proposals, estimations, and customer presentations.

Technical Skills

10 years of experience in Data Engineering and Big Data ecosystems.

Expert knowledge of PySpark and Spark SQL.

Strong hands-on experience with Apache Iceberg.

Strong experience with the Trino (Presto) query engine.

Experience building large-scale batch and near-real-time pipelines.

Strong SQL skills and query optimization expertise.

Experience with Data Lake technologies and cloud-based analytics platforms.

Knowledge of data modelling and distributed storage concepts.

Experience with orchestration tools such as Airflow or equivalent.

Experience working with file formats such as Parquet, ORC, and Avro.

Exposure to CI/CD, Git, DevOps, and Infrastructure as Code practices.

Experience in one or more cloud platforms:

AWS (EMR, Glue, S3, Athena, Lake Formation)

Azure (Databricks, Data Factory, ADLS)

GCP (Dataproc, BigQuery, GCS)

Leadership & Strategic Expectations

Ability to engage with senior customer stakeholders.

Drive technical roadmaps and platform modernization strategies.

Lead architecture reviews and governance forums.

Identify opportunities for automation, optimization, and AI-driven solutions.

Strong communication and presentation skills.

Ability to influence decisions across engineering, product, and business teams.
