Python, Pyspark, GCP Data Engineer

Zensar Technologies

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed

Posted on: Yesterday

Vacancies: 1

Job Summary

Description

Key Responsibilities

Pipeline Development: Design, develop, and maintain end-to-end ETL/ELT pipelines using Python and PySpark.
Big Data Processing: Build large-scale data processing frameworks to handle structured and unstructured data, ensuring high performance and reliability.
Cloud Infrastructure: Architect and manage data solutions within the GCP ecosystem, focusing on cost-efficiency and security.
Data Modeling: Design and implement robust data warehouse models (Star/Snowflake schemas) and data lake architectures.
Optimization: Identify, design, and implement internal process improvements, such as automating manual processes and optimizing data delivery for greater scalability.
Collaboration: Work closely with stakeholders to understand data requirements and translate them into technical specifications.

Responsibilities

Technical Qualifications

Core Programming: Strong proficiency in Python, including experience with libraries like Pandas and NumPy and with logging frameworks.
Big Data: 3 years of hands-on experience with Apache Spark (PySpark) for distributed data processing.
GCP Ecosystem: Practical experience with Google Cloud services, specifically:
BigQuery (optimization, partitioning, clustering).
Cloud Dataproc or Dataflow.
Cloud Storage (GCS) and Cloud Functions.
Cloud Composer (Apache Airflow) for orchestration.
Data Warehousing: Solid understanding of relational databases and SQL (PostgreSQL, MySQL) as well as NoSQL environments.
DevOps & Tools: Experience with Git, Docker, and CI/CD pipelines. Familiarity with Terraform or other IaC tools is a significant plus.

Qualifications

Preferred Skills

Experience with real-time data streaming (e.g. Google Pub/Sub or Kafka).
Knowledge of data governance, security, and privacy compliance (GDPR/CCPA).
Experience in optimizing Spark jobs (shuffling, partitioning, and memory management).
Google Cloud Professional Data Engineer certification.

Soft Skills

Analytical Thinking: Ability to break down complex data problems into manageable technical tasks.
Communication: Strong verbal and written skills to interact with both technical and non-technical teams.
Adaptability: A self-starter who stays current with the evolving data engineering landscape.
Mentorship: Willingness to provide guidance and conduct code reviews for more junior team members.

Required Experience:

About Company

Zensar Technologies

At Zensar, we’re “experience-led everything”. We are committed to conceptualizing, designing, engineering, marketing, and managing digital solutions and experiences for over 130 leading enterprises. We are a company driven by a bold purpose: Together, we shape experiences for better f ...
