Role: SRE Infrastructure Engineer
Locations: SFO, CA (5 Days Onsite)
Duration: Long-term
Employment Type: Contract W2
Job Description:
We are seeking an SRE Infrastructure Engineer with 8 years of professional experience ensuring the reliability, scalability, and performance of Google Cloud-based services through automation, monitoring, and proactive engineering. Key responsibilities include managing infrastructure as code (Terraform), optimizing GKE/Kubernetes, responding to incidents, and implementing SLIs/SLOs to minimize manual toil.
This role requires close collaboration with cross-functional teams, adherence to DevOps and Agile practices, and ownership of service quality and delivery.
Key Responsibilities
GCP Infrastructure Management: Design, deploy, and maintain robust infrastructure components, including VPCs, Compute Engine, GKE (Kubernetes), and storage solutions.
Automation & IaC: Use Terraform or Deployment Manager to manage cloud resources and build CI/CD pipelines to automate deployments. Minimize manual, repetitive tasks by developing automation scripts and custom tools to streamline deployments and operations.
Observability & Incident Management: Develop monitoring, alerting, and logging systems (e.g., Cloud Monitoring, Prometheus, Grafana). Act as primary on-call to troubleshoot production incidents.
Incident Management: Serve as a first responder for system outages and conduct deep-dive root cause analysis (post-mortems) to prevent recurrence.
CI/CD Pipeline Management: Design and support automated deployment pipelines using Jenkins, ArgoCD, Artifactory, GitLab CI, or GitHub Actions, following DevSecOps practices.
Reliability Engineering: Define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) covering latency, traffic, errors, and saturation.
Optimization & Security: Proactively optimize infrastructure for cost, performance, and security compliance.
AI SRE (Google Compute Engine): Focus specifically on AI workload health and GCE visibility.
Mandatory Technical Skills & Competencies
Experience: 8 years in SRE, DevOps, or systems engineering, specifically with Google Cloud Platform.
Technical Skills: Deep knowledge of Linux, Kubernetes (GKE), networking (VPCs, CDNs), and containerization.
Programming: Proficiency in scripting/programming languages like Python, Go, or Shell.
Methodologies: Strong understanding of GitOps, CI/CD pipelines, and SRE principles (error budgets, toil reduction).
Strong troubleshooting skills across the full stack (network, OS, application).
Ability to balance system stability with the need for rapid deployment.
Observability Tools: Experience implementing monitoring and logging stacks like Prometheus, Grafana, or Google Cloud Operations Suite.
Excellent collaboration skills for working with development teams on service ownership.
Soft Skills
Strong problem-solving and analytical skills.
Clear communication with technical and non-technical stakeholders.
Ownership mindset and production-grade engineering discipline.
Ability to work independently and within cross-functional teams.
About Next Gen Software Solutions LLC:
Next Gen Software Solutions is a trusted provider of IT staffing and consulting services, dedicated to empowering businesses with cutting-edge technology solutions and exceptional talent. We specialize in delivering tailored IT consulting services and innovative software solutions, and in connecting businesses with highly skilled IT professionals. Founded and led by a dedicated U.S. Army soldier, Next Gen Software Solutions is deeply rooted in the core values of integrity, discipline, commitment, and experience, principles that guide every aspect of our operations.
Equal Employment Opportunity Statement:
Next Gen Software Solutions LLC is an Equal Opportunity Employer. We are committed to fostering an inclusive and diverse workplace where all employees and applicants are treated with respect and dignity. We do not discriminate based on race, colour, religion, sex (including pregnancy, sexual orientation, or gender identity), national origin, age, genetic information, veteran status, or any other legally protected characteristic under applicable federal, state, or local laws.