Sr Systems Engineer (Data Center Operations)
Job Summary
At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology.
Cadence is a pivotal leader in electronic design, building upon more than 30 years of computational software expertise. The company applies its underlying Intelligent System Design strategy to deliver software, hardware, and IP that turn design concepts into reality.
Cadence customers are the world's most innovative companies, delivering extraordinary electronic products from chips to boards to systems for the most dynamic market applications, including consumer, hyperscale computing, 5G communications, automotive, aerospace, industrial, and health.
Job Title: Sr Systems Engineer (Data Center Operations)
Location: Munich, Germany
Reports to: IT Group Director
Job Overview:
The Data Center Operations Engineer plays a critical role in maintaining and expanding Cadence's global data center infrastructure, with a strong focus on Linux-based systems and GPU servers. In this hands-on role, you will ensure the reliability, performance, and scalability of the compute, network, and storage platforms that underpin some of the world's most advanced electronic design workloads. Working closely with global infrastructure, development, and operations teams, you will drive everything from daily health monitoring and incident resolution to full GPU cluster bring-up and large-scale hardware deployments.
Job Responsibilities:
Deploy and maintain Linux-based compute, GPU, and storage infrastructure across data center environments, ensuring high availability and consistent performance.
Configure and bring up InfiniBand fabrics and GPU clusters, including switch configuration, subnet management, and end-to-end validation testing.
Install, rack, label, and cable server hardware, including CPUs, memory, NICs, HDDs, and RAID components, in line with approved design specifications and quality standards.
Troubleshoot and resolve complex operational issues across Linux systems, GPU platforms, networking equipment, and storage infrastructure.
Conduct daily health checks of systems and infrastructure components, proactively identifying and mitigating risks before they affect service delivery.
Monitor the data center environment using established alerting frameworks, escalate issues appropriately, and drive timely service restoration in line with SLAs.
Coordinate with vendors and onsite staff on hardware delivery, diagnostics, replacement, and warranty fulfilment.
Maintain accurate operational documentation, system configurations, and runbooks to support consistency and knowledge sharing across the team.
Participate in an on-call rotation and provide on-site or remote support during maintenance windows and operational incidents.
Collaborate with global infrastructure and operations teams to support data center builds, migrations, refresh programmes, and process improvement initiatives.
Job Qualifications:
Bachelor's degree in Computer Science, Engineering, Information Technology, or equivalent practical experience.
3-6 years of hands-on experience in Linux system administration, troubleshooting, and performance validation.
Proficiency with Linux command-line tools and shell scripting (Bash or equivalent).
Experience with cluster bring-up, GPU server deployment, driver installation, and system-level configuration.
Hands-on experience setting up and validating GPU servers in clustered environments, including end-to-end GPU testing in InfiniBand-based clusters.
Working knowledge of InfiniBand networking including switch configuration and subnet management.
Solid understanding of networking fundamentals, including the OSI model and the TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP).
Experience installing, configuring, and troubleshooting routers, switches, and terminal servers for out-of-band management.
Familiarity with fibre and copper cabling in IP and SAN environments.
Strong organisational skills with meticulous attention to detail in data center environments.
Clear verbal and written communication skills with the ability to work effectively across cross-functional and global teams.
Additional Skills/Preferences:
Experience supporting HPC, AI, or large-scale GPU environments.
Exposure to data center monitoring and alerting platforms.
Experience documenting operational processes and maintaining technical runbooks.
Familiarity with large-scale data center buildouts or refresh programmes.
Cadence is committed to equal employment opportunity and employment equity throughout all levels of the organization. We strive to attract a qualified and diverse candidate pool and encourage diversity and inclusion in the workplace.
We're doing work that matters. Help us solve what others can't.
Required Experience:
Senior IC
About Company
Do you want to shape the future of technology? Cadence is leading the charge to solve some of technology’s toughest challenges. We work with the world’s most innovative companies, across a growing range of industries. Major trends that you hear about every day, like artificial intell ...