Senior HPC AI Cluster Engineer

Job Type
Permanent
Work Pattern
Full-time
Work Location
Remote
Seniority
Senior
Education
Degree
Posted
22 May 2026 (3 days ago)

NVIDIA is looking for an experienced HPC-AI Engineer to join the Networking Clusters Solutions Infrastructure team. We are focused on building supercomputers and AI clusters based on groundbreaking technologies. We are looking for an outstanding engineer to be a key player on the most exciting computing hardware and software and to contribute to the latest breakthroughs in artificial intelligence and GPU computing. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with many scientific researchers, developers, and customers, to craft improved workflows and develop new, leading, differentiated solutions. You will interact with HPC, OS, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms.

What you will be doing:

  • Design, implement and maintain large-scale HPC/AI clusters with monitoring, logging and alerting

  • Manage Linux job/workload schedulers and orchestration tools

  • Develop and maintain continuous integration and delivery pipelines

  • Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources

  • Deploy monitoring solutions for the servers, network and storage

  • Troubleshoot from the bottom up, across the bare metal, operating system, software stack and application levels

  • As a technical resource, develop, refine and document standard methodologies to share with internal teams

  • Support Research & Development activities and engage in POCs/POVs for future improvements

What we need to see:

  • A degree in Computer Science, Engineering, or a related field and 8+ years of experience

  • Knowledge of HPC and AI solution technologies, from CPUs and GPUs to high-speed interconnects and supporting software

  • Experience with job scheduling and workload orchestration tools such as Slurm and Kubernetes (K8s)

  • Excellent knowledge of Windows and Linux (Red Hat/CentOS and Ubuntu) networking (sockets, firewalld, iptables, Wireshark, etc.) and internals, ACLs, OS-level security protection, and common protocols such as TCP, DHCP and DNS

  • Experience with multiple storage solutions such as Lustre, GPFS, Weka.io. Familiarity with newer and emerging storage technologies.

  • Python programming and bash scripting experience.

  • Comfortable with automation and configuration management tools such as Jenkins, Ansible, Puppet/Chef

  • Deep knowledge of networking technologies such as InfiniBand and Ethernet

  • Deep understanding and experience with virtual systems (for example VMware, Hyper-V, KVM, or Citrix)

  • Familiarity with cloud computing platforms (e.g. AWS, Azure, Google Cloud)

Ways to stand out from the crowd:

  • Knowledge of CPU and/or GPU architecture

  • Knowledge of Kubernetes and container-related microservice technologies

  • Experience with GPU-focused hardware/software (DGX, CUDA)

  • Experience with RDMA (InfiniBand or RoCE) fabrics

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
