Linux Systems Administrator (HPC)
Insight Global
Colorado Springs, United States
Location
Colorado Springs
Posted
June 10, 2026
Job Description
Insight Global is looking for a senior-level Linux systems administrator with strong HPC skills. This individual will be responsible for handling all L1/L2 operational support related to HPC (High Performance Computing) operations for Colorado Springs. They will support HPC business users at the OS level, troubleshoot issues related to HPC hardware, and monitor SLURM and overall HPC cluster health (nodes, storage, interconnect, schedulers). They will handle all operational issues through SNOW, respond to alerts from monitoring tools (Nagios, Prometheus, Grafana), restart failed services and jobs where procedures exist, and coordinate with the hardware vendor on hardware-related issues.
Day-to-day activities:
1. Monitor SNOW tickets and perform basic triage
2. Perform storage checks (quota and utilization)
3. Monitor SLURM (queue, job state) and escalate to the next level where necessary
4. Attend regular standup meetings
5. Ana...