Position Overview: Server Specialist III
We are seeking a detail-oriented and dedicated Server Analyst to join our Network Operations Center (NOC) within the IT Server Operations team. This role is ideal for individuals who excel in structured, fast-paced environments and have hands-on experience in NOC workflows, server diagnostics, and infrastructure support. The position involves working fixed shifts, participating in an on-call rotation, and playing a critical role in system monitoring, alert management, and incident response. Success in this role hinges on maintaining system stability, communicating clearly across teams, and driving operational efficiency through timely reporting and stakeholder collaboration.
Core Responsibilities
Monitoring, Alerting & Incident Response
- Monitor infrastructure health using tools such as SolarWinds Orion, Dynatrace, or equivalent platforms.
- Respond promptly and accurately to alerts, ensuring timely escalation and resolution within SLA parameters.
- Document incidents thoroughly to support Root Cause Analysis, post-mortems, and knowledge sharing.
- Participate in a structured on-call rotation for after-hours support.
- Execute maintenance window tasks, including application checkouts, maintenance mode validation, and alert suppression per schedule.
Server Operations & Troubleshooting
- Conduct hands-on diagnostics and remediation for Windows and Linux servers in both physical and virtual environments.
- Maintain up-to-date documentation of assets, configurations, and operational standards.
- Troubleshoot technical issues, manage support tickets, and coordinate with vendor teams for onsite assistance.
Reporting, Communication & Stakeholder Engagement
- Deliver concise updates during shift handoffs and operational briefings to ensure transparency and continuity.
- Collaborate with cross-functional teams to align on incident priorities, escalation protocols, and service impact.
- Work with stakeholders to define key performance indicators and tailor reporting and alerting solutions to specific application and infrastructure needs.
- Track and report operational metrics, highlighting areas for improvement and potential risks.
Security & Compliance
- Apply server security best practices and respond to vulnerability alerts promptly.
- Ensure all operational activities adhere to internal policies and external regulatory requirements.
Operational Excellence & Reliability
- Identify recurring issues and contribute to preventive strategies that enhance system reliability and reduce alert noise.
- Maintain and improve runbooks and escalation workflows to support consistent execution.
- Demonstrate high standards of punctuality, ownership, and accountability during assigned shifts.
Required Qualifications
- Minimum 3 years of experience in NOC or server operations roles.
- Proficiency in Windows Server and Linux environments.
- Hands-on experience with infrastructure monitoring and alerting tools.
- Familiarity with data center operations and hardware support.
- Solid understanding of networking fundamentals (TCP/IP, DNS, DHCP).
- Strong troubleshooting, documentation, and communication skills.
- Willingness to work fixed shifts and participate in an on-call rotation.
Preferred Qualifications
- Experience with SolarWinds Orion, Dynatrace, or similar observability platforms.
- Exposure to virtualization technologies such as VMware or Hyper-V.
- Familiarity with ITSM practices and ticketing systems (e.g., ServiceNow, Remedy).
- Relevant certifications (e.g., Microsoft, CompTIA Server+, Red Hat).
Top Daily Tasks
- Orion Alert Management: Tune and manage alerts in SolarWinds Orion to ensure clarity and suppress noise during maintenance windows.
- Email & Notification Triage: Prioritize incoming alerts and system notifications, escalate critical issues, and maintain NOC-wide awareness.
- System Failovers & Failbacks: Execute and validate failover/failback procedures, ensuring service continuity and proper documentation.
- NOC Phone Support: Provide responsive support for infrastructure incidents, service requests, and operational escalations.
- DNS Entry Management: Create and update DNS records in Infoblox to reflect infrastructure changes, failovers, and application statuses, and configure forwarding zones.