Senior Cloud DevOps Engineer

Sydney, NSW

This global, highly regarded travel technology organisation is a market leader in delivering innovative and cost-effective solutions across the corporate, events, and leisure travel sectors. With decades of sustained growth and a strong reputation for combining personalised service with sophisticated in-house technology, the business continues to expand its engineering capability to support increasingly complex hybrid infrastructure environments.

This is not a traditional DevOps or cloud automation role.

The environment is highly operational and troubleshooting-heavy, supporting critical production systems where issues are often ambiguous, intermittent, and spread across multiple infrastructure and application layers. The team is specifically looking for an engineer who enjoys understanding what is actually happening inside systems and can systematically isolate difficult production issues under pressure.

The focus of the role is deep systems troubleshooting, runtime diagnostics, Linux investigation, Kubernetes support, JVM-based systems, networking, and cross-layer production problem solving rather than purely CI/CD or provisioning-focused engineering.

Key Responsibilities:

Investigate and resolve complex production issues across infrastructure, network, database, JVM, and application layers
Perform runtime diagnostics including process investigation, memory analysis, thread inspection, and network tracing
Troubleshoot Linux-based systems and Kubernetes environments in production
Support hybrid infrastructure environments across AWS and on-premise platforms
Diagnose intermittent failures, latency spikes, resource exhaustion, and dependency bottlenecks
Investigate JVM-based application behaviour including memory, garbage collection, threading, and runtime performance
Support and troubleshoot Apache, MySQL, and Java application server environments
Work across networking components including VPNs, firewalls, DNS, proxies, and load balancers
Develop and maintain automation and operational tooling using Python and Bash
Manage and optimise Docker and Kubernetes environments
Maintain CI/CD tooling including Jenkins and Git
Collaborate closely with engineering teams during incident investigation and root cause analysis
Contribute to operational improvement initiatives focused on reliability, scalability, and system stability
Participate in architectural and operational discussions around hybrid infrastructure and platform modernisation

Desired Skills and Attributes:

Strong troubleshooting mindset with the ability to systematically isolate root cause
Advanced Linux administration and systems-level investigation capability
Strong understanding of runtime behaviour across infrastructure and application layers
Experience troubleshooting Kubernetes environments in production
Experience supporting JVM-based systems and distributed applications
Strong understanding of TCP/IP networking fundamentals and dependency flows
Experience investigating performance issues, intermittent failures, and latency-related problems
Ability to work from first principles under ambiguity rather than relying purely on dashboards or runbooks
Solid AWS infrastructure experience
Proficiency in Python, Bash, or similar scripting languages
Experience with Docker and containerised environments
Experience with CI/CD tooling such as Jenkins and Git
Exposure to virtualisation technologies such as VMware
Strong communication skills across technical and non-technical teams
Curiosity and drive to deeply understand how systems behave under load and failure conditions

Highly Regarded:

Experience with Kafka or distributed messaging systems
Experience with low-level Linux tooling such as strace, tcpdump, lsof, iostat, vmstat, or similar
Experience troubleshooting cross-layer production issues spanning infrastructure, network, JVM, database, and application services
Experience working within hybrid cloud and on-premise enterprise environments

Why Join?

You will be part of a collaborative engineering environment where autonomy, ownership, and technical curiosity are highly valued. The organisation promotes a pragmatic approach to problem-solving and encourages engineers to deeply understand the systems they support rather than simply following predefined operational paths.

This is an opportunity to work on genuinely complex production environments where troubleshooting capability and systems thinking are highly respected. You will work closely with experienced engineering and infrastructure teams on large-scale hybrid platforms supporting critical business operations globally.

With flexible working arrangements and a strong focus on work-life balance, this role offers the opportunity to work on meaningful technical challenges while continuing to deepen your operational and troubleshooting expertise.

Apply now and one of the White Bay team will reach out to you very soon.

Never miss out on job alerts or interview tips and tricks: follow the White Bay LinkedIn page at https://www.linkedin.com/company/whitebay
