Senior Cloud DevOps Engineer
Sydney, NSW

This global, highly regarded travel technology organisation is a market leader in delivering innovative and cost-effective solutions across the corporate, events, and leisure travel sectors. With decades of sustained growth and a strong reputation for combining personalised service with sophisticated in-house technology, the business continues to expand its engineering capability to support increasingly complex hybrid infrastructure environments.


This is not a traditional DevOps or cloud automation role.


The environment is highly operational and troubleshooting-heavy, supporting critical production systems where issues are often ambiguous, intermittent, and spread across multiple infrastructure and application layers. The team is specifically looking for an engineer who enjoys understanding what is actually happening inside systems and can systematically isolate difficult production issues under pressure.


The focus of the role is deep systems troubleshooting, runtime diagnostics, Linux investigation, Kubernetes support, JVM-based systems, networking, and cross-layer production problem solving rather than purely CI/CD or provisioning-focused engineering.


Key Responsibilities:

  • Investigate and resolve complex production issues across infrastructure, network, database, JVM, and application layers
  • Perform runtime diagnostics including process investigation, memory analysis, thread inspection, and network tracing
  • Troubleshoot Linux-based systems and Kubernetes environments in production
  • Support hybrid infrastructure environments across AWS and on-premises platforms
  • Diagnose intermittent failures, latency spikes, resource exhaustion, and dependency bottlenecks
  • Investigate JVM-based application behaviour including memory, garbage collection, threading, and runtime performance
  • Support and troubleshoot Apache, MySQL, and Java application server environments
  • Work across networking components including VPNs, firewalls, DNS, proxies, and load balancers
  • Develop and maintain automation and operational tooling using Python and Bash
  • Manage and optimise Docker and Kubernetes environments
  • Maintain CI/CD tooling including Jenkins and Git
  • Collaborate closely with engineering teams during incident investigation and root cause analysis
  • Contribute to operational improvement initiatives focused on reliability, scalability, and system stability
  • Participate in architectural and operational discussions around hybrid infrastructure and platform modernisation


Desired Skills and Attributes:

  • Strong troubleshooting mindset with the ability to systematically isolate root causes
  • Advanced Linux administration and systems-level investigation capability
  • Strong understanding of runtime behaviour across infrastructure and application layers
  • Experience troubleshooting Kubernetes environments in production
  • Experience supporting JVM-based systems and distributed applications
  • Strong understanding of TCP/IP networking fundamentals and dependency flows
  • Experience investigating performance issues, intermittent failures, and latency-related problems
  • Ability to work from first principles under ambiguity rather than relying purely on dashboards or runbooks
  • Solid AWS infrastructure experience
  • Proficiency in Python, Bash, or similar scripting languages
  • Experience with Docker and containerised environments
  • Experience with CI/CD tooling such as Jenkins and Git
  • Exposure to virtualisation technologies such as VMware
  • Strong communication skills across technical and non-technical teams
  • Curiosity and drive to deeply understand how systems behave under load and failure conditions


Highly Regarded:

  • Experience with Kafka or distributed messaging systems
  • Experience with low-level Linux tooling such as strace, tcpdump, lsof, iostat, vmstat, or similar
  • Experience troubleshooting cross-layer production issues spanning infrastructure, network, JVM, database, and application services
  • Experience working within hybrid cloud and on-premises enterprise environments



Why Join?


You will be part of a collaborative engineering environment where autonomy, ownership, and technical curiosity are highly valued. The organisation promotes a pragmatic approach to problem-solving and encourages engineers to deeply understand the systems they support rather than simply following predefined operational paths.


This is an opportunity to work on genuinely complex production environments where troubleshooting capability and systems thinking are highly respected. You will work closely with experienced engineering and infrastructure teams on large-scale hybrid platforms supporting critical business operations globally.


With flexible working arrangements and a strong focus on work-life balance, this role offers the opportunity to work on meaningful technical challenges while continuing to deepen your operational and troubleshooting expertise.


Apply now and one of the White Bay team will reach out to you very soon.


Follow the White Bay LinkedIn page so you never miss job alerts or interview tips and tricks: https://www.linkedin.com/company/whitebay
