A global technology business is continuing to strengthen its infrastructure capability as its platforms become more complex, distributed and business critical. The environment spans cloud and on-premises systems, with a strong focus on reliability, operational stability, automation and deep production troubleshooting.
This is not a standard DevOps role focused mainly on CI/CD, cloud provisioning or dashboard monitoring. The team is looking for an engineer who enjoys getting under the hood of systems, breaking down complex production issues and identifying what is actually causing them across Linux, network, application, database and infrastructure layers.
Key Responsibilities
- Investigate complex production issues across Linux, cloud, network, database and application environments.
- Troubleshoot performance issues where the root cause is not immediately obvious from logs or monitoring.
- Diagnose issues involving latency, I/O wait, networking, JVM behaviour, database performance and dependency bottlenecks.
- Support hybrid infrastructure across AWS and on-premises environments.
- Work with Kubernetes, Docker and containerised production workloads.
- Support Java application environments including Apache HTTP Server, Tomcat, WildFly or similar.
- Develop automation and scripting to reduce manual operational tasks.
- Maintain and improve CI/CD and operational tooling including Jenkins, Git and related platforms.
- Work with databases including MySQL and other relational technologies.
- Collaborate with development, infrastructure and security teams during incidents and ongoing improvements.
- Contribute to system reliability, operational maturity and long-term platform stability.
Desired Skills and Experience
- Strong Linux administration and production troubleshooting experience.
- Experience diagnosing issues across infrastructure, networking and application layers.
- Hands-on experience with Kubernetes and Docker in production environments.
- Strong understanding of TCP/IP networking, DNS, VPNs, load balancers and firewalls.
- Experience supporting JVM-based or Java application environments.
- Scripting experience with Python, Bash or similar.
- Experience with AWS or hybrid cloud environments.
- Exposure to Puppet, Ansible, Terraform or similar automation/configuration tools.
- Experience with monitoring and observability tools such as Grafana, Prometheus, ELK, Splunk or similar.
- Comfortable working through ambiguous production issues without relying purely on runbooks.
- Strong communication skills and the ability to explain technical issues clearly.
Perks of the Role
- Travel discounts
- Perkbox - Retail, Lifestyle, Entertainment, Health & Wellness discounts
- Training and Development opportunities - online lessons and certificates
- Annual Volunteer Day
- Two Wellness/Chill-out Days
- Hybrid working arrangements with work-from-home flexibility
- Option to purchase 2 weeks of extra leave
- Paid Parental Leave
- Sonder - Employee Assistance Program (EAP) platform
Why Join?
You will join a pragmatic engineering environment where ownership, curiosity and practical problem-solving are highly valued. The work is technically varied, operationally important and well suited to someone who enjoys understanding how systems behave in production rather than only building or deploying them.
This is an opportunity to work across a complex hybrid environment, contribute to meaningful reliability improvements and be part of a team that values engineers who can think independently, investigate deeply and improve how systems are operated over time.
Apply now and a member of the White Bay team will be in touch very soon.
Never miss out on job alerts or interview tips and tricks by following the White Bay LinkedIn page https://www.linkedin.com/company/whitebay