SRE Engineer
Dubai, DU
Apply for this job

🚀 Role: Site Reliability Engineer (Node.js)


Location: Hybrid / Remote (UK-based)

Tech Stack: Node.js | AWS | MongoDB | Docker | CI/CD | Prometheus | Python


🌟 Why This Role?


Looking to work at the intersection of DevOps, backend engineering, and real-time problem-solving? Here’s your chance to make a real impact in a high-scale cloud environment, keeping production systems fast, reliable, and resilient for thousands of users.

You’ll join a collaborative, tech-savvy team dedicated to making things work better. From improving observability across microservices to responding to high-priority incidents, this is your platform to shape how scalable applications are delivered and supported.


🛠️ What You’ll Be Doing

  • 🔧 Fix and improve: Hunt down bugs in live Node.js microservices and make production more stable every day.
  • 🤝 Pair up with engineers: Collaborate with dev teams to sharpen code quality, boost resilience, and embed observability from the start.
  • ☁️ Own the cloud: Configure and manage cloud infrastructure (AWS), keeping everything humming at scale.
  • 📈 Watch the signals: Build better monitoring and alerting systems to catch issues before they escalate.
  • 🧠 Troubleshoot deeply: Solve complex technical puzzles and help guide others through them.
  • 📄 Automate everything: Write and maintain SOPs and automation scripts to reduce manual toil.
  • 🚨 Be the calm in the storm: Participate in the on-call rota and take ownership of live issues when they arise.


✅ What We’re Looking For

  • Solid experience debugging live Node.js applications and resolving production issues fast.
  • Background in building and supporting microservice-based applications.
  • Confidence working with MongoDB, AWS services, and container tooling such as Docker and ECS.
  • Familiarity with infrastructure-as-code and CI/CD pipelines (CloudFormation, CodeBuild, etc.).
  • Comfort using monitoring/observability tools like Prometheus, New Relic, Grafana, or Datadog.
  • Good grasp of scripting (Python or JS) for automation and tooling.
  • Clear thinking in the face of incidents—plus the drive to learn from them.

💡 Bonus Points For

  • Knowledge of REST, GraphQL, and async messaging systems.
  • Experience with Git workflows and CI/CD pipelines.
  • Understanding of SRE principles (SLIs, SLOs, error budgets, etc.).
  • Awareness of security and compliance (GDPR, privacy, risk management).
  • Clear communicator with a team-first attitude.


🙌 Why You'll Love It Here

  • You’ll work with brilliant engineers who care about quality, automation, and clean code.
  • You’ll have the freedom to shape infrastructure as we scale and evolve.
  • You’ll gain deep exposure to modern DevOps tooling, incident response strategy, and production engineering.
  • Your voice will matter—from tech choices to process improvements.


Apply directly or contact annie.palmer@wearenumi.com

