Sr. Site Reliability Engineer

Culver City, CA

Come build, innovate, disrupt, and thrive!

KēSTA I.T. is actively seeking a Sr. Site Reliability Engineer for an immediate full-time opportunity with our industry-leading client.

Are you on the lookout for a unique career opportunity that offers leadership, responsibility, and the chance to make a significant impact? If you're eager to contribute to a thriving and stable organization while maintaining your confidentiality, continue reading.

Overview

A leading technology company in the immersive content space is searching for a Senior Site Reliability Engineer to help scale the global platform that delivers its product. This role is ideal for someone who thrives in fast-paced environments, enjoys automating at scale, and is passionate about building fault-tolerant systems that serve millions of users reliably.

In this position, you’ll design and refine the backbone of our cloud infrastructure, ensuring uptime, observability, and security across multiple tenants and delivery pipelines. You’ll collaborate closely with software and DevOps teams to define reliability goals, establish best practices, and develop the monitoring and automation that keep our systems healthy around the clock.

Key Responsibilities

Architect and automate scalable cloud infrastructure leveraging Terraform and modern container technologies such as Kubernetes.
Optimize global CDN performance and end-to-end content delivery pipelines to improve streaming quality and latency.
Build and maintain observability frameworks that include defining SLIs/SLOs, implementing alerting systems, and ensuring actionable insights through Grafana and Prometheus dashboards.
Establish proactive capacity planning and load testing strategies to ensure reliability during rapid growth and high-traffic periods.
Drive continuous improvement through incident management, root cause analysis, and post-incident reviews that strengthen system resiliency.
Participate in on-call rotations and help define escalation workflows that uphold 24/7 service availability.
Collaborate with engineering teams to embed reliability and security principles into every stage of the deployment lifecycle.
Mentor team members on operational readiness, reliability patterns, and scalable system design.
Contribute to internal standards for compliance and data protection (SOC 2, GDPR, ISO 27001, open-source licensing).

Qualifications

7+ years of hands-on experience in SRE or DevOps, focused on building reliable, distributed systems at scale.
Deep technical knowledge of AWS, CoreWeave, or other major cloud environments, with strong experience in container orchestration and Terraform-based automation.
Proven success managing multi-tenant architectures and applying best practices for data isolation, access control, and system hardening.
Skilled in monitoring, metrics, and tracing tools (e.g., Prometheus, Grafana) and experienced in using data-driven insights to enhance system performance.
Familiarity with security frameworks and automated auditing for compliance (SOC 2, GDPR, ISO 27001).
Strong leadership and mentorship capabilities; able to influence engineering culture and champion best practices around uptime, scalability, and operational excellence.

About KēSTA I.T.:

Our name says it all: KēSTA I.T. (Keys-to-I.T.) AND our people are the keys to our success! KēSTA I.T. is a premier Utah-based technical staffing and consulting services firm. We specialize in temporary and permanent placement for Software, Hardware, Network, Cloud, CRM/ERP, Data, End-User Support, Web, and Executive/Leadership positions on a full-time and consulting basis. If you're interested in a role where top performance is rewarded, personal time is valued, and excellence is demanded at every level, we want to talk to you today!

Where do you want to go? We've got the keys! ~ KēSTA I.T.

WWW.KeSTAIT.COM
