Lighthouse Technology Services is partnering with our client to fill their Solution Architect - Mythos SRE position! This is a 12-18 month contract that can be performed remotely within the United States. The person hired will be a W2 employee of Lighthouse Technology Services. No C2C or subcontracting arrangements will be considered.
What You'll Be Doing
- Lead the design and implementation of comprehensive reliability, scalability, and operational architecture for the AI platform across Azure cloud and co-location environments, ensuring alignment with enterprise SRE principles and infrastructure standards
- Architect solutions for reliability and resilience patterns including high availability, failover, disaster recovery, and geo-distribution strategies for AI workloads and model-serving infrastructure
- Design and guide observability and telemetry frameworks that provide visibility into system health, model performance, drift detection, and risk indicators aligned with AI governance requirements
- Collaborate with domain leadership and enterprise architects to translate conceptual and logical designs into detailed physical solution architecture that meets business capabilities and financial targets
- Establish and implement SRE practices including SLIs, SLOs, error budgets, and operational readiness frameworks while guiding automation patterns and infrastructure-as-code deployment pipelines
- Partner with Agile teams throughout the SDLC to validate architecture decisions, ensure compliance with enterprise standards, and support the delivery of scalable, resilient AI platform operations
- Drive performance optimization initiatives for model-serving and AI workloads while identifying and evaluating emerging technologies and trends that could impact the domain
- Facilitate governance activities and collaborate with stakeholders across infrastructure, platform engineering, and observability teams to ensure consistent adoption of architecture patterns
What You'll Need to Have
- 5+ years of solution architecture or software engineering experience with a demonstrated ability to design and integrate applications using modern architecture principles and patterns
- Proven expertise in Site Reliability Engineering (SRE) with hands-on experience establishing SLIs, SLOs, error budgets, and operational readiness frameworks
- Strong technical proficiency across multiple programming languages and cloud technologies (Azure experience highly preferred)
- Experience architecting for reliability, scalability, and resilience including high availability, disaster recovery, and performance optimization strategies
- Solid understanding of observability, telemetry, and monitoring frameworks, with the ability to implement comprehensive visibility into system health and performance
- Demonstrated ability to work in Agile environments and effectively communicate complex architectural concepts to stakeholders at all levels of the organization
- Experience with infrastructure-as-code, automation patterns, and deployment pipeline design
- Industry-recognized certifications in cloud technologies or programming languages preferred
Pay Range: $65 - $80/hr
Questions about any of our jobs? Email us at recruiting@lhtservices.com
View all of our open jobs here: jobs.lhtservices.com