Are you a seasoned IT architect with a passion for building resilient, scalable, and fault-tolerant systems? We are seeking a Senior Principal Architect to lead the design, implementation, and governance of enterprise-level resilient IT architectures. This role offers the opportunity to drive innovation in mission-critical systems, collaborate across diverse teams, and ensure that our systems are robust, highly available, and always prepared for unforeseen challenges. If you thrive in high-impact environments and have expertise in chaos engineering, cloud technologies, and disaster recovery, we want to hear from you!
About
We are a leading innovator in manufacturing, committed to driving the next generation of enterprise technology solutions. We focus on delivering excellence through cutting-edge systems that push the boundaries of scalability, availability, and resiliency. Join our team and help shape the future of IT infrastructure at scale!
What You'll Do:
- End-to-End Architecture Leadership: Lead the design and deployment of resilient, high-availability IT architectures across cloud and on-premises systems.
- Cloud & On-Prem Infrastructure Expertise: Create and review robust solutions in both cloud and on-prem environments, ensuring systems can handle mission-critical operations without fail.
- Chaos Engineering: Spearhead chaos engineering initiatives to proactively identify system vulnerabilities and stress-test infrastructure for optimal resilience.
- Monitoring & Alerting: Partner with teams to establish and evolve comprehensive monitoring and alerting standards that ensure rapid detection of and response to system anomalies.
- Resiliency Architecture Reviews: Represent the IT Resiliency Office at the Architectural Review Board, contributing to the company’s overarching resiliency strategy.
- Incident Response & Recovery: Lead efforts to integrate resiliency insights into post-incident processes, driving continuous improvements in our disaster recovery plans.
- Stakeholder Management: Collaborate across teams to align resiliency efforts, prioritize recovery strategies, and ensure a cohesive organizational approach to IT reliability.
- Third-Party Solutions Evaluation: Establish frameworks for continuous assessment of third-party hosted solutions, ensuring they meet our high standards for system integration and resiliency.
- Reporting & Documentation: Develop regular reporting on resilience activities, risks, and improvement initiatives for the leadership team, ensuring transparency and alignment across the organization.
What We’re Looking For:
Education: Bachelor’s degree or equivalent experience in Computer Science, Information Technology, Engineering, or a related field.
Experience:
- 15+ years in systems architecture and engineering, with a focus on IT resiliency, disaster recovery, and high-availability systems.
- 5+ years of experience leading technical teams or serving as a hands-on Technical Manager, delivering complex projects to completion.
- Proven experience architecting and deploying large-scale enterprise systems, ensuring system uptime and data integrity under various operational conditions.
- Deep understanding of multi-AZ and multi-region cloud platforms, including how to design for resilience and support disaster recovery strategies.
Technical Expertise:
- Hands-on experience with Chaos Engineering principles and practices, including designing and conducting experiments to validate system resilience.
- Proficiency in designing resilient services on SaaS, PaaS, and IaaS platforms, with practical knowledge of leading public cloud platforms.
- Strong skills in system observability, leveraging tools for real-time insights into system performance and proactive issue resolution.
- Experience with both cloud-based and on-premises infrastructure solutions and designing systems for high availability, scalability, and resilience.
Leadership & Communication:
- Excellent communication skills, with the ability to engage stakeholders at all levels and articulate complex resilience strategies in a clear and impactful manner.
- Strong leadership abilities, capable of guiding teams through challenging projects and fostering a culture of resiliency and continuous improvement.
Additional Skills:
- Familiarity with Agile development methodologies.
- Experience in managing mission-critical systems requiring constant uptime and rapid incident response.
Pay Range: $68.00–$79.00 an hour
Duration: Contract to Hire
Artizen, Inc. is an equal opportunity employer. We are committed to complying with all federal, state, and local employment laws and regulations. It is our intent to maintain a work environment that is free of harassment, discrimination, or retaliation. Artizen will consider qualified applicants with arrest and conviction records for employment, pursuant to state and local Fair Chance Ordinances.