General Summary
We are seeking an experienced Red Hat Advanced Cluster Management (RHACM) Principal Consultant to support a large-scale platform modernization initiative for Truist Bank. This highly autonomous, client-facing role will serve as the lead architect and strategic delivery consultant responsible for validating complex Kubernetes deployment topologies and establishing enterprise standards for multi-cluster observability, governance automation, and GitOps operations.
The ideal candidate brings deep expertise in Red Hat Advanced Cluster Management, ArgoCD, Grafana, Prometheus, and Ansible automation, along with the ability to independently translate high-level business objectives into scalable technical solutions with minimal oversight. This position is fully remote and is expected to last approximately 6 months, with potential for extension.
Responsibilities
Architecture Validation & Strategy
- Lead the design review and validation of RHACM architecture supporting complex multi-cluster deployment topologies and Disaster Recovery (DR) strategies, including Active/Passive configurations.
- Optimize the integration and co-location of infrastructure management tooling and ArgoCD to support a centralized “single pane of glass” operational model.
- Define infrastructure, storage, and performance specifications required for scalable multi-cluster observability and alerting frameworks.
Observability & Performance Management
- Partner with SRE, Platform Engineering, and business stakeholders to gather requirements and develop custom Grafana dashboards focused on capacity planning, network visibility, workload scaling, and operational health.
- Design and implement enterprise-grade alerting frameworks using Prometheus and Alertmanager that reduce alert fatigue and provide actionable notifications.
- Use Multi-cluster Observability (MCO) together with the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) to identify resource inefficiencies and improve application density and performance.
GitOps & Governance Automation
- Drive the transition of Day-2 operations into ArgoCD-managed GitOps workflows, ensuring cluster configurations are managed as code and automatically remediated when drift occurs.
- Develop and implement ACM Policy Sets to enforce governance, compliance, security, and configuration consistency across multi-cluster environments.
- Integrate ACM policies and Day-2 operational automation into existing Ansible pipelines to support full lifecycle orchestration and infrastructure automation.
Requirements
Technical Requirements
- Deep expertise with Red Hat Advanced Cluster Management (RHACM), including Multi-cluster Observability (MCO), hub cluster administration, and spoke (managed) cluster management.
- Strong hands-on experience with ArgoCD for GitOps-driven Day-2 operations, automated configuration management, and drift remediation.
- Advanced knowledge of Grafana, Prometheus, and Alertmanager for enterprise observability and alerting solutions.
- Strong proficiency with Ansible and Infrastructure-as-Code (IaC) automation frameworks.
- Experience implementing ACM Policies and Policy Sets to enforce governance, compliance, and security standards across Kubernetes environments.
- Strong understanding of Kubernetes platform operations, scaling strategies, and enterprise infrastructure management.
Professional Qualifications
- Ability to independently drive technical requirements gathering, architecture validation, and implementation planning with minimal oversight.
- Excellent communication and stakeholder management skills with experience interfacing across SRE, Platform Engineering, and leadership teams.
- Proven ability to operate effectively in highly dynamic enterprise environments.
- Must be based in the U.S.
- Ability to successfully complete required background checks.