Job Title: Site Reliability Engineer (SRE)
Location: Dallas, Austin, or Omaha (On-site)
Compensation: $55-$60 per hour (W2/Visa)
Job Summary:
Charles Schwab is seeking a Site Reliability Engineer (SRE) to support our industry-leading Order Management System (OMS) within a highly performant and scalable compute environment. In this role, you will be responsible for responding to alerts and escalations, managing incidents, maintaining technical documentation, and independently configuring complex systems. Your ability to leverage Java-based tools for automation, software development, and delivery will be critical to your success.
You will work closely with Development, QA, and SRE teams to optimize environments and tooling, ensuring operational excellence. This role requires on-site presence at a Schwab headquarters location and does not provide immigration sponsorship.
Key Responsibilities:
- Provide 24/7 on-call support on a rotating schedule for designated systems.
- Troubleshoot, research, and resolve system defects, performance issues, and inconsistencies.
- Respond promptly to system monitors, alerts, escalations, and outages.
- Analyze issues to recommend improvements and prevent recurrence.
- Work collaboratively to resolve complex application issues.
- Monitor server health and track system performance.
- Draft, implement, and maintain process documentation for knowledge transfer.
- Communicate service disruptions and resolutions to management and stakeholders.
- Identify and implement opportunities for improved processes and automation.
- Maintain detailed documentation for system configurations, troubleshooting, and best practices.
Qualifications & Requirements:
- Education: Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- Experience: 5+ years supporting Java-based applications in a RHEL (Linux) environment.
- Strong analytical, organizational, and problem-solving skills.
- Cloud Experience: 2+ years supporting Google Cloud Platform (GCP) or similar cloud platforms.
- Java Expertise: Experience diagnosing JVM issues, including thread dumps, garbage collection, and memory management (Java 17+ preferred).
- Version Control & Artifact Management: 3+ years managing source and artifact repositories (Git, Bitbucket, Artifactory).
- Automation & Configuration Management: Experience with tools such as Puppet, Foreman, Terraform, Salt, and Ansible.
- Scripting & Data Formats: Proficiency in Shell, Perl, Python, and Ruby, plus working knowledge of JSON, XML, and YAML.
- Linux Administration: Experience managing RHEL 6, 7, and 8 servers and optimizing them for application performance.
- Familiarity with Agile SDLC and ability to contribute to project planning.
- Experience working in financial services or similar regulated industries preferred.
- Availability for weekend or after-hours work as needed.
Preferred Skills:
- Strong communication skills with the ability to work across multiple teams.
- Ability to work effectively both independently and in a team environment.
- Detail-oriented with a high sense of urgency.
- Experience with incident response and service reliability best practices.