We are looking for a strong technologist to join the new and growing Chaos and Resiliency practice at our company. The role involves working closely with software developers and infrastructure engineering teams to build software that will form the basis of chaos and resiliency testing at the Firm. This role is in the Reliability and Performance Engineering (RPE) group, which supports the Operations Technology department. While the domain is Financial Services, we encourage candidates from other domains who have the necessary technical and interpersonal skills to apply.
Role and Responsibilities
• Develop software to automate chaos and resiliency test cases that simulate failures in a system that performs financial data processing.
• Build a reusable software library, combining open-source software and in-house technology, that will be made available to other software developers at the Firm.
• Analyze new system architectures to identify single points of failure and other areas that may present a resiliency deficiency. Design tests to
• Execute the tests as part of nightly builds and Game Days.
Chaos / Reliability Engineering skills required:
• Software programming experience in distributed languages, e.g. Java, Python, C++, Perl.
• Hands-on software development or test automation coding in on-premises and public cloud environments, preferably Azure.
• Ability to script in Linux with exposure to administration or engineering.
• Experience with DevOps tooling such as Pipelines, Cloudify, or Terraform is desirable.
• Experience writing programs that interact with relational distributed databases such as IBM DB2, Oracle SQL, and SQL Server.
• Willingness to learn new technologies and the ability to work in a dynamic and rapidly changing environment.
Additional skills:
• Chaos / Resiliency testing experience and stochastic modelling. Experience with industry-leading chaos / resiliency technologies is a plus.
• Deep knowledge of distributed systems architecture, e.g. microservices, REST, message queues, distributed transactions, distributed functions, fault-tolerant systems.
• Ability to code using modern testing frameworks, e.g. JUnit, BDD, Robot.
• Understanding of the test automation development lifecycle and test measurement.
• Knowledge of container technologies such as Docker and Kubernetes.
• Excellent verbal and written communication skills.
Experience
• Minimum of 10 years of experience working for a global software or telecommunications firm.
• Bachelor's degree in Computer Science or a related field.