Title: Production System Reliability Engineer
Responsibilities - Internal or External:
• Managing critical incidents and ensuring all key management and business stakeholders are kept up to date
• Ensure Production Management is closely aligned with and embedded in the Agile software development process and that our code meets production standards
• Developing automated solutions to long-standing problems to ensure minimal downtime and manual effort
• Configuring application monitors using industry-standard monitoring tools, as well as developing customized monitoring solutions
• Build extensive business and application knowledge required for supporting client-facing applications
• Interface with clients and other technology teams to provide governance and control around the production environment
Qualifications – Internal or External:
You should apply to this requisition if you have, at minimum, the following profile:
• 3 years of application development (Python, HTML, JavaScript) or relevant production support experience
• Ability to manage an incident call and coordinate multiple teams towards a common goal of resolving the outage
While this is not a requirement, we are very interested in people who have exposure to the following technologies or subjects:
• Enthusiasm for modern development tools and practices, including test-driven development, Agile, and continuous integration
• Experience managing, deploying, and troubleshooting large-scale production environments
• Knowledge of DevOps testing and code quality tools
• Strong infrastructure knowledge in Linux / Unix, Windows, Databases (Sybase, DB2 & NoSQL), Storage, Networking and Web Technologies
• Cloud administrator / DevOps knowledge (Azure preferred)
• Advanced, administrator-level Linux knowledge
• Advanced Unix shell / Perl scripting experience
• Advanced SQL knowledge