Job summary
We're looking for a hands-on, forward‐thinking Senior Site Reliability Engineer to elevate the reliability, automation, and scalability of one of our most strategically important domains.
You'll combine strong engineering capability with servant leadership to guide the team, automate production processes, improve resilience, and drive operational excellence. You'll enjoy solving complex operational challenges with code and mentoring others on engineering best practice in a high‐stakes production environment.
What you'll do
Reliability & Resilience Engineering
Design and automate production operational processes, including deployments, monitoring, alerting, and self‐service capabilities.
Continuously refine best practice, balancing ITIL process rigour with lean principles.
Improve system resilience, incident recovery, observability, and performance.
Deliver resilience and recovery testing, including chaos engineering and performance scenarios.
Balance development speed with reliability targets through well-defined SLOs and engineering standards.
Operational Excellence & Observability
Analyse metrics across OS, platform, and application layers to support tuning, fault diagnosis, audits, and capacity planning.
Oversee the SDLC for reliability-focused features, including code reviews, white-box testing, and maintaining test frameworks.
Change & Incident Management
Participate in automated change delivery, including resilience testing, verification, change control, and user communication.
Ensure operational readiness as workload and use cases scale, optimising both human and technical resources.
Leadership & Production Ownership
Act as Product Owner delegate and champion for production resilience and scale.
Provide data‐driven assessments and readiness reports to support program‐level go/no‐go decisions for major releases, migrations, and customer cutovers.
Provide technical leadership and mentorship to engineers earlier in their career journey.
Facilitate blameless post‐mortems and drive engineering‐first problem resolution.
Benefits and perks
1. Generous compensation and benefits package
Attractive salary and benefits
20 days of annual leave, 7 days of sick leave, and more
13th month salary and Annual Performance Bonus
Premium healthcare for yourself and family members
Monthly allowance for team activities
Premium welcome kit and frequent appreciation gifts
Extra benefits for long-term employees
2. Exciting career and development opportunities
Large-scale products built with modern technologies in the banking domain
Clear roadmap for career advancement in both technical and leadership pathways
Well-structured learning and development programs (technical and soft skills)
Sponsored certificates in both IT and banking/finance
Premium account on Udemy
English learning with native teachers
Opportunities for travel and training in Australia
3. Professional and engaging working environment
Hybrid working model and good work-life balance
Well-equipped, modern Agile office with a fully stocked pantry
Special programs to improve your physical and mental health
Annual company trip and events
A solid, talented team behind you: great people who love what they do
Essential
Strong software engineering background (Java, DevOps, platform engineering, or automation).
Proficiency with build and automation tools such as Gradle, Jenkins, Ant, Python/Jython, Artifactory, Terraform, SonarQube.
Knowledge of event-driven architectures with experience in Apache Kafka or IBM MQ.
Strong Linux (*nix) and cloud hosting skills (AWS preferred).
Excellent communication skills, with an ability to collaborate across engineering and business stakeholders.
Specialist Skills (Highly Desirable)
Performance Testing: Ability to measure and validate response time, throughput, and reliability under expected concurrency levels.
Resilience Engineering: Assess whether current patterns withstand unexpected scenarios and ensure services recover automatically.
Stress Testing: Identify breaking points and understand system behaviour during and after failure conditions.
Reliability Engineering: Validate that critical operations (e.g., key rotations, scaling events) occur with zero customer impact and maintain stability under load.
Observability: Skill in proving production reliability, resilience, and performance using metrics, logs, traces, dashboards, and SLOs.
Why join us?
Work on systems central to Australia's financial ecosystem.
Influence engineering strategy and reliability practices.
High-impact technical leadership role with strong career growth.
Culture built around collaboration, learning, and blameless practice.