Job Description
Job Summary
Optimizely is focused on unlocking digital potential. We are the recognized category leader in Digital Experience Platforms (DXP) and created the category for A/B testing and experimentation software. We have incredible customers, and isn't that one of the most important aspects of looking for your next job? Optimizely serves over 9,000 brands, from global organizations such as Visa, Sky, Yamaha, and the Wall Street Journal to tech innovators like Atlassian, DocuSign, Fitbit, and Zillow.
Not only are we financially sound and growing, but we also have unicorn status: we exceeded $300M in revenue in 2020, are already profitable, and have all strategic options ahead of us. Optimizely continues to invest and addresses a market opportunity north of $30 billion, providing significant personal career growth opportunities.
We have an inclusive culture with a global team of 1,500+ people across the US, Europe, Australia, and Vietnam. We blend European and American business culture with an emphasis on teamwork, inclusion, and moving fast. People make the difference! If you are looking to work on the next generation of digital technologies in a fast-paced, hyper-growth environment, apply! We're just getting started...
We are looking for a Senior Site Reliability Engineer to help build and scale our CloudOps capabilities. You will be responsible for designing, implementing, and operating critical infrastructure and platform services while collaborating closely with engineering, support, and product teams to improve the reliability, scalability, and performance of our systems. This is a hands-on technical role where you will be instrumental in shaping the SRE culture, driving automation, and ensuring high availability across all services.
Responsibilities:
Champion a Site Reliability Engineering culture across the organization by sharing best practices, tools, documentation, and code.
Identify and automate manual operational tasks using scripting, infrastructure-as-code, and CI/CD pipelines.
Build and maintain observability (monitoring, logging, tracing) for all production systems to ensure reliability, availability, and performance.
Proactively monitor alerts across all platforms and coordinate with SRE, Operations, Engineering, and Support teams to ensure quick detection and resolution of incidents, minimizing MTTA/MTTR.
Lead and manage on-call rotations, driving a blameless incident management and postmortem culture.
Collaborate with development teams to define and implement SLOs, SLIs, and error budgets.
Ensure uptime SLAs are met through robust automation, testing, monitoring, and operational best practices.
Create and maintain runbooks, playbooks, and system documentation to ensure operational readiness and knowledge sharing.
Benefits:
We are certified as a Great Place to Work for 2024-2025.
Hackdays for self-study and research on any IT-related subject.
Five working days per week with flexible working hours and no overtime.
Unforgettable annual company outing.
International, professional, creative working environment and talented teams.
Onsite opportunities in Europe and the US.
Cultural, sports, and arts clubs and activities sponsored and/or supported by the company (e.g., football, gym, swimming, guitar, English).
Powerful workstation: Core i7-9700, 16-32 GB RAM, 2 x QHD (2560x1440, 2K resolution) monitors.
100% official salary during the probation period, 13th month salary, annual salary raise.
Up to 3 extra paid leave days per year.
Social, Health, and Unemployment Insurance are based on 100% of gross salary and fully paid by the company.
Extra bonus of $60 per special occasion (Birthday, Labor Day, National Day, Solar New Year, Lunar New Year).
Lunch allowance of $30 per month.
Baby allowance of $12 per month for a child under 3 years old.
AON Premium Healthcare Insurance package for employees and their children up to 18 years old.
A daily variety of food, drinks, and fresh seasonal fruit.
Requirements
About You:
Strong experience in Linux Systems Administration in cloud or virtualized environments.
Proficiency in infrastructure-as-code tools such as Terraform.
Hands-on experience with configuration management tools like Ansible or SaltStack.
Skilled in scripting and automation using Python and Bash.
Experience deploying and maintaining services in public cloud environments (Azure, AWS, or GCP).
Solid understanding of observability tooling, especially Datadog, ELK Stack (Elasticsearch, Logstash, Kibana), or similar.
Experience building and maintaining CI/CD pipelines (e.g., GitHub Actions, Azure DevOps, Octopus Deploy).
Familiarity with Kubernetes and Docker; production experience is a strong plus.
Experience operating and scaling distributed systems across multiple regions.
Strong communication and collaboration skills; comfortable working across time zones.
Passion for learning, continuous improvement, and a strong sense of ownership.
Fluent in English, both written and spoken.
Other Information
DevOps
PostgreSQL
MS SQL
Linux
Python
Github
Distributed Systems
Elasticsearch
Docker
Observability
Kibana
MS Azure
Ansible
Logstash
AWS
Kubernetes
Bash
DataDog
SaltStack
Salt
GCP
Elastic Stack
Terraform
Octopus Deploy
Azure DevOps
ELK
CI/CD
General Information
- Application deadline: 20/08/2025
- Salary: Negotiable