Job Description
Own and evolve cloud infrastructure across multiple environments using infrastructure-as-code; ensure environments are consistent, auditable, and reproducible.
Design and maintain automated deployment pipelines covering all platform services, from build and test through security scanning to production rollout.
Operate and scale the container orchestration platform; maintain cluster health, resource efficiency, and workload reliability across heterogeneous node types.
Own the networking and routing layer: traffic management, authentication delegation, and secure private connectivity between services.
Lead security engineering across the platform: IAM least-privilege, secrets lifecycle, network hardening, and security gates in CI/CD.
Define and maintain observability standards: structured logging, metrics, alerting, and incident response runbooks for all critical components.
Set infrastructure standards and enforce them through automation and code review; mentor engineers on cloud and operations best practices.
Participate in architecture reviews and contribute infrastructure trade-off analysis to product and engineering decisions.
Requirements
4-6 years of hands-on DevOps, platform engineering, or SRE experience in production cloud environments.
Proven track record managing multi-account, multi-environment AWS infrastructure at production scale.
Expert-level Terraform: modules, workspaces, remote backends, state management, resource import, and lifecycle management.
Strong Ansible: role authoring, Jinja2 templating, inventory management, tag-scoped deployments, and Docker Compose integration.
Deep hands-on experience with: EKS, RDS, ElastiCache, ECR, VPC, NLB, IAM, Secrets Manager, S3, EC2.
AWS SSO / IAM Identity Center: managing access to multiple accounts with named profiles.
Comfortable switching between accounts and regions without cross-contaminating state or credentials.
Production EKS operations: node group management (x86, GPU, Graviton), workload deployments, Helm, and cluster upgrades.
Proficient with Kubernetes CLI tooling at the workload, service, and cluster level; disciplined about environment context verification before any operation.
Experience with NVIDIA GPU workloads or heterogeneous node scheduling is a strong plus.
VPC design: subnets, routing tables, NAT Gateway, NLB with EIP, security groups, and VPN integration.
Reverse-proxy / API gateway experience: Traefik or NGINX with TLS, middleware chains, and OIDC auth delegation.
IAM security: least-privilege policy design, role assumption, and SCPs.
Secrets management: AWS Secrets Manager, HashiCorp Vault, or equivalent; no plaintext credentials in repos.
Familiarity with Zero Trust networking principles.
Identity provider administration (Keycloak or equivalent): realm configuration, OIDC client setup, and forward-auth integration.
OAuth2/OIDC protocol fluency: understands token flows, redirect URIs, and JWKS endpoints well enough to debug silently broken auth.
GitLab CI or GitHub Actions: multi-stage pipelines with environment promotion and security scan gates.
Prometheus + Grafana or equivalent observability stack: metric scraping, dashboards, and alerting rules.
Structured logging pipeline (ELK, Loki, or equivalent).
Team-first mindset: documents everything, communicates blast radius before acting, and treats runbooks as a team asset.
Active ownership: raises blockers early, follows through on on-call issues without being prompted, and closes the loop.
Honesty and transparency: surfaces risks and misconfigurations quickly; never dismisses a "this looks wrong" instinct.
Hard-working and dependable: maintains high operational standards in high-pressure situations without taking shortcuts that create future risk.
Team-first mindset: shares knowledge proactively, unblocks colleagues, and treats code review as a teaching opportunity rather than gatekeeping.
Active ownership: raises concerns early, follows through on commitments, and closes the loop without being reminded.
Honesty and transparency: surfaces bad news quickly, acknowledges mistakes, and documents lessons learned so the team does not repeat them.
Hard-working and dependable: comfortable sustaining high-quality output under delivery pressure without cutting corners on security, testing, or maintainability.
HashiCorp Vault in production: dynamic credentials, PKI secrets engine, HA setup.
GitOps tooling: ArgoCD or Flux for Kubernetes delivery.
Security certifications: AWS Security Specialty, CKS (Certified Kubernetes Security Specialist), or OSCP.
Experience with GPU-accelerated workloads and NVIDIA device plugin for Kubernetes.
Compliance framework exposure: SOC 2, ISO 27001, or GDPR operational controls.
Benefits
Salary negotiable based on ability and experience.
Performance bonuses and bonuses tied to business results.
Full participation in social insurance, health insurance, and unemployment insurance as required by law.
Supplementary health insurance.
Internal training and advanced professional training.
Team-building activities and company trips.
Clear opportunities for advancement.
General information
- Salary: 34 - 68 million VND
Work location
- 105 Lê Lợi, Đà Nẵng
- (Before merger: Hải Châu, Đà Nẵng | After merger: Hải Châu, Đà Nẵng)