Job Description
Key Responsibilities
1. Data Engineering for Pretraining
• Build and maintain scalable pipelines for text collection, cleaning, deduplication, filtering, and quality scoring.
• Process large-scale Vietnamese and multilingual datasets.
• Implement tokenization workflows, corpus sharding, mixture sampling, and dataset balancing.
• Develop automated dataset validation and quality assurance tools.
2. Model Training & Optimization
• Support distributed training of LLMs using DeepSpeed, Megatron-LM, FSDP, or similar.
• Optimize throughput, memory efficiency, and multi-node GPU performance.
• Run full-scale LLM experiments and troubleshoot training issues.
• Conduct model fine-tuning, instruction tuning, and alignment as needed.
3. Infrastructure & Engineering
• Work with multi-GPU/multi-node clusters using Slurm and Docker/Singularity.
• Maintain experiment tracking pipelines.
• Develop reusable tools for logging, checkpointing, and evaluations.
4. Evaluation & Benchmarking
• Prepare and maintain Vietnamese and multilingual benchmark suites.
• Implement automated evaluation pipelines.
• Analyze results to guide improvements.
Requirements
Minimum Requirements
• Bachelor's, Master's, or PhD degree in CS, AI, ML, or a related field.
• Strong Python programming and PyTorch experience.
• Understanding of transformer architectures and tokenization.
• Experience with GPU clusters, Linux, and Bash.
• Familiarity with distributed training frameworks.
Preferred Qualifications
• Experience with large-scale datasets.
• Knowledge of Vietnamese NLP.
• Experience with MoE, long-context models, or deduplication.
• Open-source contributions.
• Experience with quantization, distillation, or compression.
Benefits
Bonus
Attractive salary & bonus
Healthcare
Premium healthcare
Other
Opportunity to build next-generation Vietnamese LLMs.
Access to large GPU clusters.
High-growth environment bridging research and product.
Collaboration with strong AI teams.
Competitive compensation.
Other Information
DATE POSTED
14/11/2025
JOB LEVEL
Staff
JOB CATEGORY
Education > Academic Research
SKILLS
Python Programming, PyTorch, Transformer Architectures, GPU Clusters, Vietnamese NLP
FIELD
Other
APPLICATION LANGUAGE
Any
MINIMUM YEARS OF EXPERIENCE
3
NATIONALITY
No restriction
General Information
Work Location
- Vincom Center Đồng Khởi, Lê Thánh Tôn, Bến Nghé Ward, District 1, Ho Chi Minh City, Vietnam
- 7th Floor, Technopark Tower, Vinhomes Ocean Park 1, Gia Lam District, Hanoi