Job Summary
Qualgo is an R&D center specializing in cybersecurity products and solutions. We are on a mission to build a trusted cyberspace where individuals and businesses can thrive with confidence.
Role Summary:
Join our growing AI team to design, build, and scale production-grade AI solutions across cloud and edge. You'll turn prototypes into reliable, low-latency services, owning data/feature pipelines, training/evaluation, deployment, monitoring, and iteration. You'll optimize performance and cost, and implement safety guardrails, observability, and CI/CD/MLOps automation. You'll also enable edge/on-device inference (mobile, desktop, browser, IoT), including model packaging and compression, hardware acceleration (CPU/GPU/NPU), offline/real-time constraints, telemetry, and OTA updates. You'll partner with data scientists, engineers, and product teams to ship user-facing features quickly and safely in a fast-paced, collaborative environment.
Key Responsibilities:
Design, deploy, and optimize LLM-based services, with models running in a self-hosted setup on cloud infrastructure.
Build and maintain a centralized LLM Gateway to manage multi-model access and routing.
Implement and evolve Agent-to-Agent (A2A) communication and the Model Context Protocol (MCP) for agent collaboration.
Design and integrate a powerful Agent Memory System, including a dynamic knowledge base and contextual memory to empower intelligent behavior.
Apply model optimization techniques to improve inference efficiency and cost-effectiveness.
Develop and operate MLOps pipelines for model lifecycle management.
Ensure system scalability, reliability, and performance across diverse workloads.
Collaborate across teams to bring intelligent agents into real-world applications.
Perform the duties and tasks assigned by your direct manager or as otherwise instructed by the Company.
Benefits:
Competitive salary and benefits package.
Opportunity to work on a product that impacts millions of users.
A dynamic and supportive work environment.
Premium health insurance for you and your family.
Professional growth and development opportunities.
Annual leave: 12 to 14 days per year, plus 1 day of birthday leave and 1 day of Christmas leave.
Performance review: once per year.
Internal training and knowledge-sharing sessions, plus professional training courses.
Team building, company trips, year-end party, and monthly activities.
Devices: MacBook and external monitor (if needed).
Free tea and coffee.
Comfortable working area.
Working hours: 9 am to 6 pm, Monday to Friday.
Requirements:
Education: Bachelor's degree, Master's degree, or Ph.D. in Computer Science, Artificial Intelligence, Machine Learning, Electrical Engineering, or a related field.
Minimum 3 years of experience for Mid-level roles, or 5 years for Senior-level roles, in AI/ML/DL engineering or similar positions.
Strong programming skills in Python and experience with containerized and cloud-native environments.
Solid understanding of AI/ML/DL model deployment, including serving, optimization, and context-aware design.
Experience in building systems with agent orchestration, memory management, and structured communication protocols.
Familiarity with retrieval-augmented generation, semantic memory, and message-driven workflows.
Hands-on experience working with GPU (NVIDIA) or TPU environments, including model quantization and other performance optimization techniques.
Practical experience with MLOps pipelines for training, versioning, and deploying machine learning models.
Bonus: experience with agent-based architecture, LLM streaming patterns, Golang, and a strong foundation in mathematics, statistics, or linear algebra.
Fluency in English is a plus.