1. Must have:
• Bachelor's or Master's degree in Computer Science, Software Engineering, Information Technology, or a related technical field
• Proficiency in English is required
• 5+ years of experience as a Data Engineer or Software Engineer
• Experience with cloud platforms (AWS/Azure/GCP)
• Strong proficiency in at least one programming language (Python/Scala/Java)
• Strong experience in systems architecture, particularly complex, scalable, and fault-tolerant distributed systems
• Solid grasp of multi-threading, atomic operations, computation frameworks such as Spark (DataFrame, SQL, ...), distributed storage, and distributed computing
• Understanding of designs for resilience, fault tolerance, high availability, and high scalability, ...
• Familiarity with tools such as CI/CD pipelines, GitLab, ...
• Strong communication and teamwork skills
• Open-minded and willing to learn new things
2. Nice to have:
• Experience with Databricks (Delta Lake, Unity Catalog, Delta Live Tables) or similar lakehouse technologies is a strong plus.
• Proven ability in performance tuning and optimization for Big Data workloads (Spark/Flink, partitioning, shuffle strategies, caching).
• Familiarity with modern data transformation frameworks (dbt).
• Knowledge of AI and LLM technologies is a plus, including prompt engineering, embeddings, and retrieval-augmented generation (RAG).
• Hands-on experience with vector databases (ChromaDB, Vector Search) and LLMOps practices.