Job summary
Lead data engineers at Thoughtworks develop modern data architecture approaches to meet key business objectives and provide end-to-end data solutions. They might spend a few weeks with a new client on a deep technical review or a complete organizational review, helping them to understand the potential that data brings to solve their most pressing problems. On projects, they will be leading the design of technical solutions, or perhaps overseeing a program inception to build a new product. Alongside hands-on coding, they are leading the team to implement the solution.
Job responsibilities
You will lead and manage data engineering projects from inception to completion, including goal-setting, scope definition and ensuring on-time delivery through cross-team collaboration.
You will collaborate with stakeholders to understand their strategic objectives and identify opportunities to leverage data and improve data quality.
You will design, develop and operate modern data architecture approaches to meet key business objectives and provide end-to-end data solutions.
You will be responsible for creating, designing and developing intricate data processing pipelines that address clients' most challenging problems.
You will collaborate with data scientists to design scalable implementations of their models.
You will write clean, iterative code based on TDD and leverage various continuous delivery practices to deploy, support and operate data pipelines.
You will lead and advise clients on how to use different distributed storage and computing technologies from the plethora of options available.
You will develop data models by selecting from a variety of modeling techniques and implementing the chosen data model using the appropriate technology stack.
You will be responsible for data governance, data security and data privacy to support business and compliance requirements.
You will define the strategy for and incorporate data quality into your day-to-day work.
At Thoughtworks, we believe technology can be a powerful force for good. That's why our purpose is to create an extraordinary impact on the world through our culture and technology excellence. This isn't just a slogan, it's the core principle guiding every decision we make.
We achieve this through a relentless pursuit of excellence and a culture built around these five core values:
Be an awesome partner for clients and their ambitious missions: We believe in building strong, collaborative relationships with our clients, helping them achieve their most ambitious goals.
Revolutionize the technology industry: We're constantly pushing boundaries and innovating, aiming to make a lasting impact on the entire tech landscape.
Amplify positive social change and advocate for an equitable tech future: Technology should be a force for good. We actively promote positive social change and fight for an inclusive and equitable tech industry.
Foster a vibrant community of diverse and passionate technologists: Our diverse and passionate teams are our greatest asset. We foster a collaborative and inclusive environment where everyone feels valued and empowered to thrive.
Achieve enduring commercial success and sustained growth: A healthy and sustainable business allows us to continuously invest in our people, technology and social impact initiatives.
This commitment to our purpose is matched by a commitment to our people. As Thoughtworkers, you'll be empowered to focus on what you do best - creating positive change through technology - because we offer a comprehensive benefits package designed to support your well-being and career development.
Here's how we support our team:
Health & wellness: Health insurance for employees and their immediate family, mental health support and annual check-ups.
Work-life balance: Flexible work arrangements on a hybrid work model, maternity/paternity leave, sabbatical leave.
Financial security: Competitive salaries, referral bonuses, and laptop buyback programs.
Professional growth: Training allowances and personal development budgets.
Technology & equipment: Top-tier MacBooks and allowances for work-from-home setups.
Connectivity: Monthly communications allowance to stay connected at home and on the go.
Fun & camaraderie: We organize engaging social activities like running clubs, nature outings, annual events and monthly Town Halls to foster connections and a positive work environment.
Ready to break free from the ordinary and achieve the extraordinary? Explore our open roles!
Technical Skills
Expert-level Databricks skills (SparkSQL, PySpark, Spark DataFrames) and open table formats (Delta Lake, Apache Iceberg).
Deep expertise in columnar storage formats, advanced performance tuning, and optimization strategies (Parquet, ORC, Z-Order, clustering).
Ability to define, architect, and implement modern data architecture patterns (Medallion, data mesh, data product approach).
Mastery of dbt (core/cloud) and advanced SQL for complex analytical transformations, including performance optimization. Expertise in establishing and enforcing data quality, testing, and governance frameworks (Great Expectations, dbt tests, data contracts).
Extensive experience designing and implementing highly scalable streaming and batch data ingestion frameworks (Kafka, Autoloader, APIs, SFTP) and data/file formats (CSV, JSON, YAML).
Experience with event-driven architectures (AWS EventBridge, GCP Pub/Sub, Azure Event Grid).
Architect-level cloud platform expertise (AWS, GCP, or Azure) with deep experience in multiple warehouses (BigQuery, Redshift, Synapse). Knowledge and implementation of security and compliance in cloud data environments (RBAC, data masking, encryption, GDPR/CCPA) and implementation of cost optimization strategies for cloud data platforms.
Leadership in defining and implementing DevOps & infrastructure-as-code strategies (GitLab/GitHub CI/CD, Terraform). Proven ability to design and implement comprehensive observability & monitoring solutions (logging, alerting, pipeline performance tracking).
Expert Python engineering skills, leading best practices in software engineering (version control, modularity, testing).
Professional Skills
Demonstrated experience in leading large data teams, driving collaboration with business, analysts, and data scientists, and influencing technical direction.
Proven ability in data product design and domain-driven design in data platforms.
Solid experience with machine learning pipelines and MLOps (MLflow, Vertex AI, SageMaker, Azure ML).
Hands-on experience with real-time analytics and low-latency serving layers (e.g., Apache Flink, Materialize, Rockset).
Practical experience with vector databases (Pinecone, Weaviate, ChromaDB) or semantic search in AI workflows.