Job description: We are actively seeking a highly skilled Senior Data Engineer to join our client, a globally recognized security software company. This role is pivotal in the design and implementation of sophisticated feature engineering and ETL pipelines, directly fueling their advanced machine learning initiatives. The ideal candidate will demonstrate practical experience with Databricks, an understanding of the medallion architecture, and experience supporting ML activities.
Primary Responsibilities:
Advanced Feature Engineering:
Develop and maintain robust feature engineering pipelines utilizing Databricks and Apache Spark (PySpark), optimized for machine learning model performance.
Comprehensive Data Integration:
Orchestrate the integration of diverse data sources, including clickstream data, user behavior patterns, and demographic information, to construct detailed user profiles for complex machine learning applications.
Medallion Architecture Expertise:
Architect and deploy ETL/ELT pipelines that adhere to the bronze, silver, and gold layers of the medallion architecture, ensuring data quality and reliability.
Machine Learning Lifecycle Support:
Construct and manage data pipelines to support machine learning model training, calibration, and deployment, leveraging MLflow for rigorous experiment tracking and performance monitoring.
Performance-Critical Pipeline Design:
Engineer low-latency, production-ready data pipelines to facilitate both real-time and batch machine learning model inference.
CI/CD Implementation:
Apply CI/CD best practices to ensure seamless and automated deployment of data pipelines.
Data Governance and Security:
Enforce strict data governance policies, ensuring compliance with security and regulatory standards, particularly for Personally Identifiable Information (PII), and maintain thorough metadata and master data management.
Collaborative Development:
Foster close collaboration with machine learning scientists, software engineers, and business stakeholders to align data transformation strategies with key business objectives.
Essential Qualifications:
7+ years of professional experience in data engineering, with at least 4 years specializing in machine learning feature engineering, ETL pipeline development, and data preparation for machine learning.
Extensive experience in managing large-scale data pipelines on Databricks, utilizing Apache Spark, with a deep understanding of the medallion architecture.
Proven expertise in machine learning lifecycle management, with significant experience using MLflow. Advanced proficiency in Apache Spark (PySpark) for large-scale data processing and analytics is required.
Proficiency in Python for data manipulation and SQL for query optimization.
Demonstrated experience in building and deploying data pipelines for real-time and batch machine learning model serving in production environments, and a thorough understanding of CI/CD principles for ETL/ELT pipelines.
Expertise in metadata management.
Understanding of data security and compliance requirements, especially for sensitive data such as PII.
Expected salary:
Location: Canada
Job date: Wed, 19 Mar 2025 06:14:21 GMT
To help us track our recruitment effort, please indicate in your email/cover letter where (jobsnear.pro) you saw this job posting. Thanks & Good Luck