Databricks Certified Machine Learning Professional Exam Syllabus

Use this quick-start guide to collect all the information about the Databricks Machine Learning Professional certification exam. This study guide lists the objectives and resources that will help you prepare for the Databricks Certified Machine Learning Professional exam. The sample questions help you gauge the type and difficulty of the questions, and the practice exams familiarize you with the exam's format and environment. Review this guide carefully before attempting the actual Databricks Certified Machine Learning Professional certification exam.

The Databricks Machine Learning Professional certification is mainly targeted at candidates who want to build a career as ML engineers. The Databricks Certified Machine Learning Professional exam verifies that the candidate has the fundamental knowledge and proven skills to perform professional-level machine learning work on Databricks.

Databricks Machine Learning Professional Exam Summary:

Exam Name	Databricks Certified Machine Learning Professional
Exam Code	Machine Learning Professional
Exam Price	$200 (USD)
Duration	120 mins
Number of Questions	60
Passing Score	70%
Books / Training	Machine Learning at Scale; Advanced Machine Learning Operations
Schedule Exam	Databricks Webassessor
Sample Questions	Databricks Machine Learning Professional Sample Questions
Practice Exam	Databricks Machine Learning Professional Certification Practice Exam

Databricks Machine Learning Professional Exam Syllabus Topics:

Model Development

Using Spark ML

Identify when SparkML is recommended based on the data, model, and use case requirements.
Construct an ML pipeline using SparkML.
Apply the appropriate estimator and/or transformer given a use case.
Tune a SparkML model using MLlib.
Evaluate a SparkML model.
Score a Spark ML model for a batch or streaming use case.
Choose between a SparkML model and a single-node model for inference based on the inference type: batch, real-time, or streaming.
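To see how the estimator and transformer roles fit together, here is a minimal plain-Python sketch of the contract Spark ML pipelines follow: an estimator's `fit()` returns a fitted model, that model is a transformer, and fitting a pipeline returns a pipeline model that replays every fitted stage in order. The class names (`StandardScalerSketch`, `PipelineSketch`, and so on) are hypothetical stand-ins, not the `pyspark.ml` API, so the sketch runs without a cluster.

```python
from statistics import mean, pstdev


class StandardScalerModelSketch:
    """Fitted model (a transformer): scales one column with statistics learned in fit()."""

    def __init__(self, col, mu, sigma):
        self.col, self.mu, self.sigma = col, mu, sigma

    def transform(self, rows):
        # Adds a new column instead of mutating the input rows.
        return [{**r, f"{self.col}_scaled": (r[self.col] - self.mu) / self.sigma} for r in rows]


class StandardScalerSketch:
    """Estimator: fit() learns mean and standard deviation and returns a fitted model."""

    def __init__(self, col):
        self.col = col

    def fit(self, rows):
        values = [r[self.col] for r in rows]
        # Fall back to 1.0 so a constant column does not cause division by zero.
        return StandardScalerModelSketch(self.col, mean(values), pstdev(values) or 1.0)


class PipelineModelSketch:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class PipelineSketch:
    """Pipeline (itself an estimator): fits stages in order, passing transformed data forward."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # estimator -> fitted transformer
            rows = stage.transform(rows)  # later stages see the transformed data
            fitted.append(stage)
        return PipelineModelSketch(fitted)


train = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = PipelineSketch([StandardScalerSketch("x")]).fit(train)
scored = model.transform([{"x": 2.0}, {"x": 4.0}])
```

Scoring new rows reuses the mean and standard deviation learned during `fit()`, which is why a fitted `PipelineModel`, rather than the unfitted `Pipeline`, is what you apply at inference time.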

Scaling and Tuning

Scale distributed training pipelines using SparkML and pandas Function APIs/UDFs.
Perform distributed hyperparameter tuning using Optuna and integrate it with MLflow.
Perform distributed hyperparameter tuning using Ray.
Evaluate the trade-offs between vertical and horizontal scaling for machine learning workloads in Databricks environments.
Evaluate and select appropriate parallelization (model parallelism, data parallelism) strategies for large-scale ML training.
Compare Ray and Spark for distributing ML training workloads.
Use the pandas Function API to parallelize group-specific model training and inference.
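In PySpark, group-specific training is usually written as `df.groupBy(key).applyInPandas(train_fn, schema)`, where `train_fn` receives all rows of one group as a pandas DataFrame and returns a DataFrame matching `schema`. The sketch below shows the same split-apply-combine shape in plain Python, fitting a one-feature least-squares line per store; the column names (`store_id`, `x`, `y`) and helper functions are hypothetical, and lists of dicts stand in for pandas DataFrames.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept on a single group."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # a group with one distinct x gets a flat line
    return slope, my - slope * mx


def train_per_group(rows, key="store_id"):
    """Stand-in for df.groupBy(key).applyInPandas(train_fn, schema): one model per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    models = []
    for k, g in sorted(groups.items()):
        # On Spark, each group's rows would arrive at train_fn as one pandas DataFrame.
        slope, intercept = fit_line([r["x"] for r in g], [r["y"] for r in g])
        models.append({key: k, "slope": slope, "intercept": intercept, "n_rows": len(g)})
    return models


def predict_per_group(models, rows, key="store_id"):
    """Score each row with the model trained on its own group."""
    by_key = {m[key]: m for m in models}
    return [by_key[r[key]]["slope"] * r["x"] + by_key[r[key]]["intercept"] for r in rows]


rows = [{"store_id": "A", "x": x, "y": 2 * x + 1} for x in range(3)] + [
    {"store_id": "B", "x": x, "y": 5 - x} for x in range(4)
]
models = train_per_group(rows)
preds = predict_per_group(models, [{"store_id": "A", "x": 10}, {"store_id": "B", "x": 10}])
```

Because each group is fitted independently, Spark can run the groups on different executors; the trade-off is that all rows of a single group must fit in one executor's memory.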

Advanced MLflow Usage

Use nested MLflow runs to track complex experiments.
Log custom metrics, parameters, and artifacts programmatically in MLflow to track advanced experimentation workflows.
Create custom model objects using real-time feature engineering.
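A custom model object can carry its own feature engineering, so that raw request fields are turned into features the same way at training and serving time. In MLflow this is typically a subclass of `mlflow.pyfunc.PythonModel` with a `predict(self, context, model_input)` method; the sketch below keeps that method shape but does not import MLflow, and the wrapped `LinearScorer`, its weights, and the feature names are hypothetical.

```python
import math
from datetime import datetime


class LinearScorer:
    """Hypothetical pre-trained model: a fixed linear function of engineered features."""

    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, features):
        return [self.bias + sum(w * f[name] for name, w in self.weights.items()) for f in features]


class FeatureEngineeringWrapper:
    """Mirrors the predict(context, model_input) shape of an MLflow PythonModel.

    Assumption: in Databricks this class would subclass mlflow.pyfunc.PythonModel;
    MLflow is not imported here so the sketch runs anywhere.
    """

    def __init__(self, model):
        self.model = model

    @staticmethod
    def engineer(record):
        # Features are derived from raw request fields at inference time,
        # so training and serving share exactly the same code path.
        ts = datetime.fromisoformat(record["event_time"])
        return {
            "log_amount": math.log1p(record["amount"]),
            "hour": ts.hour,
            "is_weekend": 1.0 if ts.weekday() >= 5 else 0.0,
        }

    def predict(self, context, model_input):
        # context is unused here; MLflow passes artifacts through it.
        return self.model.predict([self.engineer(r) for r in model_input])


model = FeatureEngineeringWrapper(
    LinearScorer({"log_amount": 1.0, "hour": 0.0, "is_weekend": 2.0}, bias=0.5)
)
preds = model.predict(None, [
    {"amount": 0.0, "event_time": "2024-06-01T10:00:00"},  # a Saturday
    {"amount": 9.0, "event_time": "2024-06-03T08:00:00"},  # a Monday
])
```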

Advanced Feature Store Concepts

Ensure point-in-time correctness in feature lookups to prevent data leakage during model training and inference.
Build automated pipelines for feature computation using the FeatureEngineeringClient.
Configure online tables for low-latency applications using the Databricks SDK.
Design scalable solutions for ingesting and processing streaming data to generate features in real time.
Develop on-demand features using feature serving for consistent use across training and production environments.
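Point-in-time correctness means that each training example only sees feature values that were already known at the example's timestamp. Databricks Feature Engineering handles this for time series feature tables through a timestamp lookup key on `FeatureLookup`; the plain-Python sketch below shows the underlying as-of join, assuming hypothetical `customer_id`, `ts`, and `value` fields with integer timestamps.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(feature_rows):
    """Group feature snapshots per entity, sorted by the time each value became known."""
    by_entity = defaultdict(list)
    for r in feature_rows:
        by_entity[r["customer_id"]].append((r["ts"], r["value"]))
    index = {}
    for key, series in by_entity.items():
        series.sort(key=lambda p: p[0])
        index[key] = ([t for t, _ in series], [v for _, v in series])
    return index


def point_in_time_lookup(label_rows, index):
    """Attach to each label event the latest feature value with ts <= the event ts.

    Snapshots recorded after the event are never used, which prevents leakage.
    """
    out = []
    for lbl in label_rows:
        times, values = index.get(lbl["customer_id"], ([], []))
        pos = bisect_right(times, lbl["ts"])  # number of snapshots with ts <= event ts
        out.append({**lbl, "feature": values[pos - 1] if pos else None})
    return out


features = [
    {"customer_id": "c1", "ts": 1, "value": 10},
    {"customer_id": "c1", "ts": 9, "value": 90},
    {"customer_id": "c1", "ts": 5, "value": 50},
]
labels = [
    {"customer_id": "c1", "ts": 0},
    {"customer_id": "c1", "ts": 5},
    {"customer_id": "c1", "ts": 8},
    {"customer_id": "c2", "ts": 3},
]
joined = point_in_time_lookup(labels, build_index(features))
```

A plain equality join on `customer_id` would instead attach every snapshot, including values recorded after the label event, which is exactly the leakage this objective targets.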

Weight: 44%

MLOps

Model Lifecycle Management

Describe and implement the architecture components of model lifecycle pipelines used to manage environment transitions in the deploy code strategy.
Map Databricks features to activities of the model lifecycle management process.

Validation Testing

Implement unit tests for individual functions in Databricks notebooks to ensure they produce expected outputs when given specific inputs.
Identify types of testing performed (unit and integration) in various environment stages (dev, test, prod, etc.).
Design an integration test for machine learning systems that incorporates common pipelines: feature engineering, training, evaluation, deployment, and inference.
Compare the benefits and challenges of approaches for organizing functions and unit tests.
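A unit test pins down the expected output of one function for specific inputs. The sketch below tests a hypothetical feature-cleaning helper, `fill_missing_with_median`, with Python's built-in `unittest`. In a Databricks notebook the tests can run in place; keeping helpers in importable modules with separate test files is the other common layout when you compare ways to organize functions and tests.

```python
import unittest


def fill_missing_with_median(values):
    """Hypothetical helper under test: replace None with the median of the known values."""
    known = sorted(v for v in values if v is not None)
    if not known:
        raise ValueError("no non-missing values to compute a median from")
    mid = len(known) // 2
    median = known[mid] if len(known) % 2 else (known[mid - 1] + known[mid]) / 2
    return [median if v is None else v for v in values]


class FillMissingTest(unittest.TestCase):
    def test_replaces_none_with_median(self):
        self.assertEqual(fill_missing_with_median([1, None, 3]), [1, 2.0, 3])

    def test_even_count_uses_mean_of_middle_pair(self):
        self.assertEqual(fill_missing_with_median([1, 2, 3, 4, None]), [1, 2, 3, 4, 2.5])

    def test_all_missing_raises(self):
        with self.assertRaises(ValueError):
            fill_missing_with_median([None, None])


if __name__ == "__main__":
    # In a notebook, pass argv and exit=False so the kernel keeps running.
    unittest.main(argv=["ignored"], exit=False)
```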

Environment Architectures

Design and implement scalable Databricks environments for machine learning projects using best practices.
Define and configure Databricks ML assets using DABs (Databricks Asset Bundles): model serving endpoints, MLflow experiments, ML registered models.

Automated Retraining

Implement automated retraining workflows that can be triggered by data drift detection or performance degradation alerts.
Develop a strategy for selecting top-performing models during automated retraining.
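One common selection strategy is champion/challenger: pick the best retrained candidate on a held-out metric, then promote it only if it beats the currently deployed model by a margin, so that noise alone does not trigger a swap. The sketch assumes hypothetical run records with a `val_rmse` field and a 2% relative-improvement threshold; in Databricks the candidates would typically come from `mlflow.search_runs`, and promotion would update a Unity Catalog model alias such as `champion`.

```python
def select_challenger(candidates, metric="val_rmse", lower_is_better=True):
    """Pick the best candidate run on a held-out metric; runs missing the metric are skipped."""
    scored = [c for c in candidates if c.get(metric) is not None]
    if not scored:
        return None
    pick = min if lower_is_better else max
    return pick(scored, key=lambda c: c[metric])


def should_promote(challenger, champion, metric="val_rmse", min_rel_gain=0.02, lower_is_better=True):
    """Promote only if the challenger beats the champion by at least min_rel_gain (relative)."""
    if challenger is None:
        return False
    if champion is None:
        return True  # nothing deployed yet
    old, new = champion[metric], challenger[metric]
    if old == 0:
        return new < old if lower_is_better else new > old
    gain = (old - new) / abs(old) if lower_is_better else (new - old) / abs(old)
    return gain >= min_rel_gain


runs = [
    {"run_id": "a", "val_rmse": 1.10},
    {"run_id": "b", "val_rmse": 0.95},
    {"run_id": "c", "val_rmse": None},  # e.g. a failed run that logged no metric
]
best = select_challenger(runs)
```

The relative margin is a design choice: a champion at RMSE 1.00 is replaced by this 0.95 challenger, while a champion at 0.96 is kept because the improvement is only about 1%.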

Drift Detection and Lakehouse Monitoring

Apply any of the statistical tests from the drift metrics table in Lakehouse Monitoring to detect drift in numerical and categorical data, and evaluate the significance of the observed changes.
Identify the data table type and Lakehouse Monitoring feature that address a given use case, and explain why.
Build a monitor for a snapshot, time series, or inference table using Lakehouse Monitoring.
Identify the key components of common monitoring pipelines: logging, drift detection, model performance, model health, etc.
Design and configure alerting mechanisms to notify stakeholders when drift metrics exceed predefined thresholds.
Detect data drift by comparing current data distributions to a known baseline or between successive time windows.
Evaluate model performance trends over time using an inference table.
Define custom metrics in Lakehouse Monitoring metrics tables.
Evaluate metrics based on different data granularities and feature slicing.
Monitor endpoint health by tracking infrastructure metrics such as latency, request rate, error rate, CPU usage, and memory usage.
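Lakehouse Monitoring computes drift statistics for you, such as a Kolmogorov-Smirnov test for numerical columns and the population stability index (PSI). To make those numbers less opaque, the sketch below computes a two-sample KS statistic and a PSI by hand in plain Python. The equal-width binning, the smoothing constant, and the 0.2 PSI alert threshold (a common rule of thumb, not a Databricks default) are assumptions.

```python
import math
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    b, c = sorted(baseline), sorted(current)
    points = sorted(set(b) | set(c))
    return max(abs(bisect_right(b, x) / len(b) - bisect_right(c, x) / len(c)) for x in points)


def psi(baseline, current, bins=10, eps=1e-6):
    """Population stability index over equal-width bins spanning the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
        return [max(n / len(values), eps) for n in counts]  # eps avoids log(0)

    expected, actual = shares(baseline), shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_report(baseline, current, psi_threshold=0.2):
    """Hypothetical alert rule: flag drift when PSI exceeds the chosen threshold."""
    score = psi(baseline, current)
    return {"psi": score, "ks": ks_statistic(baseline, current), "drift": score > psi_threshold}


baseline = list(range(100))
same = list(range(100))
shifted = [x + 50 for x in range(100)]
```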

Weight: 44%

Model Deployment

Deployment Strategies

Compare deployment strategies (e.g. blue-green and canary) and evaluate their suitability for high-traffic applications.
Implement a model rollout strategy using Databricks Model Serving.
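A canary rollout sends a small, growing share of traffic to the new model and rolls back if it misbehaves; a blue-green switch is the special case of moving from 0% to 100% in one step. On Databricks Model Serving the split is expressed as traffic percentages across the models served by one endpoint. The sketch below shows the routing and staging logic in plain Python; the stage schedule, the 1% error-rate guardrail, and the function names are hypothetical.

```python
import hashlib

ROLLOUT_STAGES = [5, 25, 50, 100]  # hypothetical schedule, in percent of traffic


def route(request_id, canary_percent):
    """Deterministically send about canary_percent% of requests to the canary.

    Hashing the id keeps a given caller on the same model across retries.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "champion"


def next_stage(current_percent, canary_error_rate, max_error_rate=0.01):
    """Advance to the next traffic share if the canary is healthy, otherwise roll back to 0%."""
    if canary_error_rate > max_error_rate:
        return 0
    later = [p for p in ROLLOUT_STAGES if p > current_percent]
    return later[0] if later else current_percent
```

For high-traffic applications, the staged approach limits how many users a bad model can affect, at the cost of running both models side by side for longer.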

Custom Model Serving

Register a custom PyFunc model and log custom artifacts in Unity Catalog.
Query custom models via the REST API or the MLflow Deployments SDK.
Deploy custom model objects using the MLflow Deployments SDK, the REST API, or the user interface.
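Model Serving endpoints accept scoring requests as a POST to `/serving-endpoints/<endpoint-name>/invocations` with a JSON body in MLflow's scoring format, for example `dataframe_split` (column names plus rows) or `dataframe_records`. The sketch below builds, but does not send, such a request with the standard library; the workspace URL, endpoint name, token, and columns are placeholders supplied by the caller.

```python
import json
import urllib.request


def build_scoring_request(workspace_url, endpoint_name, token, columns, rows):
    """Build (but do not send) a POST request to a Model Serving endpoint.

    workspace_url, endpoint_name, and token are placeholders, not real values.
    """
    body = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is `urllib.request.urlopen(req)`. With MLflow installed, `mlflow.deployments.get_deploy_client("databricks").predict(endpoint=..., inputs=...)` is the SDK route to the same endpoint.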

Weight: 12%

To succeed on the Databricks Certified Machine Learning Professional exam, we recommend combining the authorized training courses, practice tests, and hands-on experience.