Use this quick start guide to collect all the information about the Databricks Machine Learning Associate certification exam. This study guide lists the objectives and resources that will help you prepare for the Databricks Certified Machine Learning Associate exam. The Sample Questions show you the type and difficulty level of the questions, and the Practice Exams familiarize you with the format and environment of the exam. Read this guide carefully before attempting the actual Databricks Certified Machine Learning Associate certification exam.
The Databricks Machine Learning Associate certification is aimed mainly at candidates who want to build a career in the ML Engineer domain. The Databricks Certified Machine Learning Associate exam verifies that the candidate possesses the fundamental knowledge and proven skills in the area of Databricks Machine Learning.
Databricks Machine Learning Associate Exam Summary:
Item | Details |
---|---|
Exam Name | Databricks Certified Machine Learning Associate |
Exam Code | Machine Learning Associate |
Exam Price | $200 (USD) |
Duration | 90 mins |
Number of Questions | 48 |
Passing Score | 70% |
Books / Training | Machine Learning with Databricks |
Schedule Exam | Kryterion Webassessor |
Sample Questions | Databricks Machine Learning Associate Sample Questions |
Practice Exam | Databricks Machine Learning Associate Certification Practice Exam |
Databricks Machine Learning Associate Exam Syllabus Topics:
Topic | Details | Weights |
---|---|---|
Databricks Machine Learning | - Identify the best practices of an MLOps strategy<br>- Identify the advantages of using ML runtimes<br>- Identify how AutoML facilitates model/feature selection<br>- Identify the advantages AutoML brings to the model development process<br>- Identify the benefits of creating feature store tables at the account level in Unity Catalog in Databricks vs. at the workspace level<br>- Create a feature store table in Unity Catalog<br>- Write data to a feature store table<br>- Train a model with features from a feature store table<br>- Score a model using features from a feature store table<br>- Describe the differences between online and offline feature tables<br>- Identify the best run using the MLflow Client API<br>- Manually log metrics, artifacts, and models in an MLflow Run<br>- Identify information available in the MLflow UI<br>- Register a model using the MLflow Client API in the Unity Catalog registry<br>- Identify benefits of registering models in the Unity Catalog registry over the workspace registry<br>- Identify scenarios where promoting code is preferred over promoting models and vice versa<br>- Set or remove a tag for a model<br>- Promote a challenger model to a champion model using aliases | 29% |
Data Processing | - Compute summary statistics on a Spark DataFrame using .summary() or dbutils data summaries<br>- Remove outliers from a Spark DataFrame based on standard deviation or IQR<br>- Create visualizations for categorical or continuous features<br>- Compare two categorical or two continuous features using the appropriate method<br>- Compare and contrast imputing missing values with the mean, median, or mode value<br>- Impute missing values with the mode, mean, or median value<br>- Use one-hot encoding for categorical features<br>- Identify and explain the model types or data sets for which one-hot encoding is or is not appropriate<br>- Identify scenarios where log scale transformation is appropriate | 29% |
Model Development | - Use ML foundations to select the appropriate algorithm for a given model scenario<br>- Identify methods to mitigate data imbalance in training data<br>- Compare estimators and transformers<br>- Develop a training pipeline<br>- Use Hyperopt's fmin operation to tune a model's hyperparameters<br>- Perform random or grid search or Bayesian search as a method for tuning hyperparameters<br>- Parallelize single node models for hyperparameter tuning<br>- Describe the benefits and downsides of using cross-validation over a train-validation split<br>- Perform cross-validation as a part of model fitting<br>- Identify the number of models being trained in conjunction with a grid-search and cross-validation process<br>- Use common classification metrics: F1, Log Loss, ROC/AUC, etc.<br>- Use common regression metrics: RMSE, MAE, R-squared, etc.<br>- Choose the most appropriate metric for a given scenario objective<br>- Identify the need to exponentiate log-transformed variables before calculating evaluation metrics or interpreting predictions<br>- Assess the impact of model complexity and the bias-variance tradeoff on model performance | 33% |
Model Deployment | - Identify the differences and advantages of model serving approaches: batch, real-time, and streaming<br>- Deploy a custom model to a model endpoint<br>- Use pandas to perform batch inference<br>- Identify how streaming inference is performed with Delta Live Tables<br>- Deploy and query a model for real-time inference<br>- Split data between endpoints for real-time inference | 9% |
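
One Model Development objective in the syllabus asks you to identify the number of models trained when grid search is combined with cross-validation. The arithmetic can be sketched in plain Python as below. The helper name `count_models` and the grid values are hypothetical, and whether a final refit of the best combination on the full training set is counted depends on the tool and on how the question is worded, so treat this as an illustration rather than an official formula.

```python
from math import prod


def count_models(param_grid, num_folds, refit_best=False):
    """Return how many model fits a grid search with k-fold CV performs.

    param_grid maps each hyperparameter name to its list of candidate values.
    refit_best adds one fit for retraining the best combination on all data.
    """
    # Every combination of candidate values is one grid point.
    combinations = prod(len(values) for values in param_grid.values())
    # Each grid point is trained once per fold.
    fits = combinations * num_folds
    return fits + 1 if refit_best else fits


# Hypothetical grid: 3 depths x 2 tree counts = 6 combinations.
grid = {"max_depth": [2, 5, 10], "num_trees": [20, 50]}
print(count_models(grid, num_folds=3))                   # 18
print(count_models(grid, num_folds=3, refit_best=True))  # 19
```

The count grows multiplicatively with every added hyperparameter, which is one reason random or Bayesian search is often considered for larger search spaces.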
To prepare for the Databricks Certified Machine Learning Associate exam, we recommend an authorized training course, practice tests, and hands-on experience.
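
As a small piece of hands-on practice for the syllabus objective on exponentiating log-transformed variables before calculating evaluation metrics, the sketch below compares RMSE computed in log units with RMSE computed after converting predictions back to the original scale. It uses only the Python standard library. The prices, the predictions, and the `rmse` helper are made up for illustration and do not come from any Databricks material.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical prices and predictions from a model trained on log(price).
actual_price = [100.0, 200.0, 400.0]
predicted_log_price = [math.log(110.0), math.log(180.0), math.log(400.0)]

# Error in log units is not on the same scale as the original target.
rmse_log = rmse([math.log(p) for p in actual_price], predicted_log_price)

# Exponentiate the predictions first to return to the original scale.
predicted_price = [math.exp(v) for v in predicted_log_price]
rmse_price = rmse(actual_price, predicted_price)

print(f"RMSE in log units: {rmse_log:.3f}")
print(f"RMSE in price units: {rmse_price:.2f}")
```

The second figure is the one that can be read in the target's own units, which is why the inverse transformation comes before the metric.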