Use this quick start guide to gather everything you need to know about the Databricks Certified Data Engineer Professional certification exam. This study guide provides a list of objectives and resources that will help you prepare for items on the Databricks Certified Data Engineer Professional exam. The sample questions will help you identify the type and difficulty level of the questions, and the practice exams will familiarize you with the format and environment of the exam. Refer to this guide carefully before attempting your actual Databricks Certified Data Engineer Professional certification exam.
The Databricks Data Engineer Professional certification is mainly targeted at candidates who want to build their career in the data engineering domain. The Databricks Certified Data Engineer Professional exam verifies that the candidate possesses the fundamental knowledge and proven skills expected of a professional-level Databricks data engineer.
Databricks Data Engineer Professional Exam Summary:
| Exam Detail | Description |
|---|---|
| Exam Name | Databricks Certified Data Engineer Professional |
| Exam Code | Data Engineer Professional |
| Exam Price | $200 (USD) |
| Duration | 120 minutes |
| Number of Questions | 59 |
| Passing Score | 70% |
| Books / Training | Instructor-led: Advanced Data Engineering with Databricks |
| Schedule Exam | Databricks Webassessor |
| Sample Questions | Databricks Data Engineer Professional Sample Questions |
| Practice Exam | Databricks Data Engineer Professional Certification Practice Exam |
Databricks Data Engineer Professional Exam Syllabus Topics:
| Topic | Details | Weights |
|---|---|---|
| Developing Code for Data Processing Using Python and SQL | - Using Python and tools for development<br>- Building and testing an ETL pipeline with Lakeflow Declarative Pipelines, SQL, and Apache Spark on the Databricks platform | 22% |
| Data Ingestion & Acquisition | - Design and implement data ingestion pipelines to efficiently ingest a variety of data formats, including Delta Lake, Parquet, ORC, Avro, JSON, CSV, XML, text, and binary, from diverse sources such as message buses and cloud storage.<br>- Create an append-only data pipeline capable of handling both batch and streaming data using Delta (sketched below). | 7% |
| Data Transformation, Cleansing, and Quality | - Write efficient Spark SQL and PySpark code to apply advanced data transformations, including window functions, joins, and aggregations, to manipulate and analyze large datasets (sketched below).<br>- Develop a quarantining process for bad data with Lakeflow Declarative Pipelines or Auto Loader in classic jobs (sketched below). | 10% |
| Data Sharing and Federation | - Demonstrate secure Delta Sharing between Databricks deployments using Databricks-to-Databricks sharing (D2D) or to external platforms using the open sharing protocol (D2O).<br>- Configure Lakehouse Federation with proper governance across supported source systems.<br>- Use Delta Sharing to share live data from the lakehouse with any computing platform. | 5% |
| Monitoring and Alerting | - Monitoring<br>- Alerting | 10% |
| Cost & Performance Optimization | - Understand how and why using Unity Catalog managed tables reduces operational overhead and maintenance burden.<br>- Understand Delta optimization techniques, such as deletion vectors and liquid clustering.<br>- Understand the optimization techniques used by Databricks to ensure the performance of queries on large datasets (data skipping, file pruning, etc.).<br>- Apply Change Data Feed (CDF) to address specific limitations of streaming tables and improve latency (sketched below).<br>- Use the query profile to analyze a query and identify bottlenecks such as poor data skipping, inefficient join types, and data shuffling. | 13% |
| Ensuring Data Security and Compliance | - Applying data security mechanisms<br>- Ensuring compliance | 10% |
| Data Governance | - Create and add descriptions/metadata about enterprise data to make it more discoverable.<br>- Demonstrate understanding of the Unity Catalog permission inheritance model. | 7% |
| Debugging and Deploying | - Debugging and troubleshooting<br>- Deploying CI/CD | 10% |
| Data Modeling | - Design and implement scalable data models using Delta Lake to manage large datasets.<br>- Simplify data layout decisions and optimize query performance using Liquid Clustering (sketched below).<br>- Identify the benefits of Liquid Clustering over partitioning and Z-ordering.<br>- Design dimensional models for analytical workloads, ensuring efficient querying and aggregation. | 6% |
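To make the objectives above more concrete, the sketches below illustrate a few of the hands-on skills the syllabus calls out. All catalog, schema, table, path, and column names in them are hypothetical. First, the append-only ingestion objective: a minimal Auto Loader pipeline streaming JSON files from cloud storage into a Delta table, where `trigger(availableNow=True)` lets the same code also serve incremental batch backfills.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # ambient session on Databricks

# Incrementally ingest JSON files landing in cloud storage with Auto Loader.
raw_stream = (
    spark.readStream
    .format("cloudFiles")                                  # Auto Loader source
    .option("cloudFiles.format", "json")                   # input file format
    .option("cloudFiles.schemaLocation", "/Volumes/demo/raw/_schemas")
    .load("/Volumes/demo/raw/events/")
)

# Append-only write to a Delta table; availableNow processes everything
# currently available and then stops, so one pipeline covers batch and
# streaming runs alike.
(
    raw_stream.writeStream
    .option("checkpointLocation", "/Volumes/demo/raw/_checkpoints/events")
    .trigger(availableNow=True)
    .outputMode("append")
    .toTable("demo.bronze.events")
)
```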
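For the transformation objective, a minimal PySpark sketch combining a window function, a join, and an aggregation; the source tables and columns are assumptions for illustration only.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

orders = spark.table("demo.bronze.orders")          # hypothetical tables
customers = spark.table("demo.bronze.customers")

# Window function: rank each customer's orders from newest to oldest.
w = Window.partitionBy("customer_id").orderBy(F.col("order_ts").desc())

latest_revenue_by_region = (
    orders
    .withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")                                # most recent order only
    .join(customers, "customer_id")                  # enrich with attributes
    .groupBy("region")                               # aggregate per region
    .agg(F.sum("order_total").alias("latest_order_revenue"))
)
```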
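For the quarantining objective, one common pattern in Lakeflow Declarative Pipelines is to pair an expectation that drops bad rows with a second table that captures them. A sketch, assuming a source table named `orders_raw` and an illustrative quality rule; this code runs inside a declarative pipeline, not an interactive notebook.

```python
import dlt
from pyspark.sql import functions as F

# A single quality rule; rows that fail it are quarantined, not lost.
RULE = "order_total >= 0 AND customer_id IS NOT NULL"

@dlt.table(name="orders_clean")
@dlt.expect_all_or_drop({"valid_order": RULE})
def orders_clean():
    # Rows violating the expectation are dropped from this table
    # and counted in the pipeline's event log.
    return dlt.read_stream("orders_raw")

@dlt.table(name="orders_quarantine")
def orders_quarantine():
    # The inverted rule routes the bad rows into a quarantine table
    # where they can be inspected and repaired.
    return dlt.read_stream("orders_raw").filter(F.expr(f"NOT ({RULE})"))
```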
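For the Change Data Feed objective, a short sketch reading row-level change events from a Delta table; the table name and starting version are illustrative, and CDF must already be enabled on the table.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# CDF must be enabled first, e.g.:
#   ALTER TABLE demo.silver.orders
#   SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
changes = (
    spark.read
    .option("readChangeFeed", "true")   # return row-level change events
    .option("startingVersion", 5)       # hypothetical starting table version
    .table("demo.silver.orders")
)

# Each row carries _change_type, _commit_version, and _commit_timestamp
# metadata columns describing the insert/update/delete it represents.
changes.filter("_change_type != 'update_preimage'").show()
```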
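Finally, for the Liquid Clustering objectives, a sketch creating a clustered table and triggering incremental reclustering; the table and key names are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# CLUSTER BY enables Liquid Clustering: no static partition columns to
# choose up front, and the clustering keys can be changed later with
# ALTER TABLE ... CLUSTER BY, without rewriting the table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.gold.sales (
        sale_id BIGINT,
        region  STRING,
        sale_ts TIMESTAMP,
        amount  DECIMAL(10, 2)
    )
    CLUSTER BY (region, sale_ts)
""")

# OPTIMIZE incrementally reclusters newly written data by those keys.
spark.sql("OPTIMIZE demo.gold.sales")
```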
To ensure success in the Databricks Certified Data Engineer Professional certification exam, we recommend combining the authorized training course with practice tests and hands-on experience as you prepare.
