Advancing Analytics
Data Science | AI | DataOps | Engineering

Advanced Databricks Training


Congratulations on starting your Big Data adventure!

Big Data processing is being democratised. Tools such as Azure Databricks mean you no longer need to be a Java expert to be a Big Data Engineer. Databricks has made your life easier! But while it is easier, there is still a lot to learn, and knowing where to start can be daunting.

Too often, training courses are academic, teaching theory rather than application. We have created an applied Databricks course, built around the demands of our customers. It will teach you how to implement different scenarios in Databricks, but most importantly it will explain why, when to implement them, and when not to.

The course is designed to take a data professional from zero to hero in just three days. You will leave with all the skills you need to start your Big Data journey. If you are beginning a new project and want to know whether Databricks is suitable for your problem, we also offer training tailored to your problem domain.

The course will be delivered by Terry McCann, Microsoft Data Platform MVP. Terry is recognised for his ability to convert deep technical material into bite-sized, understandable chunks.

Cost

Remote delivery: £850** per delegate (minimum 5 delegates)
On-site delivery: £1000** per delegate (minimum 8 delegates)
Tailored course: POA
** If you have attended a public course (e.g. SQLBits), let us know and we will reduce the price by 33% for one attendee.


"Very good and understandable explanations of quite complex stuff. A very good kickoff to get started with Databricks. Great combination of theory and real-life experiences."
— Previous course attendee


Agenda

(A full agenda is available upon request.)

Introduction

General introduction

  • Engineering vs Data Science

Intro to Big Data Processing

  • Introduction to Big Data processing: why we do what we do
  • The skills you will need
  • Introduction to Spark
  • Introduction to Azure Databricks

Exploring Azure Databricks

  • Getting set up
  • Exploring Azure Databricks
 

The languages

  • The languages (Scala/Python/R/Java)
  • Introduction to Scala
  • Introduction to PySpark
  • PySpark deep dive (a taste of the API follows this list)
  • Working with the additional Spark APIs
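
To give a flavour of the material, here is a minimal PySpark sketch of the DataFrame API we spend most of our time in. The file path and column names are purely illustrative:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # On Databricks a SparkSession already exists as `spark`;
    # getOrCreate() returns it (or builds one when run locally).
    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sales extract - the path and columns are made up.
    sales = spark.read.csv("/mnt/raw/sales.csv", header=True, inferSchema=True)

    # A typical transformation: filter, derive a column, aggregate.
    summary = (
        sales.filter(F.col("quantity") > 0)
        .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
        .groupBy("region")
        .agg(F.sum("revenue").alias("total_revenue"))
    )
    summary.show()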

Data Engineering

Managing Databricks

  • Managing Secrets (see the snippet after this list)
  • Orchestrating Pipelines
  • Troubleshooting Query Performance
  • Source Controlling Notebooks
  • Cluster Sizing
  • Installing packages on our cluster / all clusters
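
As a taste of the secrets material, here is a minimal sketch, assuming a Databricks notebook where `dbutils` is available and a secret scope has already been created; the scope, key, and storage account names are illustrative:

    # Read a secret at runtime instead of hard-coding credentials.
    # The scope and key must already exist (created via the Databricks
    # CLI or REST API); these names are illustrative.
    storage_key = dbutils.secrets.get(scope="training-scope", key="storage-key")

    # Use it to configure access to an Azure storage account.
    spark.conf.set(
        "fs.azure.account.key.mystorageacct.blob.core.windows.net",
        storage_key,
    )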

Data Engineering

  • Cloud ETL Patterns
  • Design patterns
  • Loading Data
  • Schema Management (sketched after this list)
  • Transforming Data
  • Storing Data
  • Managing Lakes
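
Schema management is one example where a few lines say a lot. A minimal sketch, assuming the ambient `spark` session of a Databricks notebook; the column names and landing path are illustrative:

    from pyspark.sql.types import (
        StructType, StructField, StringType, IntegerType, DateType,
    )

    # Declaring the schema up front avoids inference surprises and
    # documents the contract with upstream systems.
    orders_schema = StructType([
        StructField("order_id", StringType(), False),
        StructField("customer", StringType(), True),
        StructField("quantity", IntegerType(), True),
        StructField("order_date", DateType(), True),
    ])

    orders = (
        spark.read
        .schema(orders_schema)
        .option("header", "true")
        .csv("/mnt/raw/orders/")  # hypothetical landing path
    )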

Data Factory Data Flows

  • Creating Data Flows
  • Execution Comparison

Data Science

Data Science

  • Introduction to Data Science
  • Batch vs interactive machine learning
  • Python for machine learning
  • Train a model
  • Productionise it
  • Enrich our existing data with batch machine learning (see the sketch after this list)
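
Batch enrichment boils down to loading a trained model and applying it to a table. A minimal sketch, assuming a notebook's `spark` session and a previously saved pipeline; the model path and table names are illustrative:

    from pyspark.ml import PipelineModel

    # Load a previously trained pipeline and score a table in batch.
    model = PipelineModel.load("/mnt/models/churn_pipeline")

    customers = spark.table("customers")   # hypothetical input table
    scored = model.transform(customers)    # adds a `prediction` column

    scored.write.mode("overwrite").saveAsTable("customers_scored")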

MLlib

  • What is MLlib?
  • MLlib components
  • Creating a regression model in MLlib (sketched after this list)
  • Creating a classification model in MLlib
  • Tuning
  • Saving models
  • Model deployment scenarios
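
Here is a minimal sketch of an MLlib regression pipeline, assuming a notebook's `spark` session; the tiny inline dataset and column names are illustrative:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    # A tiny, made-up dataset: quantity, unit price, and revenue.
    df = spark.createDataFrame(
        [(2, 10.0, 20.0), (5, 9.5, 47.5), (1, 12.0, 12.0), (4, 8.0, 32.0)],
        ["quantity", "unit_price", "revenue"],
    )

    # MLlib expects a single vector column of features.
    assembler = VectorAssembler(
        inputCols=["quantity", "unit_price"], outputCol="features"
    )
    lr = LinearRegression(featuresCol="features", labelCol="revenue")

    model = Pipeline(stages=[assembler, lr]).fit(df)

    # Persist the fitted pipeline so it can be reloaded for batch scoring.
    model.write().overwrite().save("/mnt/models/revenue_lr")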

Databricks Delta

Databricks Delta Tables

  • Introduction to Delta: what it is and how it works
  • Data lake management
  • Problems with Hadoop-based lakes
  • Creating a Delta Table
  • The Transaction Log
  • Managing schema change
  • Time travelling (see the sketch after this list)
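
The time-travel sessions build on sketches like this one, reusing the `orders` DataFrame from the earlier schema example; the Delta path is illustrative:

    # Write a DataFrame out as a Delta table.
    orders.write.format("delta").mode("overwrite").save("/mnt/delta/orders")

    # Every write is a commit in the transaction log, so earlier
    # versions stay queryable - Delta's "time travel".
    latest = spark.read.format("delta").load("/mnt/delta/orders")
    version_zero = (
        spark.read.format("delta")
        .option("versionAsOf", 0)  # the first committed version
        .load("/mnt/delta/orders")
    )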

Bring it all back together

  • How this all fits into a wider architecture
  • Projects we have worked on
  • Managing Databricks in production
  • Deploying with Azure DevOps

Labs

  • Getting set up (building a new instance, getting connected, creating a cluster)
  • Creating all the required assets
  • Running a notebook
  • An introduction to the key packages we will be working with
  • Cleaning data
  • Transforming data
  • Creating a notebook to move data from Blob Storage and clean it up
  • Scheduling a notebook to run with Azure Data Factory
  • Creating a streaming application (a minimal example follows this list)
  • Creating a machine learning model
  • Deploying a machine learning model
  • Reading a stream and enriching it
  • Databricks Delta
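
As a preview of the streaming lab, here is a minimal Structured Streaming sketch, assuming a notebook's `spark` session. It uses the built-in `rate` source so it runs without an external feed; the sink and query names are illustrative:

    from pyspark.sql import functions as F

    # The `rate` source generates timestamped test rows on a schedule.
    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    # Enrich the stream with a derived column.
    enriched = stream.withColumn("value_squared", F.col("value") * F.col("value"))

    query = (
        enriched.writeStream
        .format("memory")              # in-memory sink for quick inspection
        .queryName("enriched_stream")  # queryable as a temp view
        .outputMode("append")
        .start()
    )
    # spark.sql("SELECT * FROM enriched_stream").show()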

If you would like to arrange a call regarding training, please use the link below:

Alternatively, please fill out the following form and someone will get back to you.
