Congratulations on starting your big data adventure!
Big Data processing is being democratised. Tools such as Azure Databricks mean you no longer need to be a Java expert to be a Big Data Engineer. Databricks has made your life easier! While it is easier, there is still a lot to learn, and knowing where to start can be daunting.
Too often, training courses are academic, teaching theory rather than application. We have created an applied Databricks course, built around the demands of our customers. It will teach you how to implement different scenarios in Databricks, but most importantly it will explain why, when to implement them, and when not to.
Advancing Analytics have developed this course based on the needs of our customers. It is designed to take a data professional from Zero to Hero in just 3 days. You will leave with all the skills you need to get started on your big data journey. If you are starting a new project and want to know whether Databricks is suitable for your problem, we also offer training tailored to your problem domain.
The course will be delivered by Terry McCann, a Microsoft Data Platform MVP. Terry is recognised for his ability to convert deep technical material into bite-sized, understandable chunks.
Remote delivery: £850** per delegate (minimum 5 delegates)
On-site delivery: £1000** per delegate (minimum 8 delegates)
Tailored course: POA
** If you have attended a public course (SQLBits), let us know and we will reduce the price by 33% for one attendee.
(A full agenda is available upon request)
Intro to Big Data Processing
- Engineering vs Data Science
- Introduction to Big Data Processing: why we do what we do
- The skills required
- Introduction to Spark
- Introduction to Azure Databricks
Exploring Azure Databricks
- Getting set up
- Exploring Azure Databricks
- The languages (Scala/Python/R/Java)
- Introduction to Scala
- Introduction to PySpark
- PySpark deep dive
- Working with the additional Spark APIs
- Managing Secrets
- Orchestrating Pipelines
- Troubleshooting Query Performance
- Source Controlling Notebooks
- Cluster Sizing
- Installing packages on one cluster / all clusters
Cloud ETL Patterns
- Design patterns
- Loading Data
- Schema Management
- Transforming Data
- Storing Data
- Managing Lakes
Data Factory Data Flows
- Creating Data Flows
- Execution Comparison
Introduction to Data Science
- Batch vs interactive machine learning
- Python for machine learning
- Training a model
- Productionising a model
- Enriching our existing data with batch machine learning
- What is MLlib?
- MLlib components
- Creating a regression model in MLlib
- Creating a classification model in MLlib
- Saving models
- Model deployment scenarios
Databricks Delta Tables
- Introduction to Delta: what it is and how it works
- Data lake management
- Problems with Hadoop-based lakes
- Creating a Delta Table
- The Transaction Log
- Managing Schema change
- Time travelling
Bring it all back together
- How this all fits into a wider architecture.
- Projects we have worked on.
- Managing Databricks in production
- Deploying with Azure DevOps
- Getting set up (Building a new instance, getting connected, creating a cluster)
- Creating all the required assets.
- Running a notebook
- An introduction to the key packages we will be working with.
- Cleaning data
- Transforming data
- Creating a notebook to move data from Blob Storage and clean it up.
- Scheduling a notebook to run with Azure Data Factory
- Creating a streaming application
- Creating a machine learning model
- Deploying a machine learning model
- Reading a stream and enriching our stream
- Databricks Delta
If you would like to arrange a call regarding training please use the link below:
Alternatively, please fill out the following form and someone will get back to you.