Advancing Analytics
Data Science | AI | DataOps | Engineering

Announcing my session at #SQLBits - Azure Databricks

Simon Whiteley and I will be back at #SQLBits 2019 talking about #DataEngineering and #DataScience in Databricks. We will look at #ApacheSpark, #Python, #Engineering and #MachineLearning in this full-day training session.

Have you looked at Azure Databricks yet? No? Then you need to. Why, you ask? There are many reasons. The number one: knowing how to use Apache Spark will earn you more money. It is that simple. Data Engineers and Data Scientists who know Apache Spark are in demand! This workshop is designed to introduce you to the skills required for both roles.

In the morning we will introduce Azure Databricks, then discuss how to develop in-memory, elastic-scale data engineering pipelines. We will talk about shaping and cleaning data, the languages, notebooks, ways of working, design patterns and how to get the best performance. You will build an engineering pipeline with Python (or possibly some other stuff we are not allowed to tell you about yet). The Engineering element will be delivered by UK MVP Simon Whiteley. Simon has been deploying engineering projects with Azure Databricks since it was announced. He has real-world experience in multiple environments.

Then we will shift gears: we will take the data we moved and cleansed and apply distributed machine learning at scale. We will train a model and productionise it. We will then enrich our data with our newly predicted values. The Data Science element will be led by UK MVP Terry McCann. Terry holds an MSc in Data Science and has been working with Apache Spark for the last 5 years. He is dedicated to applying engineering practices to data science to make model development, training and scoring as easy and as automated as possible.

By the end of the day, you will understand how Azure Databricks supports both data engineering and data science, leveraging Apache Spark to deliver blisteringly fast data pipelines and distributed machine learning models. Bring your laptop, as this will be hands-on.