Advancing Analytics
Data Science | AI | DataOps | Engineering

DataOps: Deploying models faster

Applying DevOps to Data Science

 
 

Never do something more than once…

I have a core ethos that I take to every project I work on: “Never do anything more than once”. Because of that, I have spent much of my career either working with or developing automation tools. Each tool has one simple goal: to accelerate and simplify my development processes. My first role was data entry (let me know in the comments if yours was too). I quickly automated that and was able to add more value. I gained a set of new tasks, automated those, and turned my temping job into a full-time career (that I love!).

When working as a software developer, I became familiar with DevOps and how it accelerated the software industry. DevOps allows software teams to ship code faster and more reliably, and it increases developer satisfaction (seriously! It really does make developers happier). I have been applying the core principles of DevOps to all my recent projects, with great success.

In 2016 I started an MSc in Data Science. The course covered a lot of the techniques required for data science; however, it did not cover how to deploy a model into production (this is not a problem with the course, this is a problem with the industry). I started researching deployment techniques. I read a lot of journals, blogs and books which described the development process in the minutest detail but stopped at deployment. No one talks about deployment. Why? Because it is hard!

I have led teams of software engineers and data scientists. When I am recruiting I ask two simple questions. The first is “Tell me about a project you were happy/passionate about”. We can all do this; I have heard some fantastic stories, and there are a lot of talented developers out there. I then follow it up with “How did you deploy that?”. This is where the interviews change, especially for data scientists. Most data scientists have never deployed a model. To me this is crazy. How can you appreciate the value of a model if it is not deployed? This is a real problem in the industry.

When completing my MSc I used this subject as my thesis topic, and with that a professional curiosity was born. It has become my personal mission to make developers and data scientists better at deploying their code and models, and I want to start with you. There is a lot of content to get through, so we will take it one step at a time. The culmination of this article series is a design pattern which allows a developer to commit their machine learning models into version control and have a deployed machine learning model in production.

To do this we will highlight various tools, languages and techniques along the way. All code examples are hosted in my GitHub account (link at the top). I am trying to talk more and more about this subject with customers and at conferences. If you would like to discuss this in more detail, please get in touch.

 

A DataOps approach to machine learning…

Research blog series contents

  1. Setting the scene (this page)

  2. An introduction to DevOps

  3. DevOps in detail

  4. Data Science: the process, problems and productionisation

  5. How DevOps is currently being implemented in the ML industry

  6. [Book Review] Machine Learning Logistics (Ted Dunning and Ellen Friedman)

  7. How to apply DevOps to Data Science

Technical blog series contents

  1. An Introduction to Visual Studio Team Services (VSTS)

  2. An Introduction to Git

  3. An introduction to Docker and Docker Compose

  4. An introduction to Azure Container Registry

  5. An introduction to Kubernetes

  6. An Introduction to Helm

  7. A design pattern for Rendezvous