Advancing Analytics
Data Science | AI | DataOps | Engineering



DevOps in more detail


In the last blog, An Introduction to DevOps, we looked at the basics of what DevOps is. We only really skimmed the surface. I want to dig into a bit more detail, which will make the discussion about Data Science and DevOps a little easier. I want to start by recommending two great books. You will see references to pages and quotations throughout this series. All the references are listed here: DevOps for Data Science. The two books I recommend are The DevOps Handbook and The Phoenix Project. Both books are fantastic and approach the subject from different angles.



The term DevOps was coined by Patrick Debois in 2009 (Kim et al, 2016. Pp 5) at an Agile conference. There are many similarities between Agile and DevOps. Agile, as set out in the Agile Manifesto, is a series of twelve principles which suggest ways to improve how development teams work. Click the link to take a look. These principles were set out by 17 of the software industry's most influential developers (Kim et al, 2016. Pp 5).

The third principle behind the Agile Manifesto is “Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale” (Agile Manifesto, 2001). The idea of more frequent deployments is at the heart of DevOps and is supported by the results in Puppet’s 2017 survey (Puppet, 2017). Let's look in a bit more detail at how each section of DevOps is implemented.

Humble, Muller and Puppet all agree that a DevOps culture and a shared goal are vital. To really achieve the benefits of DevOps, you need buy-in from the business and wider stakeholders. For DevOps to be successful, the processes need to be followed by the development team and accepted by the whole business. That culture needs to be in place from the Chief Technology Officer right down to the junior developers. Many developers take DevOps at face value and assume it is just a set of tools. DevOps is a culture and a way of working, and tools simply enable that culture. However, Davis and Daniels argue that technology is part of that culture (Davis & Daniels 2016 pp9). The two work together, and neither works without the other.

DevOps doesn’t just speed up deployments and reduce error rates; it also has the side effect of improving job satisfaction for software developers (Puppet, 2014). In Puppet’s first DevOps survey in 2014, they noted that job satisfaction increased in teams that adopted DevOps practices. Puppet also noted that as “IT performance increases, profitability, market share and productivity also increase” (Puppet, 2014). DevOps can get code into production faster, but it also has many benefits for the business and the employees who implement it. This is quite profound!

Source Control
Part of the culture of DevOps is collaborating and sharing development efforts. To enable that collaboration, all developers need to know that while they are working together, their individual changes are completed in isolation and do not impact other developers.

To do this, a developer needs a robust source control system and needs to understand how that source control system works. It might sound obvious, but I spent more time fixing Git merges my first time using it than I did developing code! Still don’t believe me? Here is a great site for when you do something wrong: OhShitGit!

Source control is paramount to the successful delivery of DevOps. Continuous integration and delivery, infrastructure as code and configuration as code are all reliant on having a source control system in place. The lowest level in the DevOps stack is source control, and all other elements grow from there. Source control systems vary in their implementation, but there are a few common source control applications which facilitate DevOps. Historically, source control was managed by a single central server. An example of this pattern is Subversion (SVN), which features a central server where code is committed. While this pattern worked for many years, as software development steered more towards an agile methodology, central source control tools struggled to support agile development.

The solution was a distributed source control system. The key application to come from this is Git. Git typically still has a central server, but every developer also works in their own local copy of the repository, which allows them to work in isolation from the master repository. We will look at the basics of Git in another blog. In particular, we will look at setting up Visual Studio Team Services (VSTS) and our local repository.

With any source control solution, it is important to have a robust branching strategy. Branching relates to how a team of developers divides the structure of code in their source control to minimise the chance of one developer making changes which conflict with another's. There are many different design patterns for branching. Milanesio, in his article Git Patterns and anti-patterns, lists 16 different options based on different team sizes and scenarios (Milanesio, 2013). Branching ties in with continuous integration, as described below. Branching approaches are as highly disputed amongst developers as whether to use tabs or spaces. However you want to branch, just do what works for your team.

Continuous integration
Continuous integration is the process of running automated integration tests against your codebase to ensure that fewer bugs and issues make it into production (Kim et al, 2015. Pp 126). Developers are good at building software; where many fall down is in the creation of tests and their approach to testing in general. Code is developed in isolation and works on the machine it was built on. However, once that code is deployed, we may realise it is not fit for purpose or does not integrate with the rest of our code. The change then needs to be rolled back and the deployment rejected. If the development team does not have source control, this will be difficult, if not impossible, to achieve. As we saw in the previous blog, that could have catastrophic financial consequences.

Continuous integration aims to address this problem through automated testing and automated builds of a solution. When performed correctly, a team has confidence that its code is in a shippable state (Kim et al, 2015. Pp127). Once code is checked in to source control, the source control system can be configured to trigger a build of the solution and run the automated unit tests. Because every change is built and validated automatically, the time to production decreases.
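To make the idea of a triggered build more concrete, here is a minimal sketch of what a build server does on each check-in: run a list of steps in order and stop at the first failure. The `Step` class, the `run_pipeline` function and the step names are all hypothetical, invented for illustration. A real CI tool would run your actual build and test commands.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One stage of a hypothetical CI pipeline."""
    name: str
    run: Callable[[], bool]  # returns True when the step succeeds


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run each step in order and stop at the first failure."""
    log = []
    for step in steps:
        ok = step.run()
        log.append(f"{step.name}: {'passed' if ok else 'failed'}")
        if not ok:
            # Later steps never run, so a broken change is never packaged.
            return False, log
    return True, log


# Stand-ins for a real build and real test suites.
def build() -> bool:
    return True


def unit_tests() -> bool:
    return 1 + 1 == 2


def integration_tests() -> bool:
    return False  # simulate a change that breaks integration


shippable, log = run_pipeline([
    Step("build", build),
    Step("unit tests", unit_tests),
    Step("integration tests", integration_tests),
    Step("package", lambda: True),
])

print(shippable)
for line in log:
    print(line)
```

In this run the integration step fails, so the package step is skipped and the change is reported back as not shippable.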

As developers, we don’t need to worry about running full system integration tests ourselves; we can focus on feature-level testing and trust that continuous integration will handle the rest. Where there are exceptions, these are passed back to the development team, and work can begin to fix any integration problems prior to a production release. Lean manufacturing was designed to support quicker release cycles, and through continuous integration we begin to see how DevOps is inspired by this process.

One possible way to implement continuous integration is by creating a full test of the end-to-end solution. By doing this we can ensure that all dependencies of the project are known, and that code can be packaged and deployed in a repeatable way. There are multiple types of tests we might want to conduct during continuous integration: unit tests, typically at the level of an individual item of code; acceptance tests, which test the application as a whole; and integration tests, which check that the application interacts as expected with other applications (Kim et al, 2015. Pp 131).
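As a rough illustration of the difference between test levels, the sketch below uses Python's built-in `unittest` module. The functions `parse_price` and `basket_total` are made up for this example. The "integration" test here only checks that two functions work together, which is a simplification: a real integration test would exercise other applications or services.

```python
import unittest


def parse_price(text: str) -> float:
    """Convert a price string such as '$4.50' into a number."""
    return float(text.strip().lstrip("$"))


def basket_total(prices: list) -> float:
    """Add up a basket of price strings; relies on parse_price."""
    return round(sum(parse_price(p) for p in prices), 2)


class UnitTests(unittest.TestCase):
    """Tests a single item of code in isolation."""

    def test_parse_price_strips_currency_symbol(self):
        self.assertEqual(parse_price("$4.50"), 4.5)


class IntegrationTests(unittest.TestCase):
    """Tests that the pieces behave correctly when combined."""

    def test_basket_total_uses_parser(self):
        self.assertEqual(basket_total(["$4.50", "$1.25", " 2 "]), 7.75)


def run_suite() -> unittest.TestResult:
    """Collect both test classes and run them, as a CI step might."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(UnitTests))
    suite.addTests(loader.loadTestsFromTestCase(IntegrationTests))
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print(result.wasSuccessful())
```

A continuous integration server would run a suite like this on every check-in and fail the build if `wasSuccessful()` came back false.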

Continuous Delivery / Deployment
Once code has been built and tested, it can be deployed in a similar fashion to continuous integration. A pipeline process will package code as required and deploy it to a development instance for user acceptance testing. This process is tagged on to the end of a continuous integration step to promote code into production once tested. DevOps is facilitated by the combination of source control, continuous integration and continuous deployment.

As code is checked in, it is built, tested and deployed. Multiple developers can check in code multiple times a day. Continuous integration ensures that the changes are isolated and there are no issues, and continuous deployment deploys the clean code. With a command as simple as “git push”, the entire software stack is built, compiled, tested and deployed. This is where DevOps sees most of its power. With the activities described above, a developer can now write code to achieve a story and then submit it to source control; from there it is tested and deployed to a production-ready environment. At no point in the scenario have they had to involve operations to perform a lengthy deployment process, because this has been automated.

You might not have developed a continuous deployment pipeline, but you have almost certainly experienced one. A good example of this is Spotify. If you’re a user of the Spotify desktop app then you will have seen that it frequently wants to be restarted to apply new changes. Does this happen monthly or daily? It happens daily. Imagine trying to deploy code manually every day. It is a lot of work!

Infrastructure as code
To enable continuous deployment, we might need code which describes how each required artefact is to be built. In general, we don’t want to have to configure servers manually; all configuration and set-up should be documented as code.

If we have metadata which describes how the infrastructure is to be built, then we should be able to make the process of deploying into additional environments simple and repeatable. No longer do you need to worry about a developer exclaiming that “it worked on my machine”. Infrastructure as code should ensure it works on all machines and in a consistent manner. If the target is a virtual machine, the infrastructure as code should say how it is to be built and configured. Infrastructure as code works particularly well in containerised environments. We will take a look at Docker and Kubernetes in a future blog.
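To give a feel for what "metadata which describes how the infrastructure is to be built" might look like, here is a toy sketch. The environment description, the machine names and the `plan` and `apply` functions are all invented for illustration, and the "environment" is just a dictionary in memory. Real tools work against actual cloud or container platforms, but the idea of comparing a desired state with the current state is similar.

```python
from typing import Dict, List

# Desired state, kept in source control alongside the application code.
# The machine names and settings are made up for this example.
DESIRED = {
    "web-vm": {"size": "small", "os": "ubuntu-22.04"},
    "db-vm": {"size": "large", "os": "ubuntu-22.04"},
}


def plan(desired: Dict[str, dict], actual: Dict[str, dict]) -> List[str]:
    """List the actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name}")
        elif actual[name] != spec:
            actions.append(f"update {name}")
    for name in sorted(actual):
        if name not in desired:
            actions.append(f"delete {name}")
    return actions


def apply(desired: Dict[str, dict], actual: Dict[str, dict]) -> Dict[str, dict]:
    """In-memory stand-in for a deployment: return the state after applying."""
    return {name: dict(spec) for name, spec in desired.items()}


empty_env: Dict[str, dict] = {}
first_plan = plan(DESIRED, empty_env)
test_env = apply(DESIRED, empty_env)
second_plan = plan(DESIRED, test_env)

print(first_plan)
print(second_plan)
```

The second plan is empty because the environment already matches the description. Running the same description against a new, empty environment produces the same result every time, which is what makes the deployment repeatable.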

Continuous integration/deployment relies on having every element required to promote a change to production, and infrastructure as code is an extension of this. All code should be in source control. If the code which describes how our infrastructure is built is packaged along with our software, we are able to achieve a repeatable deployment.

Once that production environment has been deployed, if there is ever a problem that we cannot recover from, we have a guaranteed source from which everything can be redeployed, ensuring that the environment is exactly the same as it was before the fault.

Once an environment has been deployed, it might be passed off to another team to monitor. When metrics are low, they might make changes to the deployed resources. When this occurs, configuration drift has happened: the code which describes our environment no longer reflects what has been deployed. DevOps is about collaboration. We should not see configuration drift. The operations team should work with development.

With any new development, we hope that it will be hugely successful. If our development starts trending and we experience latency issues, we have a problem. However, if all our code and infrastructure is in source control, we can spin up a duplicate of our service and load balance the requests.
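Configuration drift can be pictured as the difference between what source control says and what is actually running. The sketch below is a hypothetical check, with made-up setting names and a made-up `find_drift` function, that reports every setting where the two disagree.

```python
MISSING = "<missing>"


def find_drift(declared: dict, deployed: dict) -> dict:
    """Return {setting: (declared value, deployed value)} for every mismatch."""
    drift = {}
    for key in sorted(set(declared) | set(deployed)):
        want = declared.get(key, MISSING)
        have = deployed.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


# What our code in source control says the environment should look like.
declared = {"instance_count": 2, "memory_gb": 4, "tls": True}

# What is running after someone made manual changes in production.
deployed = {"instance_count": 2, "memory_gb": 8, "tls": True, "debug": True}

drift = find_drift(declared, deployed)
for setting, (want, have) in drift.items():
    print(f"{setting}: declared={want} deployed={have}")
```

Here the memory was increased by hand and a debug flag was switched on, and neither change made it back into source control. If the environment were rebuilt from code, both changes would silently disappear.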

Although not always required for DevOps, automation allows software teams to focus on development and reduce repetitive tasks.

Once an application is in production, we need to monitor it to ensure that there are no delays in our service. An operational team will want to be able to monitor key metrics such as usage and latency.
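As a small example of the kind of check a monitoring tool might run, the sketch below computes a latency percentile from a handful of request timings and compares it against a threshold. The sample timings, the 500 ms threshold and the `percentile` helper are all assumptions chosen for illustration, and the nearest-rank method used here is only one of several ways to calculate a percentile.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Made-up request latencies in milliseconds, including one slow outlier.
latencies_ms = [120, 95, 110, 105, 2300, 98, 101, 99, 130, 115]

THRESHOLD_MS = 500  # hypothetical alerting threshold

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
alert = p95 > THRESHOLD_MS

print(f"p50={p50}ms p95={p95}ms alert={alert}")
```

The median looks healthy, but the 95th percentile exposes the slow request. This is why operations teams tend to watch the upper percentiles of latency as well as the typical case.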

DevOps is a development culture and way of thinking which is supported by processes such as source control, continuous integration, continuous deployment and infrastructure as code. With these elements working together, developers are able to ship reliable code faster without the need for lengthy deployments.