What is Kubernetes?

"Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation."

Okay great, but what does that actually mean?

In more general terms, Kubernetes is a container or microservice platform that orchestrates an application’s infrastructure without restricting the type of application deployed onto it. The rule of thumb is that if an application will fit into a container, Kubernetes can deploy it.

Better, but what’s a container platform?

Container platforms are the latest step in the evolution of application deployment methods. Looking back, most data professionals will be at least aware of traditional application deployments, which were done on physical servers. Physical server deployments lacked the ability to define resource boundaries, which often resulted in resource allocation issues.
Since the server was physical, its resources were finite. When running multiple applications, or multiple instances of a single application, a single instance could monopolise the available resources, leaving the other instances short and degrading their performance.

Over time, virtualised deployments were introduced to address these limitations. Virtualisation allows significantly better resource utilisation than traditional physical servers, as multiple virtual machines (VMs) can run on a single server’s CPU. And thanks to isolation between VMs, information in one application is not easily accessible to any other.

And then there are container deployments. Containers are similar to VMs but share the host operating system across applications; this less restrictive isolation makes them much more lightweight.

This, along with a slew of other benefits, has made container-based platforms increasingly popular, Kubernetes among them.

Containers are ephemeral by design; if one is corrupted or fails, the application’s user data is not lost because it is not stored in the container itself. This design also allows for massive horizontal scalability: a server cluster can dynamically create new instances of an application in line with demand. Most sysadmins could achieve this with a script, but Kubernetes automates the process.

 

So how does it work?

A Kubernetes implementation is composed of smaller components, all of which complement one another to allow applications to be deployed without restriction on their type or design.

  • Image

    An image is a stored instance of a container that holds the set of software needed to run an application. It is a way of packaging software so that it can be stored in a container registry, pulled to a local system and run as an application. The image includes metadata indicating which executables to run, who built it and other information.

  • Container

    A container is a lightweight and portable executable image containing all the software and dependencies an application needs to function. Containers decouple applications from the underlying host infrastructure, which makes both deployment across different cloud or OS environments and scaling easier.

  • Pod

    A pod is a set of one or more containers that share storage, an IP address and port space. Containers in the same pod are able to communicate with one another over localhost networking.

  • Cluster

    A cluster is a set of compute machines, called nodes, which run containerised applications. Every cluster has a control plane for maintaining the state of the cluster and at least one worker node to run the workloads.

This gives us a hierarchical structure to deploy our apps onto: an image defines the software and dependencies of a container; a container executes the image; a pod is a set of containers which share storage, an IP address and port space; and a cluster is the set of machines on which the containers run.
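
To make the hierarchy concrete, here is a toy Python model of it. This is only a sketch for illustration, not the Kubernetes API: the class names mirror the terms above, the image name, node name and IP address are invented, and the control plane is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Image:
    """Packaged software plus metadata, as stored in a registry."""
    name: str
    tag: str


@dataclass
class Container:
    """Executes a single image."""
    image: Image


@dataclass
class Pod:
    """One or more containers sharing storage, IP address and port space."""
    ip_address: str
    containers: List[Container] = field(default_factory=list)


@dataclass
class Node:
    """A machine in the cluster that runs pods."""
    name: str
    pods: List[Pod] = field(default_factory=list)


@dataclass
class Cluster:
    """A set of nodes; the control plane is omitted from this toy model."""
    nodes: List[Node] = field(default_factory=list)

    def running_images(self) -> List[str]:
        """List every image currently running anywhere in the cluster."""
        return [
            f"{c.image.name}:{c.image.tag}"
            for node in self.nodes
            for pod in node.pods
            for c in pod.containers
        ]


web = Image(name="example-web", tag="1.0")  # hypothetical image name
pod = Pod(ip_address="10.0.0.5", containers=[Container(web)])
cluster = Cluster(nodes=[Node(name="worker-1", pods=[pod])])
print(cluster.running_images())
```

Walking from cluster to node to pod to container to image is the same path a deployment takes, just read from the outside in.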

Main benefits of Kubernetes

Immutable Infrastructure

Once a server is deployed it is not modified. Instead, changes or updates are made to the base image, which is then used to replace the existing server.

Horizontal Scaling

Kubernetes is able to scale horizontally by dynamically increasing the number of available pods in response to changes in the workload demand on the cluster.
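
As a rough illustration of how such a scaling decision can be made, the Kubernetes documentation for the Horizontal Pod Autoscaler describes a proportional rule: the desired number of pods is the current number multiplied by the ratio of the observed metric to its target, rounded up. The function below is a minimal sketch of that rule only; the function and parameter names are our own, and the real autoscaler also applies tolerances and stabilisation behaviour that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Scale the pod count in proportion to how far the metric is from target."""
    return math.ceil(current_replicas * current_metric / target_metric)


# 4 pods averaging 90% CPU against a 60% target: scale out
print(desired_replicas(4, 90.0, 60.0))

# 6 pods averaging 20% CPU against a 60% target: scale in
print(desired_replicas(6, 20.0, 60.0))
```

The same rule handles both directions, which is why a single target value is enough to describe the behaviour you want.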

Self Healing

When containers fail or become damaged, Kubernetes is able to restart, replace and kill them as necessary, reducing the need for manual monitoring and intervention to maintain the health of an application.
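
The idea behind self healing can be sketched as a comparison between the state you asked for and the state that actually exists. The function below is a simplified illustration of that idea, not Kubernetes code: the pod names, the two status values and the action strings are all invented for the example.

```python
from typing import Dict, List


def reconcile(desired_count: int, pods: Dict[str, str]) -> List[str]:
    """Return the actions needed to get back to `desired_count` healthy pods.

    `pods` maps a pod name to its status, either "healthy" or "failed".
    """
    actions = []
    healthy = [name for name, status in pods.items() if status == "healthy"]
    failed = [name for name, status in pods.items() if status == "failed"]

    # Failed pods are removed rather than repaired
    for name in failed:
        actions.append(f"kill {name}")

    # Start replacements until the desired count is met
    missing = desired_count - len(healthy)
    for i in range(max(missing, 0)):
        actions.append(f"start replacement-{i}")

    # Too many healthy pods: remove the surplus
    for name in healthy[desired_count:]:
        actions.append(f"kill {name}")

    return actions


print(reconcile(3, {"web-a": "healthy", "web-b": "failed", "web-c": "healthy"}))
```

Running a check like this repeatedly is what removes the need for someone to watch the application and intervene by hand.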

Load Balancing

Where network traffic to a container is high, Kubernetes is able to load balance, distributing the traffic across containers to keep the deployment stable.
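
One simple way to picture load balancing is round robin, where each incoming request goes to the next pod in turn. The snippet below is a toy sketch of that strategy with invented pod IP addresses; how Kubernetes actually chooses a backend depends on how the cluster is configured, so treat this as an illustration of the principle rather than of the implementation.

```python
from collections import Counter
from itertools import cycle
from typing import Iterable, Iterator


def round_robin(pod_ips: Iterable[str]) -> Iterator[str]:
    """Yield pod addresses in turn so that each receives an equal share."""
    return cycle(list(pod_ips))


backends = round_robin(["10.0.0.5", "10.0.0.6", "10.0.0.7"])  # hypothetical pod IPs

# Route nine requests and count how many each pod handled
handled = Counter(next(backends) for _ in range(9))
print(handled)
```

With three pods and nine requests, each pod ends up handling three, so no single container takes the full weight of the traffic.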

Unopinionated

Kubernetes is not opinionated on how logging is handled; an app can log to stdout so that its logs can be collected in whichever manner is most appropriate to the project. It is also unopinionated on the config language employed. Together, these two facets allow apps to be built largely as desired, with the ability to collect or expose information as required.
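
For a Python app, logging to stdout can be as small as the sketch below, which uses only the standard library. The logger name and message format are placeholders; what happens to the output once it leaves the container is up to whichever collection approach the project chooses.

```python
import logging
import sys

# Send log records to stdout instead of a file inside the container
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("example-app")  # hypothetical application name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("application started")
```

Because nothing is written to the container’s own filesystem, the logs are not lost with the container when it is replaced, provided something outside it is collecting them.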

 

Drawbacks of Kubernetes

Whilst there are many obvious advantages to using a platform like Kubernetes within your deployment, there are drawbacks that should be taken into consideration before utilising a new tool.

  • It is not a CI/CD workflow; it doesn’t deploy source code or build the application itself

  • Whilst it supports the addition of middleware, databases, caches and other components to an application, it does not provide these, so they must be deliberately included in the design and build phases

What’s next?

Want to learn more about implementing an MLOps solution, productionising models and how Kubernetes can help you do that? Get in contact with our Data Science Team.

Torr Wylder