weird k8s
#author_luna #sysadmin-notes #kubernetes

why am i doing this #
because a friend nerdsniped me into doing it, next question
who #
random girl from the internet, do not take my words as truth. I just want to share some of my experience at work out to the internet, because I was burned out on the cloud hype back then (just like I was with docker, until everything died down and I could truly understand what it does), and nowadays I understand what the hype from a couple years ago was about.
I will NOT teach anyone how to set up a kubernetes cluster from the ground up because I don't know how to do it; our cluster is managed (and I'll talk about that a bit).
this is more of a starting point article for people that want to hear me elaborate on how kubernetes works (I've gone on VR rants before, trust me), and I think the true number of such people is near zero, but I don't care. many things here are provided as introductory points for further research by whoever is reading this; I can't just help you set everything up
making the case #
generally, kubernetes is an orchestrator for docker (well, technically OCI but I'll use both interchangeably) containers. you've probably already heard of docker so I won't go into that much detail on why you should dockerize your application. but the case for kubernetes (k8s) is that when you're in a docker-heavy environment you want some sort of tool that lets you declaratively configure the containers you want to run. there's generally 3 ways you can do this:
- manual (write your own docker run invocations)
- docker-compose (write some yaml, let the tool spin up all needed resources, recreating them when needed), works for single-node
- service orchestrators (nomad, kubernetes), manages entire clusters
while I could call this ordered by "complexity" I don't think I can. the manual route is one that you should never take beyond tutorials of how docker works, so most people already start at using docker-compose (especially to spin up development environments) but you can use it as a sysadmin to manage infrastructure. I myself use it in my personal servers to manage my millions of side-projects.
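to make that concrete, here's roughly what the same throwaway redis container looks like in the first two styles (image and names are just for illustration):

```sh
# manual: a one-off container, managed entirely by hand
docker run -d --name my-redis -p 6379:6379 redis:7
```

```yaml
# docker-compose.yml: the same thing declared once; `docker compose up -d`
# creates it and recreates it whenever this file changes
services:
  my-redis:
    image: redis:7
    ports:
      - "6379:6379"
    restart: unless-stopped
```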
at work though, a docker-compose-based system would be difficult to manage. our service has some specific requirements:
- to scale stateless services really quickly
- hardware to be abstracted away, a new server becomes abstracted resources. no specifics on what the server is or does. if it goes bad, axe it immediately
- zero-downtime deployments, even a 10 second downtime would have users calling our support hotlines
you can absolutely do all of these yourself with other sets of tooling (even without docker at all), programming languages, etc. kubernetes (in my view) standardizes (it's not really a standard) SOME of this with common language, just like docker/OCI "standardize" on shipping immutable root filesystems around the world thanks to linux's syscall interface being the only stable interface, but I digress.
at the end, kubernetes lets us do these things without us having to redo everything.
how is kubernetes sold #
kubernetes is usually offered by the big clouds, and some small cloud providers. in this case, the kubernetes "control plane" is managed by the cloud provider, you pay a fee to have that running + fees when you want new nodes to be added into the system. each node runs the kubelet (which talks with the control plane API) and the real container runtime that actually runs workloads (actual Docker Engine is not required! more info here)
you can deploy all kubernetes components yourself on your own hardware (see k0s, talos, and I'm sure there's 30 others), including spinning up your own cloud VMs somewhere and installing/configuring everything manually. if you're interested, Kubernetes The Hard Way may help, but know that doing a full manual setup is not recommended (there's a lot of ways that can go wrong)
clouds, by integrating managed kubernetes offerings into their own ecosystems, let things be more tightly integrated: you can just "spin up a new cluster" from your UI, assign some hardware that would be available to run your stuff in minutes, and let someone else worry about the cloud bill that'll come next month. keep in mind that you are adding quite a complex piece of software to manage a lot of hardware, it can and will go wrong
how is it accessed #
it generally depends on the environment! since kubernetes is a system that lives between various workload creators (developers) and real hardware, companies may lock down access to it to prevent people from causing too much havoc, just like ssh access would be locked down. each way to interface with kubernetes will be highly dependent on company culture, so for explaining how it all ticks we'll assume there is no company culture in the way.
but assuming you have a running cluster you can use the kubectl CLI to access it, it serves as an API client that lets you inspect the cluster, edit deployments, delete them, etc. there are graphical clients which may be more or less approachable depending on who you are, but I won't recommend any specific one, my employer already pays for one and they should sponsor me first.
the kubectl CLI is configured via a ~/.kube/config file, which contains your API credentials, the addresses of the API server, certificates of the API server (I believe the k8s API is always HTTPS?), etc.
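for reference, a stripped-down ~/.kube/config looks roughly like this (cluster/user names and the server address are placeholders, your provider usually generates the real file for you):

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: work-cluster
    cluster:
      server: https://203.0.113.10:6443         # the API server, over HTTPS
      certificate-authority-data: <base64 CA>   # so kubectl can trust the API server
users:
  - name: luna
    user:
      client-certificate-data: <base64 cert>    # or a token/exec plugin, depends on the provider
      client-key-data: <base64 key>
contexts:
  - name: work
    context:
      cluster: work-cluster
      user: luna
current-context: work
```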
technical intro #
at the core of the kubernetes mental model lies a Pod, from the docs:
Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
it's an atomic unit (a Pod can't be split in half across two separate nodes) describing a group of actual real containers inside, as well as what they need in terms of resources (for scheduling), what ports they expose (for networking), and what storage volumes they need (you should already be familiar with docker's immutability and how the container rootfs is ephemeral, and how docker has different ways to attach persistence to containers. k8s has the same idea).
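as a small (and useless) concrete example, a bare Pod touching all three of those concerns could look like this; in practice you rarely write Pods by hand, the objects below create them for you:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: web
      image: nginx:1.27
      ports:
        - containerPort: 80     # networking: what the container exposes
      resources:
        requests:
          cpu: 250m             # scheduling: what it needs to run
          memory: 128Mi
      volumeMounts:
        - name: cache
          mountPath: /var/cache/nginx
  volumes:
    - name: cache
      emptyDir: {}              # storage: ephemeral scratch space, dies with the Pod
```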
in general, you will get Pods managed by either of these two:
- ReplicaSet
- its whole job is maintaining a set amount of replicas of a Pod across the cluster
- assumes each Pod is completely isolated (in terms of state) so it can scale up/down accordingly to the amount of replicas you've set on it
- you can create ReplicaSets yourself, but in general you want a Deployment, since that then takes ownership of multiple ReplicaSets which provides useful things like rollbacks.
- StatefulSet
- for when you need stable identities, stable scheduling, consistent updates (e.g in order)
- useful for databases, but at the end it's all just Pods, it's more about the guarantees around how they're created, named, and updated
- (there's others, but I won't go into them lol)
for a practical use-case, let's say we want some 10 replica pods of debian:latest, you could do something like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: awesome-debian
spec:
  replicas: 10
  selector:
    matchLabels:
      app: awesome-debian   # required: tells the Deployment which Pods are "its" Pods
  template:
    metadata:
      labels:
        app: awesome-debian
    spec:
      containers:
        - name: debian
          image: debian:latest
          command: ["sleep", "infinity"]
```
you would save that to a file, debian.yaml and then kubectl apply -f debian.yaml. kubectl contacts the control plane submitting that Deployment and kubernetes will create that object, which then creates its ReplicaSet (set to 10 replicas), which then will create 10 Pods, which will be scheduled across your cluster (if you have only one node, they'll all be on the same node unless you specify differently, and you can!).
since we're using a Deployment and not a StatefulSet, all Pod names will be randomly generated. you can execute commands as if you were sshing into a Pod via kubectl exec -it awesome-debian-abcdef-561hfj -- bash.
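a few other kubectl one-liners that come in handy against a Deployment like this (the names match the example above):

```sh
kubectl get pods                                   # list Pods and their current status
kubectl describe deployment awesome-debian        # events, replica counts, rollout state
kubectl logs awesome-debian-abcdef-561hfj         # container logs from a single Pod
kubectl rollout undo deployment/awesome-debian    # roll back to the previous ReplicaSet
```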
requesting cpu and ram #
each container in a Pod can request specific cpu and memory. kubernetes will use that information to find an available node with those resources to schedule the Pod in (you can have custom resources but I won't go into detail there). the Pod's combined resource requests will be used to find a node that has the needed resources, so either all of the resource requests are available in a node in the cluster or kubernetes will fail to schedule the Pod. so for example if your Pod requests 3 cpu but all the nodes in your cluster are 2 cpu, the Pod will never schedule.
while memory has a straightforward definition, cpu has something more specific, from the docs:
In Kubernetes, 1 CPU unit is equivalent to 1 physical CPU core, or 1 virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine.
Fractional requests are allowed. When you define a container with spec.containers[].resources.requests.cpu set to 0.5, you are requesting half as much CPU time compared to if you asked for 1.0 CPU.
CPU resource is always specified as an absolute amount of resource, never as a relative amount. For example, 500m CPU represents roughly the same amount of computing power whether that container runs on a single-core, dual-core, or 48-core machine.
important point here: if your nodes have different CPUs (for example, some are AMD EPYC Genoas vs older generation Intel Xeons), 1 cpu will mean very different things (cpu capabilities like AVX, single-thread performance, etc). in this case, you can use label selectors to ensure that your Pods schedule to specific types of nodes (this only works if those nodes have a label for their cpu type), which would then "guarantee" you that 1 cpu means 1 Genoa or 1 Xeon cpu in the cluster.
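as a sketch of what that looks like, assuming someone already labeled the nodes with a made-up cpu-family label:

```yaml
# inside the Pod template's spec
spec:
  nodeSelector:
    cpu-family: genoa   # only schedule onto nodes carrying this label
  containers:
    - name: app
      image: my-app:1.0 # placeholder image
      resources:
        requests:
          cpu: "1"      # now 1 cpu reliably means "1 Genoa core"
```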
for Pod resources there are two types, "request" and "limit":
- resource requests are the resources needed for the Pod to properly function. kubernetes will not let the Pod schedule anywhere that doesn't have the resources specified in requests
- resource limits are optionally defined resources.
  - if a container uses more cpu than specified in limits, it will be throttled.
  - if a container uses more memory than its limit, the kernel kills it (kubernetes reports this as OOMKilled) and then the container is restarted.
if requests > limits, kubernetes will reject the Pod outright (requests must be less than or equal to limits).
if requests == limits (for every container, cpu and memory), kubernetes gives the Pod the Guaranteed quality of service class, which means the Pod is the last to be evicted by the kubelet when a node comes under resource pressure.
if requests < limits (or you set requests but no limits), the Pod gets the Burstable class: it can be scheduled on more nodes and use resources above its requests (say, a Pod requests 1 cpu but has an 8 cpu limit, in case the other Pods on a node aren't using those 7 cores), but it gets evicted before Guaranteed Pods under pressure.
if no requests and no limits are set at all, kubernetes assigns the BestEffort class: the Pod containers can use whatever node resources are free, and they're the first to go if the node comes under resource pressure.
be aware that a Pod consuming more cpu than it should (e.g no limits) may make everything else unresponsive (including critical node services like the kubelet).
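putting requests and limits together, the per-container block looks like this (numbers are just for illustration):

```yaml
resources:
  requests:
    cpu: 500m        # half a core, used for scheduling decisions
    memory: 256Mi    # the Pod won't schedule onto a node without this much available
  limits:
    cpu: "2"         # throttled if it tries to use more than 2 cores
    memory: 512Mi    # OOMKilled and restarted if it goes above this
```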
beyond cpu and ram #
storage and networking are somewhat dependent on who manages your cluster. kubernetes abstracts cpu and ram into just "cpu and ram" (what speed? which cpu? that's left to something else), but storage and networking only have interfaces for you to plug things into; kubernetes itself does not manage or schedule anything there (it can't set up a switch for you, for example, and you will need to make your kubernetes aware of the network topology).
Pods can request storage via a specific object: the PersistentVolumeClaim (PVC). it's backed by a PersistentVolume, which then has the specifications for how your cluster should provision storage from something else. that "something else" can be nfs, cephfs, and many other things; anything that follows the interfaces can fit here. not all storage providers provide POSIX compliance, and it's on you (as the one writing a PVC yaml) to declare whether a PVC can be mounted by multiple Pods or not
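a minimal PVC sketch; the storageClassName is whatever your cluster's storage provider exposes (the name below is made up), and accessModes is where you declare the single-Pod vs multi-Pod question:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce            # one node at a time; ReadWriteMany needs a backend that supports it (e.g nfs)
  storageClassName: fast-ssd   # provider-specific, placeholder name
  resources:
    requests:
      storage: 10Gi
```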
while Pods by default have a pod IP (ipv4 is most common, there may be ipv6 k8s deployments in the wild), that's usually set (or recommended to be set) to an internal ipv4 range that is not accessible from the internet (e.g 10.0.0.0/8). each node then gets a configuration to know which ranges it can allocate IPs from (and k8s at the end is the one that "assigns" an ip to a Pod on its startup, with the container runtime mapping the ip to actual networking inside the Pod)
however that's just a local range. if you want to expose things to the internet, you will need to use separate kubernetes objects. the two main ways are:
- Service
- you specify which Pods are matched by this service (via labels) so that it routes to them (it does have a builtin load balancer)
- this works purely at OSI Layer 4 (TCP/UDP)
- you can specify Services to be backed by a ClusterIP (which then lets you just use that stable IP to contact the real Pod), but to go over the internet you want an Ingress
- technically you can use a Service with its type set to LoadBalancer to get an external ipv4 address that routes at TCP/UDP, but that's cluster-specific.
- Ingress
- a way to manage an HTTP reverse proxy (nginx, caddy, traefik)
- you install an Ingress Controller for the reverse proxy of your choice
- you CAN (depending on Ingress implementation) request TLS to be provisioned via letsencrypt here
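a rough sketch of both, matching Pods labeled app: web; the hostname and ingress class here are placeholders, and the Ingress only does anything if an Ingress Controller is installed:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web            # route to Pods carrying this label
  ports:
    - port: 80
      targetPort: 8080  # the port the container actually listens on
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx       # whichever controller you installed
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```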
k8s state is not immutable #
inside the control plane there is a database. when you run kubectl apply on something you change the thing. if the thing itself doesn't keep some kind of edit log on what it does (like a Deployment which keeps track of its ReplicaSets, new and old) then the previous state of it is gone. you can't rollback a change to a Service/Ingress/etc via what kubernetes provides. kubernetes is a bunch of yaml and it attempts to use its resources (hardware, ip addresses, storage providers) to make a cluster that aligns with the yamls it has.
this may be fine for you and your organization's needs! but if you want proper edit logs for everything in the cluster, you'll want to move to something backed by a git repo somewhere. what I use at work is argocd, but it comes with its own set of intricacies and gotchas; there may be other tools I don't know about too.
yaml hell #
I've mentioned "installing" before. you can definitely write your own yamls to deploy some random postgres or redis instance, but sometimes you may need more components or want to reuse yamls from someone else. k8s by default will not provide you with anything for that kind of usecase, so I'd say most of the ecosystem has centralized around helm for it. helm at its core is a yaml templating engine: you give it configuration values, it renders final yaml files, and those are then applied to k8s
you install the helm cli, it keeps some metadata in your cluster about "application versions" and "releases" (so you could kind of rollback to previous versions of your configuration if need be, kind of like how Deployment does it by default). a "package" here is interchangeable with a "helm chart", and in turn there are many charts available for postgres, redis, etc.
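the day-to-day workflow looks roughly like this (the chart repo and values file are just an example, check the chart's own docs for what it accepts):

```sh
helm repo add bitnami https://charts.bitnami.com/bitnami   # add a chart repository
helm install my-db bitnami/postgresql                      # render the chart's templates and apply them
helm upgrade my-db bitnami/postgresql -f my-values.yaml    # change the configuration later
helm rollback my-db 1                                      # go back to a previous release revision
```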
secret management #
there's two ways k8s can configure your application:
- ConfigMap objects (which can either be env vars or actual file injection)
- Secret objects (same thing, but they're encoded into base64 in the cluster storage, decoded back into plaintext so they can be added to the Pod)
important note: Secrets live in effective plaintext; they kind of have to be stored in plaintext so that your application code can read them. you can create your own solutions here to encrypt Secrets in a way that your application then decrypts, but my focus remains on the basics.
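a small sketch of both (names made up), with the Secret value being just base64 of the real string, not encryption:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
data:
  DB_PASSWORD: aHVudGVyMg==   # base64 of "hunter2": anyone with read access can decode it
```

in the Pod's container spec you would then pull these in as env vars via envFrom (configMapRef / secretRef) or mount them as files through volumes.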