Kubernetes FAQ

Recently I’ve been working a lot with Kubernetes at my job. I’ve written up a bunch of internal documentation based on that work, and because it’s not company-specific I decided to share it here for anyone searching the internet for answers.

Table of Contents:

  1. what is k8s?
  2. view nodes in a nodegroup
  3. view pods on a node
  4. restart deployment
  5. insufficient cpu, insufficient memory errors
  6. deployment is not ready
  7. CrashLoopBackOff
  8. pod didn’t trigger scale-up
  9. how do I avoid accidentally running a kubectl command on the wrong cluster?

What is Kubernetes? How do I use it?

Check out the Kubernetes docs here: https://kubernetes.io/docs/home

View Nodes in a Nodegroup

Confirm the nodegroup you’re checking actually exists (e.g. you’re looking for pods running under engtools-infra/my-app) by listing all the nodegroups in its namespace. In these examples, k is a shell alias for kubectl.

k get nodegroup -n engtools-infra

Get the nodes under that nodegroup:

k get nodes -o wide | grep my-app

View Pods on a Node

View pods under a node with the following command (replace NODE_NAME_GOES_HERE with the node’s name, e.g. ip-10-10-12-123.ec2.internal).

k get pod --field-selector=spec.nodeName=NODE_NAME_GOES_HERE -owide --all-namespaces | grep -v -E "( kube-proxy-| kube2iam-| local-volume-provisioner-| node-monitoring-| localusers-)"
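
If you run this often, a small wrapper function may be easier to remember. The following is a minimal sketch, not a standard tool: the function name pods_on_node is made up for this example, and the filter list is simply the one from the command above.

```shell
# Hypothetical helper (the name is an assumption, not part of kubectl):
# list the pods on one node, hiding the per-node system pods that the
# command above filters out.
pods_on_node() {
  if [ -z "$1" ]; then
    echo "usage: pods_on_node NODE_NAME" >&2
    return 2
  fi
  kubectl get pod --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1" |
    grep -v -E ' (kube-proxy|kube2iam|local-volume-provisioner|node-monitoring|localusers)-'
}
```

Usage would look like pods_on_node ip-10-10-12-123.ec2.internal.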

Restart Deployment

Make sure you’re in the right cluster, get the deployment name, then use the rollout command.

k config current-context
k get deployment -n [NAMESPACE]
k rollout restart deployment [DEPLOYMENT_NAME] -n [NAMESPACE]
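
A restart only kicks off the rollout; it doesn’t tell you whether the new pods came up. One way to check is kubectl rollout status, sketched below. The helper name restart_and_wait and the 5 minute timeout are arbitrary choices for this example.

```shell
# Hypothetical helper: restart a deployment, then block until the rollout
# finishes or the timeout expires.
# $1 = deployment name, $2 = namespace
restart_and_wait() {
  kubectl rollout restart "deployment/$1" -n "$2" &&
    kubectl rollout status "deployment/$1" -n "$2" --timeout=5m
}
```

rollout status exits non-zero if the rollout doesn’t complete in time, so this can also be used in scripts.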

insufficient cpu, insufficient memory errors

If you’re seeing errors about insufficient resources, or node(s) had taint … that the pod didn’t tolerate warnings, you probably need to define the instanceType and resources attributes in your service’s k8s objects / charts, so the pod requests an amount of CPU and memory that a node of that type can actually provide. For example, something like the following:

  instanceType: c5.2xlarge
  resources:
    requests:
      cpu: "3"
      memory: "4G"
    limits:
      cpu: "4"
      memory: "6G"
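
Before changing the values, it can help to see what the scheduler actually complained about and how full a node already is. The two helpers below are a rough sketch; their names are invented for this example, and the number of lines grep prints after the Allocated resources heading is a guess that may need adjusting for your cluster.

```shell
# Hypothetical helper: show the scheduling failures recorded for one pod.
# $1 = pod name, $2 = namespace
why_pending() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1,reason=FailedScheduling"
}

# Hypothetical helper: show how much CPU/memory is already requested on a node.
# $1 = node name
node_allocation() {
  kubectl describe node "$1" | grep -A 8 'Allocated resources'
}
```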

Deployment is not ready

A message you might see when deploying with Bazel is Deployment is not ready: <service>. 0 out of 2 expected pods are ready. This usually means the pods are in CrashLoopBackOff (CLB) or some other non-ready status.

Use kubectl commands to check on the pods’ status and logs. Usual status check command:

kubectl -n NAMESPACE get pods
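
In a busy namespace the full list can be long. The sketch below narrows it to pods whose READY column isn’t n/n; the function name not_ready_pods is made up, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: print name, ready count and status for every pod in a
# namespace whose READY column is not "n/n".
# Note: pods from finished jobs (0/1 Completed) are listed as well.
# $1 = namespace
not_ready_pods() {
  kubectl get pods -n "$1" --no-headers |
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $2, $3 }'
}
```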

CrashLoopBackOff

If you see CrashLoopBackOff status on one of your pods, that means it’s crashing (I know, thanks captain obvious).

Check the pod logs: k logs CRASHING_POD -n NAMESPACE.
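
With a crash-looping pod, the current container has often just restarted and has little or no output yet. kubectl logs has a --previous flag for reading the logs of the container instance that crashed. A small sketch (the helper name crash_logs and the 100 line tail are arbitrary choices):

```shell
# Hypothetical helper: show the last lines logged by the previous (crashed)
# container instance of a pod.
# $1 = pod name, $2 = namespace
crash_logs() {
  kubectl logs "$1" -n "$2" --previous --tail=100
}
```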

Find the deployment.yaml file describing the launch instructions for this pod. Find the command attribute in that file, comment out the value underneath it, and add the following:

          - tail
          - "-f"
          - /dev/null

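Then re-deploy the pod. The container will now stay up doing nothing instead of crashing, which lets you open a shell in the pod (via kubectl exec rather than actual SSH), run the failing command manually, and generally get more info about the problem.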
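
Getting a shell in the idling pod is done with kubectl exec. The sketch below assumes the image ships /bin/sh, which minimal images may not; the helper name pod_shell is made up for this example.

```shell
# Hypothetical helper: open an interactive shell in a running pod.
# Assumes the container image contains /bin/sh.
# $1 = pod name, $2 = namespace
pod_shell() {
  kubectl exec -it "$1" -n "$2" -- /bin/sh
}
```

If the pod has more than one container, you would likely need to add -c CONTAINER_NAME before the -- separator.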

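You might also need to comment out the liveness/readiness probes in deployment.yaml. A failing liveness probe makes Kubernetes restart the container, and a failing readiness probe keeps the pod marked as not ready, so either can get in the way while the primary container isn’t running its normal command.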

More info about troubleshooting crashloopbackoff pods can be found in this releasehub.com article.

pod didn’t trigger scale-up

Normal   NotTriggerScaleUp    19m                  cluster-autoscaler   pod didn’t trigger scale-up: 3 node(s) had taint {node: database-rev-eng-8c38-pool1}, that the pod didn’t tolerate…

This is potentially a node group or scaling group issue. For example, you might have a single AutoScalingGroup (ASG) under your nodegroup-controller which has reached its max size. In this case, adding the autoscaling: true option to your charts may be required in order to have more ASGs created.

If you’re adding nodegroup options, you may need to delete the existing nodegroups (krm ng NODEGROUP_NAME -n NAMESPACE) for the option to take effect. After the new nodegroup is created, the cluster autoscaler needs to run another pass before it picks up the change, which can take a few minutes. So if you’re still seeing the scale-up messages and your pod is still in Pending status after such a change, give it a little time.
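
Since the message is about taints the pod doesn’t tolerate, it can also be worth comparing the two sides directly. A rough sketch using standard kubectl output options; the helper names node_taints and pod_tolerations are invented for this example.

```shell
# Hypothetical helper: list every node with the keys and values of its taints.
node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key,TAINT_VALUES:.spec.taints[*].value'
}

# Hypothetical helper: print the tolerations declared on a pod.
# $1 = pod name, $2 = namespace
pod_tolerations() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.tolerations}'
}
```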

How do I avoid accidentally running a kubectl command on the wrong cluster?

Use Explicit Context in Commands

kubectl --context cluster3.us1.test.com get pods -n mynamespace
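
To make the explicit context hard to forget, you could wrap kubectl in a function that refuses to run without one. This is only a sketch; the name kc is an assumption and may clash with an alias you already have.

```shell
# Hypothetical wrapper: always pass an explicit context to kubectl.
# $1 = context name, remaining arguments = the kubectl command to run
kc() {
  if [ "$#" -lt 2 ]; then
    echo "usage: kc CONTEXT KUBECTL_ARGS..." >&2
    return 2
  fi
  kc_context="$1"
  shift
  kubectl --context "$kc_context" "$@"
}
```

With this in place, kc cluster3.us1.test.com get pods -n mynamespace runs the same command as above, and calling kc without a context fails instead of silently using whatever context is current.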

Kube-ps1

kube-ps1 shows the current cluster and namespace in your prompt, which helps avoid this issue.

After installing it with Homebrew, add this to your shell’s rc file (the PROMPT syntax below is for zsh, so ~/.zshrc) and restart your terminal to enable it:

source /usr/local/opt/kube-ps1/share/kube-ps1.sh
PROMPT='$(kube_ps1)'$PROMPT

It works great in conjunction with using kubectx and kubens to change your context/cluster/namespace.

It’s very easy to change contexts in a different terminal and have a stale prompt showing the old context. Press enter to refresh the prompt.

Powerlevel10k

If you use powerlevel10k, it can dynamically update your prompt with the current context when you’re typing a kubectl command – https://github.com/romkatv/powerlevel10k#show-on-command.

Troubleshooting

‘Clear Formatting’ Shortcut Fix

Recently my ‘clear formatting’ shortcut for most Google apps (Command (or Cmd) ⌘ + \) stopped working on my MacBook.

This was very frustrating as I use it all the time.

Apparently 1Password had silently taken over this shortcut, since it uses Cmd+\ to fill logins. I was able to change the shortcut in 1Password’s preferences, at which point clear formatting started working again.

If you run into a similar issue in the future, there’s also an app called ShortcutDetective that can help you find out which app on your Mac is using a keyboard shortcut.