Troubleshooting Vault on Kubernetes

Operating Vault in a Kubernetes environment brings a new set of challenges, operational patterns, and practices.

Troubleshooting Vault in Kubernetes follows the same methodology detailed in the Troubleshooting Vault tutorial. The primary differences are the tools and techniques used, which are specific to Kubernetes.

In this tutorial, you will use the command line in a terminal session to learn about fundamental techniques and tools for troubleshooting Vault in a Kubernetes deployment.

Checking cluster status

A good starting point for troubleshooting Vault is to gather information about the server status for each cluster member. This includes helpful information such as whether Vault is actively deployed, when it was deployed, whether it is in active or standby mode, and more.

Vault status

One way to immediately determine the Vault cluster status is to examine the output of vault status executed in each of the cluster member pods.

For example, in a Vault deployment using the Helm chart, you might find a list of pods like this example.

$ kubectl get pods
NAME                                  READY   STATUS    RESTARTS   AGE
vault-0                               0/1     Running   0          120m
vault-1                               0/1     Running   0          120m
vault-2                               0/1     Running   0          120m
vault-agent-injector-c54c5747-kmdpm   1/1     Running   2          121m

To get the status of a single Vault server, you can address the pod by name with kubectl. For example, the following command gets the status of Vault in the pod vault-0.

$ kubectl exec vault-0 -- vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            1
Threshold               1
Version                 1.6.1
Storage Type            raft
Cluster Name            vault-cluster-c6b0dbfc
Cluster ID              89afb352-28e7-b80c-7a67-ee5bf4896e84
HA Enabled              true
HA Cluster              https://vault-0.vault-internal:8201
HA Mode                 active
Raft Committed Index    1710
Raft Applied Index      1710

To get the status of every Vault server in the cluster, loop over the pod names vault-0, vault-1, and vault-2 from the previous example and run the same kubectl command against each pod.

$ for i in {0..2} ; do kubectl exec vault-$i -- vault status ; done
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            1
Threshold               1
Version                 1.6.1
Storage Type            raft
Cluster Name            vault-cluster-c6b0dbfc
Cluster ID              89afb352-28e7-b80c-7a67-ee5bf4896e84
HA Enabled              true
HA Cluster              https://vault-0.vault-internal:8201
HA Mode                 active
Raft Committed Index    1717
Raft Applied Index      1717
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            1
Threshold               1
Version                 1.6.1
Storage Type            raft
Cluster Name            vault-cluster-c6b0dbfc
Cluster ID              89afb352-28e7-b80c-7a67-ee5bf4896e84
HA Enabled              true
HA Cluster              https://vault-0.vault-internal:8201
HA Mode                 standby
Active Node Address     http://10.244.0.8:8200
Raft Committed Index    1717
Raft Applied Index      1717
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            1
Threshold               1
Version                 1.6.1
Storage Type            raft
Cluster Name            vault-cluster-c6b0dbfc
Cluster ID              89afb352-28e7-b80c-7a67-ee5bf4896e84
HA Enabled              true
HA Cluster              https://vault-0.vault-internal:8201
HA Mode                 standby
Active Node Address     http://10.244.0.8:8200
Raft Committed Index    1717
Raft Applied Index      1717

An explanation of each field in the status output follows.

Seal Type: The type of seal in use. This value should match across cluster members.
Initialized: Whether the underlying storage has been initialized. This should always appear with a value of true in any case except that of a new and uninitialized server.
Sealed: Whether the server is in a sealed or unsealed state. A sealed server cannot participate in cluster membership or otherwise be used until it is unsealed. All members of a healthy cluster should report a value of false.
Total Shares: The number of key shares made from splitting the root key (previously known as master key); this value is defined during initialization and can be changed afterwards only by a rekey operation.
Threshold: The number of key shares required to reconstruct the root key; this value is defined during initialization and can be changed afterwards only by a rekey operation.
Version: The version of Vault in use on the server.
Storage Type: The type of storage in use.
Cluster Name: The cluster name string; this value should match on all members of a healthy cluster.
Cluster ID: The cluster identification string; this value is dynamically generated by default, and should match on all members of a healthy cluster.
HA Enabled: Whether this cluster is using high availability (HA) coordination functionality.
HA Cluster: The cluster address of the active node, used for server-to-server communication such as request forwarding.
HA Mode: The HA mode. Expected values are active and standby. There should be exactly one active node in every healthy cluster. In the example output, the pod vault-0 is the active cluster leader.
Active Node Address: The API address of the active node, used in client redirects. In the example output, it appears only for the standby nodes.
Raft Committed Index: The index of the most recent entry committed to the Raft log.
Raft Applied Index: The index of the most recent committed entry that has been applied to storage. This value should closely follow or be equal to the value of Raft Committed Index in a healthy Vault cluster.
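
Comparing one field across all pods is often quicker than reading the full status output of each member. The following is a minimal sketch, not part of the Vault CLI: the function names status_field and cluster_ha_modes are hypothetical, and the parser assumes the default table output shown above, where each key is separated from its value by two or more spaces.

```shell
# Print the value of one field from the table output of `vault status`,
# read on standard input. $1 is the key exactly as printed, e.g. "HA Mode".
# Assumption: key and value are separated by two or more spaces.
status_field() {
  awk -F'  +' -v key="$1" '$1 == key { print $2; exit }'
}

# Hypothetical helper: report the HA mode of the pods vault-0 to vault-2.
# Prints "unknown" when the field is missing from the status output.
cluster_ha_modes() {
  for i in 0 1 2; do
    mode=$(kubectl exec "vault-$i" -- vault status | status_field "HA Mode")
    printf 'vault-%s: %s\n' "$i" "${mode:-unknown}"
  done
}

# Example usage (requires access to the cluster):
#   cluster_ha_modes
#   kubectl exec vault-0 -- vault status | status_field "Sealed"
```

A healthy cluster would be expected to show exactly one pod in active mode. If jq is available, vault status -format=json is an alternative that avoids parsing the table.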

Helm chart status

If your Vault cluster is deployed with a Helm chart, you can begin a troubleshooting session by getting the basic deployment status from the helm status command.

$ helm status vault
NAME: vault
LAST DEPLOYED: Tue Jan 12 13:22:50 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!

Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:

https://www.vaultproject.io/docs/


Your release is named vault. To learn more about the release, try:

  $ helm status vault
  $ helm get manifest vault

This output provides several important details.

The deployment name is vault.
The last time the cluster was deployed; in this case, Tue Jan 12 13:22:50 2021.
The current deployment status; in this case, deployed.

Integrated Storage cluster member status

If your Vault cluster uses Integrated Storage (Raft), you can use the vault operator raft command to list the cluster member peers and immediately determine cluster leadership status.

Here is an example of the command vault operator raft list-peers and its output.

$ kubectl exec vault-0 -- vault operator raft list-peers
Node                                    Address                        State       Voter
----                                    -------                        -----       -----
9bc24771-64bb-2ac5-fee2-35a80f61e810    vault-0.vault-internal:8201    leader      true
313bb4e1-400b-c224-fffe-53061378570d    vault-1.vault-internal:8201    follower    true
ca918ffb-fe2c-a873-00bd-25f4d7bff140    vault-2.vault-internal:8201    follower    true

From this output, you can determine the cluster leader. In this example, it is the node with identifier 9bc24771-64bb-2ac5-fee2-35a80f61e810 or vault-0.
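
When scripting a check, the leader can be picked out of the peer list automatically. This is a small sketch under the assumption that the output keeps the four-column layout shown above (Node, Address, State, Voter); the function name raft_leader is hypothetical.

```shell
# Read `vault operator raft list-peers` table output on standard input
# and print the address of the peer whose State column is "leader".
# Assumption: columns are Node, Address, State, Voter, in that order.
raft_leader() {
  awk '$3 == "leader" { print $2; exit }'
}

# Example usage (requires access to the cluster):
#   kubectl exec vault-0 -- vault operator raft list-peers | raft_leader
```

An empty result would suggest that no peer currently reports itself as leader, which is worth investigating further.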

Server operational logs

Vault servers log information about operations to standard output and standard error. This information can then be captured by the systemd journal or routed to a static file depending on operational requirements.

When Vault is deployed in Kubernetes, you can retrieve the logs from pods with the kubectl logs command.

For example, here are the first 15 lines of a Vault server operational log, which show important server startup information that can be useful for troubleshooting.

$ kubectl logs vault-0 | head -n 15
==> Vault server configuration:

             Api Address: http://10.244.0.13:8200
                     Cgo: disabled
         Cluster Address: https://vault-0.vault-internal:8201
              Go Version: go1.15.4
              Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
               Log Level: info
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.6.1
             Version Sha: 6d2db3f033e02e70202bef9ec896360062b88b03

==> Vault server started! Log data will stream in below:

To capture the entire log into the file vault-0.log, use a command like this example.

$ kubectl logs vault-0 > vault-0.log
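
The same approach extends to every pod in the cluster. The sketch below is illustrative only: the function names save_all_logs and log_problems are hypothetical, the pod names are taken from the earlier examples, and the filter assumes Vault's usual bracketed log levels such as [INFO], [WARN], and [ERROR].

```shell
# Save the operational log of each Vault pod to vault-N.log
# in the current directory.
save_all_logs() {
  for i in 0 1 2; do
    kubectl logs "vault-$i" > "vault-$i.log"
  done
}

# Read log lines on standard input and keep only warnings and errors.
# Assumption: levels appear in brackets, as in "[WARN]" and "[ERROR]".
log_problems() {
  grep -E '\[(WARN|ERROR)\]'
}

# Example usage (requires access to the cluster):
#   save_all_logs
#   log_problems < vault-0.log
```

If a container has restarted, kubectl logs with the --previous flag retrieves the log of the prior container instance, which is often where the cause of the restart appears.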

Server audit device logs

In an ideal production environment, each Vault server should have one or more audit devices enabled to trace requests and responses.

If your troubleshooting scenario requires examining the details of requests and responses at this level, then you need access to these logs. In many production deployments, audit device logs are sent to an external system for processing.

If a file audit device is enabled in the Kubernetes pods, however, you can access the file to examine this information. For example, if the cluster was deployed with the Helm chart and auditing was enabled according to the Production Deployment Checklist, then the audit file is expected to be located in the pod at a path like /vault/audit/vault_audit.log.

You can confirm this by using the tail command to display the last line in the file.

$ kubectl exec vault-0 -- tail -n 1 /vault/audit/vault_audit.log | jq
{"time":"2021-01-22T19:15:08.3786012Z","type":"response","auth":{"client_token":"hmac-sha256:932f379268267d16043b4a00c4af2a80edbb0bffd714ab9dfadd2f9081f93243","accessor":"hmac-sha256:a97e09ab6c0250669de89efc7be88ce8304fbe0f6c58f01930f18f71807827db","display_name":"root","policies":["root"],"token_policies":["root"],"token_type":"service","token_issue_time":"2021-01-22T19:06:35Z"},"request":{"id":"cac13e04-9845-943b-f870-ce43270ed38d","operation":"create","mount_type":"kv","client_token":"hmac-sha256:932f379268267d16043b4a00c4af2a80edbb0bffd714ab9dfadd2f9081f93243","client_token_accessor":"hmac-sha256:a97e09ab6c0250669de89efc7be88ce8304fbe0f6c58f01930f18f71807827db","namespace":{"id":"root"},"path":"kv/data/5}","data":{"data":{"foo":"hmac-sha256:af182668c0f7326fe52fc6ef167c3d08ce1d680975554181cef6a0d0b3ef476f"},"options":{}},"remote_address":"127.0.0.1"},"response":{"mount_type":"kv","data":{"created_time":"hmac-sha256:e65a3b1d21f0b1957fb61c10bbddf01fe07ccc816e8f6c2eda71296c6ae36705","deletion_time":"hmac-sha256:8631c9a08fc95c910f6cf24a5e67b3ca590def860d6289aca808bc92e99c837d","destroyed":false,"version":1}}}

The file is present; use kubectl cp to copy it from the pod to the host like this.

$ kubectl cp default/vault-0:/vault/audit/vault_audit.log \
    vault-0_audit.log

You can now examine the local copy, vault-0_audit.log, as described in the Querying Audit Device Logs tutorial.
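
For a quick overview before a deeper analysis, each audit entry can be reduced to a few fields. This is a rough sketch that assumes jq is installed on the host and that the entries contain the time, type, request.operation, and request.path fields seen in the sample entry above; the function name audit_summary is hypothetical.

```shell
# Read audit device entries (one JSON object per line) on standard input
# and print time, entry type, operation, and path as tab-separated columns.
# Assumption: field names match the sample audit entry shown earlier.
audit_summary() {
  jq -r '[.time, .type, .request.operation, .request.path] | @tsv'
}

# Example usage with the file copied from the pod:
#   audit_summary < vault-0_audit.log
```

Sensitive values in the audit log remain HMAC-hashed, so a summary like this shows what was requested and when, not the secret data itself.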

Gather debugging data with vault debug

Introduced in version 1.3, the debug command starts a process that monitors a Vault server, probing for information about it during a specified collection duration and generating a compressed archive of the resulting information.

You can use vault debug in a Kubernetes environment with a multi-step process that involves entering the pod, changing into a writable directory, and executing the command. You can then exit the pod, and copy the resulting debug archive from the pod to the host.

First, access the required pod; in this example, it is vault-0.

$ kubectl exec vault-0 --stdin=true --tty=true -- sh
/ $

Your system prompt is replaced with a new prompt / $ that includes the present working directory name. In the following examples, the prompt is shortened to only $.

Change into the /tmp directory so that the debug file can be successfully written.

$ cd /tmp
/tmp $

Now you can execute the debug command; the example command runs for 2 minutes, capturing data at 1-minute intervals, and then exits.

$ vault debug -interval=1m -duration=2m
==> Starting debug capture...
         Vault Address: http://127.0.0.1:8200
        Client Version: 1.6.1
              Duration: 2m0s
              Interval: 1m0s
      Metrics Interval: 10s
               Targets: config, host, metrics, pprof, replication-status, server-status
                Output: vault-debug-2021-01-22T14-51-12Z.tar.gz

==> Capturing static information...
2021-01-22T14:51:12.485Z [INFO]  capturing configuration state

==> Capturing dynamic information...
2021-01-22T14:51:12.489Z [INFO]  capturing server status: count=0
2021-01-22T14:51:12.490Z [INFO]  capturing host information: count=0
2021-01-22T14:51:12.489Z [INFO]  capturing metrics: count=0
2021-01-22T14:51:12.490Z [INFO]  capturing pprof data: count=0
2021-01-22T14:51:12.491Z [INFO]  capturing replication status: count=0

Note

The debug command honors the VAULT_TOKEN environment variable and uses its value to authenticate. If that token has limited privileges, this can have the unintended effect of limiting the gathered debugging information. If you need the maximum amount of information possible, set VAULT_TOKEN to a highly privileged token before executing the debug command.

After two minutes elapse, the process exits and a file is now present in the /tmp directory.

Finished capturing information, bundling files...
Success! Bundle written to: vault-debug-2021-01-22T14-51-12Z.tar.gz

Note

The output is written to a gzipped tar file with a filename made up of the string vault-debug and a timestamp; in this example, the filename is vault-debug-2021-01-22T14-51-12Z.tar.gz.

Now you can exit the pod and copy the file from the pod to the host.

Exit the pod.

$ exit

Use the kubectl cp command to copy the debug output file from the vault-0 pod to the local host.

$ kubectl cp default/vault-0:/tmp/vault-debug-2021-01-22T14-51-12Z.tar.gz \
    vault-debug-2021-01-22T14-51-12Z.tar.gz

Now the file is present locally and you can use the tar command to extract its contents.

$ tar --extract --gunzip --file vault-debug-2021-01-22T14-51-12Z.tar.gz

Here is the tree output of the example extracted file's top-level directory.

$ tree vault-debug-2021-01-22T14-51-12Z
vault-debug-2021-01-22T14-51-12Z
├── 2021-01-22T14-51-12Z
│   ├── goroutine.prof
│   └── heap.prof
├── 2021-01-22T14-52-12Z
│   ├── goroutine.prof
│   └── heap.prof
├── 2021-01-22T14-53-12Z
│   ├── goroutine.prof
│   └── heap.prof
├── config.json
├── host_info.json
├── index.json
├── metrics.json
├── replication_status.json
└── server_status.json

3 directories, 12 files

The contents consist of goroutine and heap profiling data, along with sanitized server configuration, host information, data index information, metrics, and status information.

You can learn more about the vault debug command and the data contained within the archives it generates from the debug documentation.
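
The interactive steps in this section can also be run without entering the pod. The following is a sketch under several assumptions: the function name collect_debug is hypothetical, the pod runs in the default namespace, /tmp is writable in the pod, the Vault CLI in the pod can already authenticate, and the -output flag of vault debug is used to choose a predictable archive path instead of the timestamped default.

```shell
# Run `vault debug` inside a pod and copy the resulting archive to the
# current directory on the host. $1 is the pod name, e.g. vault-0.
# Assumptions: default namespace, writable /tmp in the pod, and no
# existing archive at the same path from an earlier run.
collect_debug() {
  debug_pod="$1"
  debug_archive="/tmp/vault-debug-$debug_pod.tar.gz"
  kubectl exec "$debug_pod" -- \
    vault debug -interval=1m -duration=2m -output="$debug_archive" &&
    kubectl cp "default/$debug_pod:$debug_archive" \
      "./vault-debug-$debug_pod.tar.gz"
}

# Example usage (requires access to the cluster):
#   collect_debug vault-0
#   tar --extract --gunzip --file vault-debug-vault-0.tar.gz
```

Using a fixed output path trades the self-describing timestamp in the file name for a name that a script can predict, so renaming the local copy after each run may be worthwhile.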

Next steps

This tutorial demonstrated tools and techniques for troubleshooting a Vault cluster running on Kubernetes. You learned how to get server and cluster status, gather operational and audit device logs, and use the vault debug command.

You can continue learning about Vault troubleshooting in the Troubleshooting Vault tutorial and might find related content such as Querying Audit Device Logs and Monitor Telemetry & Audit Device Log Data with Splunk helpful.
