How To Maintain and Rotate Keys and Tokens With Zero Downtime

Introduction

Secrets are a form of distilled trust. They may be API keys, passwords, certificates, and other forms of key material. Generally, such credentials have a few major properties that make them more useful than just granting trust:

  • They can be asymmetric, which is perhaps the most important and secure way to perform key exchange (Bruce Schneier’s seminal Applied Cryptography is the best place for the details)
  • They can be limited in scope (restricting access, capturing the persona or actor behind the action)
  • They can be limited in time
  • They can be partial, meaning you’d need a bunch of other keys to perform actual access (think: the atomic launch button in movies that requires several different physical keys)
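
To make the scope and time properties concrete, here is a minimal sketch of a credential that carries both limits. The Credential class, its fields, and the allows method are hypothetical names chosen for illustration; they do not correspond to any particular vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Credential:
    """A hypothetical credential limited in both scope and time."""
    subject: str          # the persona or actor behind the action
    scopes: frozenset     # the actions this credential permits
    expires_at: datetime  # from this instant on, the credential is rejected

    def allows(self, action, now=None):
        """Return True only if the credential is unexpired and covers the action."""
        now = now or datetime.now(timezone.utc)
        return now < self.expires_at and action in self.scopes
```

A credential built this way grants nothing outside its scopes and nothing after its expiry, which is what makes it safer to hand out than blanket trust.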

Key Rotation as a Practice

Traditionally, key rotation is performed on a few occasions:

  • A regular policy: keys should not live forever. Expiring keys and rotating them keeps a healthy security posture
  • System upgrade and security hardening: sometimes a change in the authentication infrastructure, such as moving to better or faster protocols and algorithms, requires rotating keys
  • Key leaks, data breaches, and suspected leaks: in any of these unfortunate events, you want to rotate your keys. Even if you only suspect that a leak has happened, it’s still good practice to go ahead and rotate

The practice of key rotation at large (for a full system, across various types of keys, services, and vendors) is delicate work: you are orchestrating a sensitive process across different protocols while working around the capabilities that various vendors and APIs lack.

What You Should Know

Dangers in Leaking Keys

Although it’s pretty obvious, we’re going to reiterate the dangers in leaking keys and sensitive access information, because it’s worth repeating (especially if you’re running a business that handles sensitive customer data):

  • Disrupting business and regular operation
  • Loss of IP (codebase, competitive advantage)
  • Direct financial losses (e.g. system abuse, ransom)
  • Financial loss via customer compensation and losing business and reputation (or a drop in share prices)
  • Regulatory fines (GDPR)
  • Litigation (paying law firms)
  • Forensic costs
  • Remediation costs

Best Practices

Aside from adhering to a security standard (NIST, SOC, or others) where the proper controls are spelled out, the gist of maintaining a good security posture with your keys and credentials can be summarized in these questions:

  • Do you know where all of your keys are stored?
  • Are you generally using keys for too long? Do you have a way to measure that?
  • Can you tell where a key has been used?
  • How do you make sure a key has really been rotated?
  • After rotation, how do you make sure an old key is really revoked and disabled?
  • Are you sure key access and key rotation are enabled only for authorized personnel and processes?
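
Several of these questions (where keys are stored, how long they have been in use) can be answered from a simple key inventory. The following is a minimal sketch, assuming you maintain such an inventory yourself; KeyRecord, its fields, and overdue_keys are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class KeyRecord:
    """One entry in a hypothetical key inventory."""
    name: str                                # identifier, e.g. "billing-api"
    location: str                            # where the key is stored
    created_at: datetime                     # when the key was issued
    last_used_at: Optional[datetime] = None  # last known use, if tracked


def overdue_keys(inventory, max_age, now=None):
    """Return the records whose age exceeds max_age, oldest first."""
    now = now or datetime.now(timezone.utc)
    stale = [k for k in inventory if now - k.created_at > max_age]
    return sorted(stale, key=lambda k: k.created_at)
```

Running overdue_keys on a schedule with, say, max_age=timedelta(days=90) gives you a measurable answer to "are we using keys for too long?". The 90-day figure is only an example, not a recommendation from this article.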

Zero Downtime Key Rotation

Maintaining 2 Valid Keys While the Target Service Supports Only One Active

If the target service that validates the token allows only one active token at any given time (for example, Auth0), how do you maintain two valid keys? Once you create a new token, the old one is immediately revoked. In this case, you need to do the heavy lifting on your own:

  1. Build a shim over your client that makes sure your client stores two keys: the new key, and the old key that is revoked as soon as the new one becomes active.
  2. For every service access, try the new key first and switch over to the old key if it fails. A failure means the new key is not active yet.
  3. Keep tabs on the number of failures systemwide. Log the failures; once there are no more failures, the new key is active everywhere.
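
The three steps above can be sketched as a small wrapper around whatever function performs the actual request. This is a rough illustration under stated assumptions: DualKeyClient, AuthError, and the call parameter are hypothetical, and a real client would raise its own error type when a key is rejected.

```python
import logging
from typing import Callable

log = logging.getLogger("key_shim")


class AuthError(Exception):
    """Raised by the wrapped call when the service rejects a key (hypothetical)."""


class DualKeyClient:
    """Holds a new and an old key; tries the new one first, then falls back."""

    def __init__(self, new_key: str, old_key: str, call: Callable[[str], object]):
        self.new_key = new_key
        self.old_key = old_key
        self._call = call          # performs one request with the given key
        self.new_key_failures = 0  # how many times the new key was rejected

    def request(self):
        try:
            return self._call(self.new_key)
        except AuthError:
            # The new key is not active yet: record the failure and fall back.
            self.new_key_failures += 1
            log.warning("new key rejected, falling back to old key")
            return self._call(self.old_key)
```

Aggregating the warning logs (or the new_key_failures counters) across all clients gives the systemwide failure count; when it stops growing, every client is succeeding with the new key.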

Maintaining 1 Valid Key While the Target Service Supports Two Active

When the target service supports more than one active key (for example, GitHub), you have less work to do. Create the new key the way you’re used to, swap the new key into all services, and run the system.

Here, too, there is value in logging errors, and in recording that all services are using the new key in whatever way you prefer. Once that’s done, you can revoke the old key.
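
One possible way to record that all services are using the new key is a small tracker that each service reports to. This is only a sketch; RotationTracker and its methods are hypothetical, and in practice the same information might come from your log aggregation instead.

```python
class RotationTracker:
    """Tracks which services have confirmed that they use the new key."""

    def __init__(self, services):
        self.pending = set(services)  # services not yet confirmed on the new key
        self.errors = []              # (service, message) pairs seen during rollout

    def confirm(self, service):
        """Record that a service made a successful call with the new key."""
        self.pending.discard(service)

    def record_error(self, service, message):
        """Record an error observed while the rollout is in progress."""
        self.errors.append((service, message))

    def safe_to_revoke_old_key(self):
        """True only when every service has confirmed and no errors were recorded."""
        return not self.pending and not self.errors
```

The tracker is deliberately conservative: a single recorded error blocks revocation until someone has looked at it, since revoking the old key is the one step that cannot be undone cheaply.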

Automating Key Rotation

Once you have a slick zero-downtime process in place, whichever of the two you chose, you get the extra bonus of being able to automate it. To automate it, you need to be able to perform the following via API only:

  1. Create a new key (or revoke the old one, for the method that keeps two keys in the client)
  2. Check if a key is valid

Once you have confirmed that both operations are available, you can build the workflow as you see fit with either zero-downtime technique.
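
As a rough sketch of such a workflow for a service that supports two active keys, the function below takes the API operations as plain callables. All parameter names are assumptions: create_key and is_valid stand for the two API capabilities listed above, while deploy (pushing the key to your services) and revoke are additional hypothetical hooks you would supply yourself.

```python
import time
from typing import Callable


def rotate(create_key: Callable[[], str],
           is_valid: Callable[[str], bool],
           deploy: Callable[[str], None],
           revoke: Callable[[str], None],
           old_key: str,
           attempts: int = 5,
           delay_seconds: float = 1.0) -> str:
    """Create and verify a new key, deploy it, then revoke the old key."""
    new_key = create_key()

    # Poll until the service accepts the new key, giving up after `attempts` tries.
    for _ in range(attempts):
        if is_valid(new_key):
            break
        time.sleep(delay_seconds)
    else:
        raise RuntimeError("new key never became valid; old key left in place")

    deploy(new_key)   # hypothetical hook: swap the key into all services
    revoke(old_key)   # hypothetical hook: disable the old key

    # Confirm that the old key is really revoked.
    if is_valid(old_key):
        raise RuntimeError("old key is still valid after revocation")
    return new_key
```

In a real rollout you would likely wait between deploy and revoke until every service has confirmed that it uses the new key; the sketch omits that wait to stay short.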

This UrIoTNews article is syndicated from DZone